NorGPT Language Models

We use language every day to learn, reason, and communicate. Language models capture the underlying structure of language.

Goals

Norwegian appears in many kinds of documents, including books, news articles, blogs, comments, and transcribed speech. At NorwAI, we work on creating Large Language Models (LLMs) for Norwegian. These models allow users to handle language-related tasks more effectively: instead of learning an artificial query syntax, users can formulate their questions in natural language, and the system parses them and generates answers.

Methodology

We collect publicly available data in Norwegian and, to a lesser degree, in related languages. We then train a deep neural network to reconstruct hidden parts of the texts. Over many iterations, the model reduces its reconstruction error and thereby learns to model the language. Once training is complete, the model can generate text from an input prompt, one token at a time.
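As a rough illustration of token-by-token generation, the sketch below uses a tiny hand-written bigram table as a hypothetical stand-in for a trained neural network: at each step the model scores possible next tokens, the most probable one is appended, and the loop repeats until an end marker appears. The table entries and token names are invented for this example and are not taken from any NorGPT model.

```python
# Hypothetical next-token probabilities, standing in for a trained model.
# "<s>" marks the start of a sequence, "</s>" the end.
BIGRAM = {
    "<s>": {"vi": 0.6, "det": 0.4},
    "vi": {"trener": 0.7, "bruker": 0.3},
    "trener": {"modeller": 0.9, "</s>": 0.1},
    "modeller": {"</s>": 1.0},
    "det": {"</s>": 1.0},
    "bruker": {"</s>": 1.0},
}

def generate(max_tokens: int = 10) -> list[str]:
    """Greedy decoding: always append the most probable next token."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        candidates = BIGRAM[tokens[-1]]
        nxt = max(candidates.items(), key=lambda kv: kv[1])[0]
        if nxt == "</s>":  # stop when the model predicts end-of-sequence
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the start marker

print(" ".join(generate()))
```

Real LLMs work the same way in outline, but the probability table is replaced by a neural network with billions of parameters, and sampling strategies are often used instead of always taking the single most probable token.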

Data

We rely mainly on public data sources, including news articles, websites, and social media. In addition, our partners have provided proprietary data. We complement the models with a website that allows users to explore the training data.

Models

We focus on generative AI, with model families such as GPT and Llama. We prepare different model sizes so that users can deploy the smaller models locally, while the larger, more powerful models can run on cloud servers. In addition, we fine-tune models for specific use cases. A list of models will be added as soon as testing has finished.

Contact

For questions, please contact Benjamin Kille.