Norway Shouldn’t Just Test AI Models – We Need to Build Our Own

Tellef Raabe and Aksel Sterri are right that Norway must test the AI models we are becoming dependent on. But they overlook the most important point: future AI models will be specialized for Norwegian sectors and tasks. It is not enough to be a passive testing authority – we must also be involved in developing these models.

Op-ed by Sven Størmer Thaulow, Chair of NorwAI Executive Board, and Jon Atle Gulla, Center Director and Professor, NorwAI/NTNU. This text is translated from the Norwegian original, published in DN.

In an opinion piece in DN, Tellef Raabe and Aksel Sterri from the Norwegian Computing Center argue for a national AI safety agency that would test and evaluate the large language models on which Norway is increasingly reliant. Their diagnosis is correct: Norway cannot passively import technology that is reshaping the public sector, the healthcare system, and industry without having an understanding of what we are adopting. But their prescription is incomplete.

Raabe and Sterri may be envisioning a future in which Norway primarily relates to a few large, generic language models developed by foreign technology companies. That image is already becoming outdated. The current development is pointing in another direction as well: toward domain‑ and task‑specific models tailored to concrete needs in specific sectors.

The Generic Model Is Not the End Point

ChatGPT and similar models are impressive all-rounders. But an all-rounder has fundamental limitations when tasked with problems requiring deep expertise, linguistic precision in Norwegian, or an understanding of specific Norwegian regulations and practices. A generic model does not know enough about Norwegian medical record language, industry‑specific standards, or how the NAV regulatory framework handles edge cases.

This is where the most exciting developments are happening internationally. In the United States, Europe, and Asia, specialized models are being built for medicine, law, finance, and engineering. These models are trained on domain‑specific data and evaluated against professional standards. They are often smaller, cheaper to operate, and significantly better than generic models within their fields. This makes the testing question far more complex than Raabe and Sterri suggest. After all, who should test a model designed to understand Norwegian patient records? A general safety authority cannot do that alone. It requires subject‑matter expertise, sector‑specific data, and academic depth.

A Healthcare Model for Norway

At NorwAI, Norway’s research center for AI innovation at NTNU, we are working with Helse Midt‑Norge to develop healthcare‑specific language models. These are models trained on Norwegian clinical texts, aligned with Norwegian medical terminology, and designed to function within the framework of Norwegian healthcare legislation and clinical practice.

This work illustrates something essential: these types of models cannot be created by OpenAI or Google. They require access to Norwegian health data, understanding of Norwegian clinical practice, and collaboration between researchers, clinicians, and technologists—something that can only emerge when academia and sector stakeholders work closely together. A generic American model can never replace this, no matter how many billions of dollars are invested in it.

The safety argument strengthens this point. Raabe and Sterri rightly emphasize the risks posed by AI models in critical infrastructure. But paradoxically, domain‑specific models offer better safety than generic ones. Their training data is known and controllable. The models are smaller and more transparent. The professional communities can evaluate performance against established clinical, legal, or technical standards. And the models operate within a Norwegian regulatory framework from the start.

When Helse Midt‑Norge and NorwAI develop a healthcare model together, safety is built in from the start, not added afterward by an external testing body. Safety requirements and testing datasets are defined as part of the model‑building process, in collaboration with experts from the healthcare sector, because they are the ones who can truly determine what it means for a model to be safe for its domain and tasks.

The same applies to other areas: the energy sector, maritime industries, fisheries and aquaculture, and public administration all have their own language, data, and quality and safety requirements. The future AI landscape will consist of an ecosystem of specialized models, where generic models are just one component among many.

Testing Requires Builders

Raabe and Sterri call for Norwegian testing capacity. We agree. But testing capacity without development capability is like a food critic who has never set foot in a kitchen. To truly understand what a language model does, and how and why it fails, you must have built and trained models yourself. You need to understand the training data, the architecture choices, and the trade‑offs involved.

Norwegian academia is the natural arena for this work. The universities have the breadth of expertise, sector‑independent position, and long‑term perspective needed. Through research centers like NorwAI, we have already built environments where computer scientists collaborate with economists, lawyers, medical researchers, and engineers to develop AI solutions that actually work in a Norwegian context.

But this requires resources. Today, there is a striking mismatch between the ambitions of the national AI strategy and investments in the research environments expected to realize them. Norway is investing heavily in using AI, but far less in building the expertise needed to understand, adapt, and evaluate the technology we adopt.

From Dependency to Mastery

The larger question is what kind of AI nation Norway wants to be. Should we settle for being a testing authority that evaluates what American and Chinese companies deliver? Or should we be a country that also develops its own models, tailored to our needs, our language, and our values?

We believe the answer is obvious. Norway has the prerequisites: strong research communities, high digital maturity in both public and private sectors, excellent registry data, and a society built on trust. But we must use them. That means investing in academic AI research closely tied to the real needs of the sectors. It means giving research centers the resources to build and test specialized models together with industry and the public sector. And it means thinking beyond generic models and generic testing.

Raabe and Sterri have started an important conversation. Let us continue it with a broadened perspective: Norway doesn’t just need to test the AI models we are becoming dependent on. We must ensure we have the competence and capacity to build the models Norwegian sectors actually need. That is the real path from dependency to mastery.

2026-02-26