XAI@NTNU
Why XAI?
Explainable AI (XAI) is a research field focused on providing AI systems with the capability to explain their predictions in the context of their applications. The overarching goal is to interpret or explain machine learning models so that their predictions and/or internal mechanisms become understandable to humans.
The field of XAI has experienced exponential growth in the number of publications since 2017. The EU AI Act's requirement that high-risk AI systems be explainable has brought widespread attention to the field from multiple sectors.
Explanations of the predictions and functions of AI systems can serve multiple needs:
- Regulatory requirements. For example, in addition to the requirements of the AI Act, the General Data Protection Regulation (GDPR) requires that automated decisions based on personal data be explained to the end user.
- Model evaluation. For example, domain experts must be able to assess whether a trained model has internalised domain expertise. Protection against discrimination can only be ensured if we can evaluate what the model’s predictions rely on.
- Understanding of the model. For example, developers of machine learning models must understand the implicit assumptions and representations the model develops, in order to evaluate its robustness in practice.
- Human autonomy. For example, the end user must understand what an AI system bases its decision on and how the decision can be changed, in order to protect their own interests.
XAI METHODS
A variety of explanation methods exists for different model architectures and purposes. Selecting the right XAI method requires specifying the explanation need, the model architecture, and the data format. Examples of explanation types are:
Concept-based explanations - examine whether a model has internalised and made use of abstract concepts that are understandable to humans (a minimal probe sketch follows after this list). Example: “Whether or not the animal has stripes determines if the model classifies the image as a zebra”.
Feature importance attribution - ranks the importance of the input features for a model prediction, either for a single datapoint or for the model as a whole (see the permutation-importance sketch below). Example: “You did not receive a loan primarily because your income is too low, and partly because you have a history of payment default”.
Counterfactual explanations - describe how a model’s prediction changes if the input to the model is changed (see the counterfactual search sketch below). Example: “You qualify for a loan if you reduce the desired amount by n NOK and simultaneously increase your income by m NOK”.
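
To make the concept-based idea concrete, the following is a minimal Python sketch of a linear concept probe, loosely in the spirit of concept-activation-vector methods. The activation matrices, the hidden-layer size, and the "stripes" labels are invented placeholders rather than output from a real model; the point is only that if a simple classifier can decode the concept from the internal representation well above chance, the model has plausibly internalised it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical activations from an intermediate layer of an image classifier:
# rows are images, columns are hidden units. In practice these would be
# extracted from the actual model; here they are random placeholders.
rng = np.random.default_rng(0)
acts_striped = rng.normal(1.0, 1.0, size=(200, 64))    # images with stripes
acts_unstriped = rng.normal(0.0, 1.0, size=(200, 64))  # images without stripes

X = np.vstack([acts_striped, acts_unstriped])
y = np.array([1] * 200 + [0] * 200)  # 1 = "has stripes", 0 = "no stripes"

# A linear probe: if the concept can be decoded from the activations well
# above chance, the model has plausibly internalised a notion of "stripes".
probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, X, y, cv=5)
print(f"concept decodable with accuracy {scores.mean():.2f}")
```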
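Feature importance attribution can be illustrated with permutation importance, one of the simplest model-agnostic techniques: shuffle one feature at a time and measure how much the model's performance drops. The loan dataset and the approval rule below are synthetic stand-ins invented for this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic loan data: income, requested amount, previous payment default.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(500_000, 150_000, n),    # income (NOK)
    rng.normal(2_000_000, 500_000, n),  # requested loan amount (NOK)
    rng.integers(0, 2, n),              # previous payment default (0/1)
])
y = ((X[:, 0] > 450_000) & (X[:, 2] == 0)).astype(int)  # invented approval rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
baseline = accuracy_score(y_te, model.predict(X_te))

# Shuffle each feature in turn; a larger accuracy drop means a more
# important feature for the model's predictions.
for i, name in enumerate(["income", "loan amount", "previous default"]):
    X_perm = X_te.copy()
    X_perm[:, i] = rng.permutation(X_perm[:, i])
    drop = baseline - accuracy_score(y_te, model.predict(X_perm))
    print(f"{name}: importance {drop:.3f}")
```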
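A counterfactual explanation can likewise be sketched as a search for the smallest change to the input that flips the decision. The `approves` function below is an invented stand-in for a trained loan model, and the brute-force search is deliberately naive; dedicated counterfactual methods additionally handle plausibility and sparsity constraints.

```python
# Invented stand-in for a trained loan model: approve if income is
# high enough relative to the requested amount.
def approves(income: float, amount: float) -> bool:
    return income * 5 >= amount

income, amount = 400_000, 2_500_000  # a rejected applicant
step = 10_000

# Brute-force search for the cheapest combination of "increase income by m"
# and "reduce the requested amount by n" that flips the decision.
best = None
for m in range(0, 51):
    for n in range(0, 101):
        if approves(income + m * step, amount - n * step):
            cost = m + n  # simple proxy for the size of the change
            if best is None or cost < best[0]:
                best = (cost, m * step, n * step)
            break  # a larger n only costs more for this m

if best is not None:
    _, d_income, d_amount = best
    print(f"Counterfactual: increase income by {d_income} NOK "
          f"and reduce the requested amount by {d_amount} NOK.")
```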
Despite the wide availability of explanation methods, the goal of providing reliable explanations for AI systems remains unsolved. One challenge is deciding when an explanation is good enough to give a representative understanding of the model’s internal mechanisms. In addition, explanations can be inconsistent and, in some cases, wrong, which undermines their reliability and can in turn lead to a faulty understanding of the AI system. Standardised frameworks for evaluating the validity of explanations, and benchmarks for testing and comparing them, are therefore among the main focus areas within XAI; a minimal example of such a validity check is sketched below.
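
As an illustration of what checking the validity of an explanation can involve, the sketch below runs a simple "deletion" test on a hypothetical feature ranking: if the ranking is faithful, removing the top-ranked features should change the prediction clearly more than removing features ranked as unimportant. The data, model, and ranking are all invented; real benchmarks use far more rigorous protocols.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data in which only features 0 and 1 actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

def prob_change(x, features):
    """Change in predicted probability after replacing `features` with their mean."""
    x_mod = x.copy()
    x_mod[0, features] = X[:, features].mean(axis=0)
    return abs(model.predict_proba(x)[0, 1] - model.predict_proba(x_mod)[0, 1])

x = X[:1]                # a single datapoint to be explained
top_ranked = [0, 1]      # hypothetical "most important" features from an explainer
unimportant = [8, 9]

# A faithful ranking: deleting the top-ranked features should change the
# prediction much more than deleting supposedly unimportant ones.
print("delete top-ranked features:", round(prob_change(x, top_ranked), 3))
print("delete unimportant features:", round(prob_change(x, unimportant), 3))
```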


