XAI Master's theses
XAI project proposals
XAI for misalignment detection
This project involves training a reinforcement learning agent that exhibits misaligned behaviour (often referred to as goal misalignment or reward hacking) and investigating whether, and which, XAI methods can detect this inner misalignment without explicit behavioural testing. The most time-consuming part of the project is usually finding an environment and training an RL agent that is properly misaligned.
Relevant reading: Master's theses by Jacob Lysnæs-Larsen and Max Gunhamn
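As a rough, self-contained illustration (a sketch written for this page, not code from the theses above), the example below trains a tabular Q-learning agent on a proxy reward that pays out at the wrong cell of a toy gridworld. The resulting policy is misaligned in exactly the sense the project asks XAI methods to expose without explicit testing; the layout, goals, and hyperparameters are all placeholder assumptions.

```python
# Minimal sketch of a misaligned (reward-hacked) gridworld agent; the layout,
# goals, and hyperparameters are illustrative assumptions only.
import numpy as np

SIZE = 5
TRUE_GOAL = (4, 4)    # the cell we *intend* the agent to reach
PROXY_GOAL = (1, 1)   # a flawed reward proxy the agent can exploit instead
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    r, c = state
    dr, dc = ACTIONS[action]
    new_state = (max(0, min(SIZE - 1, r + dr)), max(0, min(SIZE - 1, c + dc)))
    # The misalignment: reward is paid at the proxy cell, not the true goal.
    reward = 1.0 if new_state == PROXY_GOAL else 0.0
    done = new_state in (PROXY_GOAL, TRUE_GOAL)
    return new_state, reward, done

# Tabular Q-learning, epsilon-greedy, trained on the proxy reward.
Q = np.zeros((SIZE, SIZE, len(ACTIONS)))
rng = np.random.default_rng(0)
for episode in range(2000):
    state, done = (0, 0), False
    while not done:
        a = int(rng.integers(4)) if rng.random() < 0.1 else int(np.argmax(Q[state]))
        nxt, reward, done = step(state, a)
        Q[state][a] += 0.1 * (reward + 0.95 * np.max(Q[nxt]) - Q[state][a])
        state = nxt

# The greedy policy now heads for PROXY_GOAL rather than TRUE_GOAL: the
# behaviour an XAI method would be asked to reveal without explicit testing.
print("greedy first action from (0,0):", int(np.argmax(Q[0, 0])))
```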
XAI for explaining sequential decision making tasks
This project involves training a reinforcement learning agent in an environment that requires sequential decision making. The agent can either be trained to make individual predictions (as in chess, Go, or Atari games) or to make decisions over sequential data (such as a traversed path). The goal is to investigate whether the agent shows emergent planning behaviour by examining the explanations of its predictions in sequence.
Relevant reading: Master's theses by Jakob Kessler and Ferdinand Eide, and by Erik Storås Sommer
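One possible starting point (a sketch under assumed placeholders, not a prescribed method) is to compute an attribution for every step of a recorded trajectory and study how it evolves over the sequence. The example below uses plain gradient saliency of the chosen action's logit with a stand-in policy network.

```python
# Hedged sketch: per-step gradient saliency over a recorded trajectory. The
# policy network, state size, and trajectory below are placeholder assumptions;
# in the project they would come from the trained agent and its environment.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 16, 4

policy = nn.Sequential(                      # stand-in for a trained policy
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)

trajectory = [torch.randn(STATE_DIM) for _ in range(10)]   # recorded states

saliencies = []
for state in trajectory:
    x = state.clone().requires_grad_(True)
    logits = policy(x)
    chosen = logits.argmax()                 # action the agent would take here
    logits[chosen].backward()                # gradient of that action's logit
    # |d logit / d input| per feature: which parts of the state drove the choice
    saliencies.append(x.grad.abs().detach())

saliency_matrix = torch.stack(saliencies)    # shape: (timestep, state feature)
print(saliency_matrix.shape)
# Looking at how attribution shifts across timesteps (e.g. whether features
# tied to later subgoals already receive weight early on) is the kind of
# sequential inspection the project description refers to.
```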
XAI for backdoor detection
Several techniques are available for planting backdoors in neural network models. This project examines whether such backdoors can be detected using direct inspection or XAI methods. Note that the project does not consider techniques based on data poisoning, but techniques that build backdoors into the model architecture itself.
Relevant reading: Survey on Backdoor Attacks on Deep Learning
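The toy model below illustrates the kind of architectural backdoor meant here. It is a made-up example rather than an attack taken from the survey: an extra pathway compares the input against a stored trigger pattern and, when it matches, overrides the classifier's output.

```python
# Hedged toy illustration of an architecture-level backdoor (not a technique
# from the survey): a hidden branch detects a fixed trigger pattern in the
# input and overrides the classifier's output. All names, sizes, and the
# trigger mechanism are assumptions made for illustration.
import torch
import torch.nn as nn

IN_DIM, N_CLASSES, TARGET_CLASS = 32, 10, 7

class BackdooredClassifier(nn.Module):
    def __init__(self, trigger: torch.Tensor):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(IN_DIM, 64), nn.ReLU(),
                                      nn.Linear(64, N_CLASSES))
        # The extra pathway: a fixed detector tuned to one input pattern.
        self.register_buffer("trigger", trigger)

    def forward(self, x):
        logits = self.backbone(x)
        # Cosine similarity with the stored trigger gates a large bonus
        # onto a single class, overriding the backbone's prediction.
        match = torch.cosine_similarity(x, self.trigger, dim=-1)
        bonus = torch.zeros_like(logits)
        bonus[..., TARGET_CLASS] = 100.0 * (match > 0.99).float()
        return logits + bonus

trigger = torch.randn(IN_DIM)
model = BackdooredClassifier(trigger)

x_clean = torch.randn(4, IN_DIM)
x_triggered = trigger.repeat(4, 1)
print(model(x_clean).argmax(-1))      # ordinary predictions
print(model(x_triggered).argmax(-1))  # forced to TARGET_CLASS

# The project's question: can direct inspection of the module graph, the
# registered buffers, or XAI attribution maps expose this pathway without
# knowing the trigger in advance?
for name, buf in model.named_buffers():
    print(name, tuple(buf.shape))
```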
XAI for detecting structure in sequential decision makers
This project aims to identify whether structures in artificial neural networks resemble structures known to exist in biological neural networks. A reinforcement learning agent is trained to locate a randomly placed goal in an enclosed environment. We then investigate, using concept-based explanation methods, whether the agent learns abstract representations of the environment (i.e., a cognitive map). This project is only available in collaboration with a neuroscience team.
Relevant reading: Master’s theses by Gro Elisabeth Sørum Oleivsgard and Henrik Haug Larsen
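A common concept-based check, sketched here on fabricated data (the probe setup and all numbers are assumptions, not the method used in the theses above), is to fit a linear probe from a layer's activations to the agent's position; good held-out decoding is weak evidence for a map-like spatial representation.

```python
# Hedged sketch of a linear concept probe: decode the agent's (x, y) position
# from a layer's activations. The activations are fabricated here; in the
# project they would be recorded from the trained agent's hidden layers.
import numpy as np

rng = np.random.default_rng(0)
n_samples, hidden_dim = 2000, 128

positions = rng.uniform(0, 10, size=(n_samples, 2))       # ground-truth concept
mixing = rng.normal(size=(2, hidden_dim))
activations = positions @ mixing + 0.5 * rng.normal(size=(n_samples, hidden_dim))

# Fit an ordinary least-squares probe on a train split, evaluate on the rest.
train, test = slice(0, 1500), slice(1500, None)
X = np.hstack([activations, np.ones((n_samples, 1))])      # add a bias column
W, *_ = np.linalg.lstsq(X[train], positions[train], rcond=None)

pred = X[test] @ W
resid = ((positions[test] - pred) ** 2).sum()
total = ((positions[test] - positions[test].mean(axis=0)) ** 2).sum()
print(f"held-out R^2 of position decoding: {1 - resid / total:.3f}")
# Contrasting probes across layers, and against unrelated control concepts,
# is where an argument about place-cell-like structure would actually be made.
```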
Previous XAI Master's theses
2025
- Gro Elisabeth Sørum Oleivsgard; Henrik Haug Larsen: Where am I? Using Explainable AI to Locate Place Cells in Deep Reinforcement Learning
- Jakob Kessler; Ferdinand Eide: Going home: Investigating Path Integrating Capabilities of Transformers with XAI
- Eirik Reiestad: Better Together? Explaining Emergent Behavior in Multi-Agent Reinforcement Learning
- Max Gunhamn: Evaluating XAI methods' contributions when predicting DQN intentions
2024
- Jacob Lysnæs-Larsen: Explainable AI through Concept Detection with Application to Misalignment Detection and Mitigation
- Erik Storås Sommer: From Static to Dynamic Concept Detection in Sequential Decision-Making: Improving Reward Functions Using Explainable AI
- Marte Eggen: Explainable AI Approaches for Large Generative Transformer-Based Language Models
- Eivind Berger-Nilsen: Compress-and-Conquer Adaptations for Scalable Explainable AI
- Henriette Viola Christine Ameln; Lea Haug Sandberg: Explaining a deep learning model for cerebral palsy prediction
2023
- Patrik Hammersborg: Explainable AI approaches for deep reinforcement learning agents in a high performance chess environment
- Eivind Kohmann: Investigating the Capability of Generative Adversarial Networks of Capturing Implicit Laws in Physical Systems
- William Dalheim: Generative AI through Latent Modeling: The Theoretical Foundations of Diffusion Models