Master's Theses

Previous XAI projects 


XAI for misalignment detection 

This project involves training a reinforcement learning agent that exhibits misaligned behaviour (often referred to as goal misalignment or reward hacking) and investigating whether XAI methods can detect this inner misalignment without explicit testing, and if so, which ones. The most time-consuming part of the project is usually finding a suitable environment and training an RL agent that is properly misaligned.
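To make the idea of reward hacking concrete, here is a minimal sketch of an environment whose proxy reward can be exploited. Everything in it is a hypothetical illustration and not taken from any of the theses: the corridor layout, the reward values, and the names `rollout`, `intended_policy`, and `hacking_policy` are assumptions. The two policies are hand-written rather than learned, so the sketch only shows the gap between proxy return and intended outcome that a trained agent might exploit.

```python
from typing import Callable, Tuple

# Hypothetical toy environment: a corridor with positions 0..4.
CORRIDOR_LENGTH = 5   # the intended goal is the last position (4)
BONUS_TILE = 1        # stepping onto this tile pays a proxy reward every time
MAX_STEPS = 20        # episode length limit


def rollout(policy: Callable[[int, int], int]) -> Tuple[float, bool]:
    """Run one episode and return (proxy_return, reached_goal)."""
    position, proxy_return = 0, 0.0
    for step in range(MAX_STEPS):
        move = policy(position, step)  # -1 = left, +1 = right
        position = max(0, min(CORRIDOR_LENGTH - 1, position + move))
        if position == BONUS_TILE:
            proxy_return += 1.0        # design flaw: the bonus can be re-collected
        if position == CORRIDOR_LENGTH - 1:
            proxy_return += 3.0        # one-off reward for the intended goal
            return proxy_return, True
    return proxy_return, False


def intended_policy(position: int, step: int) -> int:
    """Always walk towards the goal."""
    return 1


def hacking_policy(position: int, step: int) -> int:
    """Oscillate between tiles 0 and 1 to farm the repeatable bonus."""
    return 1 if position == 0 else -1


aligned = rollout(intended_policy)
hacked = rollout(hacking_policy)
print("intended policy:", aligned)
print("hacking policy: ", hacked)
```

In this sketch the hacking policy collects a higher proxy return than the intended policy without ever reaching the goal, which is the kind of behaviour the project would try to surface from explanations rather than from rollouts like these.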

Relevant reading: Master’s theses by Jacob Lysnæs-Larsen, Max Gunhamn

XAI for explaining sequential decision-making tasks

This project involves training a reinforcement learning agent in an environment that involves sequential decision-making. The agent can be trained either to make individual predictions (as in chess, Go, or Atari games) or to make decisions on sequential data (such as a traversed path). The goal is to investigate whether the agent shows emergent planning behaviour by examining the explanations of its predictions in sequence.

Relevant reading: Master’s theses by Jakob Kessler and Ferdinand Eide, Erik Sommer

XAI for backdoor detection

Several techniques are available for planting backdoors in neural network models. This project examines whether such backdoors can be detected using direct inspection or XAI methods. Note that the project does not consider data-poisoning-based techniques, only techniques that build backdoors into the model architecture.

Relevant reading: Survey on Backdoor Attacks on Deep Learning

XAI for detecting structure in sequential decision makers

This project aims to identify whether structures in artificial neural networks resemble structures known to exist in biological neural networks. A reinforcement learning agent is trained to locate a randomly placed goal in an enclosed environment. We then use concept-based explanation methods to investigate whether the agent learns abstract representations of the environment (i.e., a cognitive map). This project is only available in collaboration with a neuroscience team.

Relevant reading: Master’s theses by Gro Elisabeth Sørum Oleivsgard and Henrik Haug Larsen

 

Previous XAI Master's theses


2025 

2024

2023