Center for Neurophilosophy and Ethics of Neurosciences

Counterfactual Explanations in XAI Inspired by Human Information Processing

PhD Project of Timo Freiesleben

In addition to being good decision-makers in complex situations, humans can usually also explain why they preferred one action over another. Humans can even provide helpful explanations when they settled a decision unconsciously and are not aware of their actual decision-making process. Recently, artificial agents have become much better at making good decisions in complex situations. For this reason, they are increasingly involved in socially relevant and ethically and legally entangled decisions such as deciding on loan applications, evaluating delinquency, or controlling a car. The problem is that algorithms usually cannot give an easy explanation of why they did something, especially since most current AI algorithms are highly intransparent and their decision-making processes are hard to interpret. The core idea of the project is to transfer the concept of explanation we have in human-to-human interaction to human-to-machine interaction. This transfer should incorporate the philosophical specification of the concept itself, psychological findings on which explanations aid understanding, and neuroscientific findings on how explanations are processed.

Psychologists have found that the main type of explanation humans give in their daily life is the contrastive explanation. These are explanations of the form "I picked option 1 over option 2 because ...". Contrastive explanations can usually be phrased as counterfactuals, that is, in the form "I would have taken option 2 over option 1 if option 2 had been ...". Interestingly, just as humans provide explanations post hoc after an unconscious decision-making process, researchers are trying to implement this feature in machines. This very recent line of research develops so-called model-agnostic explanation algorithms and forms a subfield of explainable artificial intelligence (XAI). These algorithms are of particular interest because they work independently of the algorithm that implements the decision-making process. Hence, they work for all current data science techniques such as (deep) convolutional neural networks (CNNs), recurrent neural networks (RNNs), and random forests. Counterfactual explanations are exactly the type of explanation this approach can generate.
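The model-agnostic idea can be sketched in a few lines. In the toy example below, the loan model, its feature names, and the brute-force random search are illustrative assumptions, not the project's actual method; practical implementations use more refined optimizers, but the key point carries over: the search only queries the model's predictions and never looks inside it.

```python
import random

# Toy stand-in for a black-box loan model. The weights and threshold are
# purely illustrative; the search below treats the model as opaque.
def model(x):
    income, debt = x
    return 1 if 0.6 * income - 0.4 * debt > 30 else 0  # 1 = approved, 0 = rejected

def counterfactual(predict, x, target, radius=50.0, n_samples=20000, seed=0):
    """Model-agnostic counterfactual search: sample random perturbations of x
    and keep the closest one (L1 distance) that the black box maps to `target`.
    Only predictions are queried, so this works for any classifier."""
    rng = random.Random(seed)
    best, best_dist = None, float("inf")
    for _ in range(n_samples):
        candidate = [xi + rng.uniform(-radius, radius) for xi in x]
        if predict(candidate) == target:
            dist = sum(abs(c - xi) for c, xi in zip(candidate, x))
            if dist < best_dist:
                best, best_dist = candidate, dist
    return best, best_dist

x = [40.0, 50.0]                       # a rejected applicant
cf, dist = counterfactual(model, x, target=1)
# cf answers the contrastive question: "what minimal change to this
# application would have flipped the decision to 'approved'?"
```

The returned counterfactual can then be verbalized as an explanation of the form discussed above, e.g. "your application would have been approved if your income had been higher by roughly this amount".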

To produce reasonable and reliable counterfactuals, and therefore helpful explanations, fundamental philosophical, psychological, neuroscientific, and computational questions have to be addressed. What is the relation between a counterfactual explanation and a causal explanation? What are the limits of counterfactual explanations? How should we handle the Rashomon effect: should we give one explanation or several? How can the formal algorithms currently used to generate counterfactuals be specified so that they also incorporate knowledge about what kinds of explanations humans find helpful? Can the human way of generating explanations post hoc in the brain inspire a computer-algorithmic approach? The project aims to settle these questions and thereby to make algorithmic decisions more transparent to human agents.
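For concreteness, the formal algorithm referred to above is, in many current implementations, an optimization problem; one widely used formalization (due to Wachter, Mittelstadt, and Russell, 2017) trades off reaching the desired outcome against staying close to the original input:

```latex
% x: original input, x': counterfactual candidate, f: black-box predictor,
% y': desired outcome, d: distance measure, \lambda: trade-off weight
\operatorname*{arg\,min}_{x'} \; \max_{\lambda} \;
  \lambda \, \bigl( f(x') - y' \bigr)^{2} + d(x, x')
```

One natural entry point for the project's questions is the distance term $d$: replacing or augmenting it with psychologically informed notions of proximity or plausibility is one way to build knowledge about helpful explanations into the formalism.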