Deep Dive: Concept Engineering

Beyond Correlation: How Causal Concept Graphs Are Teaching AI to Explain What Would Have Happened Otherwise

Standard AI explanations show what correlated with the output. Causal concept graph models show what caused it — enabling counterfactual reasoning, interventional analysis, and the kind of meaningful oversight regulators are demanding.

By OrdoResearch
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Most explanations for AI decisions tell you what happened but not what would have happened otherwise. A saliency map highlights which pixels mattered. A concept attribution scores which features contributed. Neither answers the question a decision-maker actually needs answered: if this concept had been different, would the outcome have changed? That question — the interventionist question — requires causal reasoning, and the emerging field of causal concept graph models is building the machinery to answer it.

The Problem of Causal Opacity

Deep neural networks suffer from what Dominici, Barbiero, Espinosa Zarlenga et al. (2024) call causal opacity: even when we can identify which concepts a model uses, we cannot determine the causal structure connecting those concepts to the output. Standard concept-based explanations treat concepts as independent contributors, ignoring that concepts often cause each other. A patient's smoking history affects their cholesterol level, which affects cardiac risk — these are not independent features but links in a causal chain. An explanation that scores each concept in isolation misrepresents the actual decision structure.

Their proposed framework, Causal Concept Graph Models, addresses this by embedding an explicit causal graph into the model architecture. Concepts are nodes. Directed edges represent causal relationships between them. The model learns both the concept values and the causal graph structure, producing predictions that flow through a transparent causal pathway. The result is a system where one can ask genuine counterfactual questions: what would the model predict if this patient had not smoked but all other factors remained as they are?
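The counterfactual query above can be made concrete with a toy structural model. This is an illustrative sketch only, with made-up coefficients and a hand-built smoking → cholesterol → risk chain; the actual Causal CGM learns both the concept values and the graph structure from data.

```python
# Toy causal chain: smoking -> cholesterol -> risk.
# Hypothetical structural equations, not the paper's learned model.

def cholesterol(smoking):
    # Structural equation: smoking raises cholesterol (toy coefficients).
    return 4.5 + 1.2 * smoking

def risk(chol):
    # Structural equation: cardiac risk grows with cholesterol, clamped to [0, 1].
    return min(1.0, max(0.0, 0.1 * (chol - 4.0)))

def predict(smoking, do=None):
    """Evaluate the chain, optionally overriding a concept via a do() intervention."""
    do = do or {}
    # An intervened concept is pinned to its do() value; otherwise it is
    # computed from its causal parents as usual.
    chol = do.get("cholesterol", cholesterol(do.get("smoking", smoking)))
    return risk(chol)

# Factual prediction for a smoker vs. the counterfactual non-smoker:
factual = predict(smoking=1.0)                       # smoking flows through cholesterol
counterfactual = predict(smoking=1.0, do={"smoking": 0.0})
print(factual, counterfactual)
```

Because the intervention is applied at a node and then propagated downstream, the question "what if this patient had not smoked?" is answered by the same machinery that produced the original prediction, rather than by a separate post-hoc approximation.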

The practical implications are significant. The authors demonstrate that Causal CGMs achieve performance comparable to standard neural networks while enabling three capabilities that conventional models cannot provide: human correction of intermediate reasoning steps, formal interventional analysis, and verification of fairness properties through causal reasoning about protected attributes.

Distilling Causal Structure from Black Boxes

A different approach emerges from Moreira, Bono, Cardoso et al. (2024) at Feedzai, presented at the Conference on Causal Learning and Reasoning. Their method, DiConStruct, does not require building a causal model from scratch. Instead, it works as a distillation model — a transparent surrogate that approximates the predictions of any existing black-box classifier while simultaneously learning the causal relationships between concepts.

The architecture produces explanations in two complementary forms. A structural causal model captures how concepts relate to each other and to the output — for instance, that smoking causally influences cholesterol, which causally influences the prediction. Concept attributions quantify each concept's contribution, but now within the context of the learned causal structure rather than in isolation. The key design choice is that the explainer does not modify or constrain the original black-box model. The prediction system remains unchanged; DiConStruct runs alongside it, approximating its behavior with a causally structured explanation.

On both image and tabular datasets, DiConStruct approximates black-box models with higher fidelity than other concept explainability baselines. More importantly, its explanations are structurally richer: rather than a flat list of concept importances, each explanation is a graph showing how concepts interact on the path to the prediction.
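The distillation idea can be sketched in a few lines. This is not the DiConStruct architecture (which also learns a structural causal model over concepts); it only shows the surrogate setup and the fidelity metric: a transparent model fit to mimic a frozen black box, scored by agreement with the black box rather than with ground-truth labels.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                  # toy concept activations

def black_box(X):
    # Frozen opaque classifier we are NOT allowed to modify.
    return (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.5).astype(float)

y_bb = black_box(X)                            # distillation targets

# Fit a transparent logistic surrogate to the black box's outputs
# by plain gradient descent.
w, b = np.zeros(3), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y_bb)) / len(X)
    b -= 0.5 * np.mean(p - y_bb)

surrogate = (1 / (1 + np.exp(-(X @ w + b))) > 0.5).astype(float)
fidelity = np.mean(surrogate == y_bb)          # agreement with the black box
print(f"fidelity: {fidelity:.2f}")
```

The fidelity score is the quantity DiConStruct optimizes against other concept-explainability baselines: how faithfully the transparent surrogate reproduces the black box's behavior, leaving the deployed prediction system untouched.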

From Sufficiency to Intervention

Bjøru, Lysnæs-Larsen, Jørgensen et al. (2025), publishing in Frontiers in Artificial Intelligence, develop the theoretical backbone that both approaches above require. Their framework formalizes causal concept-based explanations through the probability of sufficiency — a precise causal quantity that measures whether a concept intervention is sufficient to change the model's output.

The distinction from standard feature importance is fundamental. Feature importance answers: how much did this concept contribute? Probability of sufficiency answers: if I set this concept to a different value, would the prediction change? The second question is inherently causal. It requires reasoning about interventions, not merely observing correlations between concept values and outputs.
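The interventional question can be estimated by simple Monte Carlo sampling. The sketch below is a hypothetical estimator over a toy two-concept model, assuming we can query the model under do()-style concept overrides; it is not the paper's formal construction.

```python
import random

random.seed(0)

def model(concepts):
    # Toy classifier: predicts high risk if either concept is present.
    return int(concepts["smoking"] or concepts["high_bp"])

def prob_sufficiency(model, concept, value, n=10_000):
    """Estimate: among sampled cases where the output is 1 and the concept
    differs from `value`, how often does setting concept=value flip the
    output to 0?  (A Monte Carlo stand-in for probability of sufficiency.)"""
    flips = trials = 0
    for _ in range(n):
        ctx = {"smoking": random.random() < 0.3,
               "high_bp": random.random() < 0.2}
        if model(ctx) == 1 and ctx[concept] != value:
            trials += 1
            intervened = {**ctx, concept: value}   # do(concept := value)
            flips += int(model(intervened) == 0)
    return flips / trials

ps = prob_sufficiency(model, "smoking", False)
print(f"P(sufficiency): {ps:.2f}")
```

Note that the estimate is below 1 even though smoking always contributes when present: removing smoking only flips the prediction when high blood pressure is also absent. That gap between "contributed" and "sufficient to change the outcome" is exactly what feature-importance scores cannot express.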

Their framework generates both local explanations (why this particular prediction was made) and global explanations (what concepts generally drive this model's decisions) through the same causal mechanism. Crucially, the authors stress that the validity of these explanations depends on alignment between the context of generation and the context of interpretation — a causal explanation is only faithful if the assumed causal structure matches the model's actual decision process.

Why This Matters Beyond Explanation

The convergence of these three lines of work — architectural causal models, distillation-based causal explainers, and formal causal explanation theory — points toward a substantive shift in what we mean by AI interpretability. The current dominant paradigm treats explanation as a post-hoc annotation: train the model, then explain it. Causal concept approaches treat explanation as a structural property: the model's architecture encodes the causal reasoning that produces the prediction.

This distinction has regulatory implications. The EU AI Act requires that high-risk AI systems provide explanations that enable "meaningful human oversight." A saliency map showing which pixels were important does not obviously enable oversight — it does not tell a human what to check or what to change. A causal concept graph, by contrast, shows the decision pathway in terms the overseer can evaluate, intervene on, and correct. It answers not only "what did the model see?" but "what would happen if this factor were different?"

The open challenges remain considerable. Causal graphs require assumptions about the correct causal structure, and these assumptions may be wrong. The concept vocabulary must be defined in advance. And scaling causal reasoning to models with hundreds of concepts and complex interactions is computationally demanding. But the direction is clear: the field is moving from asking what a model's decision correlates with to asking what causally produced it — and that shift changes the kind of trust we can place in AI systems.


References

  • Dominici, G., Barbiero, P., Espinosa Zarlenga, M., Termine, A., Gjoreski, M., Marra, G., & Langheinrich, M. (2024). Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning. arXiv:2405.16507.
  • Moreira, R., Bono, J., Cardoso, M., Saleiro, P., Figueiredo, M., & Bizarro, P. (2024). DiConStruct: Causal Concept-based Explanations through Black-Box Distillation. CLeaR 2024. arXiv:2401.08534.
  • Bjøru, A. R., Lysnæs-Larsen, J., Jørgensen, O., Strümke, I., & Langseth, H. (2025). A Framework for Causal Concept-based Model Explanations. Frontiers in Artificial Intelligence. DOI:10.3389/frai.2025.1759000.
