
Can LLMs Discover Causes? When Language Models Meet Observational Causal Inference

Traditional causal discovery requires large datasets and strong statistical assumptions. LLMs bring a new ingredient: domain knowledge encoded in pre-training. Susanti & Färber test whether LLMs can use observational data for causal discovery, while REX integrates explainable AI with causal structure learning.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Causal discovery, the task of inferring cause-and-effect relationships from data, has traditionally been the domain of specialized statistical algorithms: PC, GES, LiNGAM, and their many variants. These methods work directly on observational data, using conditional independence tests or structural equation models to infer causal graphs. They are mathematically principled but make strong assumptions (faithfulness, causal sufficiency, specific distributional forms) and require substantial data to achieve reliable results.
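At the core of constraint-based methods like PC is a conditional independence test. A minimal sketch of one such test, assuming linear-Gaussian relationships (pure Python with illustrative data, not the production implementation of any of these algorithms): partial correlation, computed by regressing out the conditioning variable and correlating the residuals.

```python
import math

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    # Pearson correlation
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    return num / den

def residuals(y, z):
    # residuals of a simple linear regression of y on z
    mz, my = mean(z), mean(y)
    beta = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) /
            sum((zi - mz) ** 2 for zi in z))
    return [yi - (my + beta * (zi - mz)) for zi, yi in zip(z, y)]

def partial_corr(x, y, z):
    # correlation of x and y after linearly removing z from both
    return corr(residuals(x, z), residuals(y, z))

# Toy common cause Z -> X and Z -> Y: X and Y are strongly correlated
# marginally, but nearly independent once Z is conditioned on --
# exactly the pattern a PC-style independence test screens for.
z = [float(i) for i in range(50)]
x = [2.0 * zi + math.sin(i) for i, zi in enumerate(z)]
y = [3.0 * zi + math.cos(i) for i, zi in enumerate(z)]

print(corr(x, y) > 0.9)                  # strong marginal dependence
print(abs(partial_corr(x, y, z)) < 0.3)  # largely vanishes given Z
```

Real implementations add significance thresholds and handle conditioning sets of arbitrary size; this only shows the single-conditioner case.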

Large language models introduce something these statistical methods lack: domain knowledge. An LLM trained on scientific literature has absorbed extensive knowledge about causal relationships in specific domains. It "knows" that smoking causes cancer, that interest rates affect inflation, that gene mutations drive drug resistance. Can this knowledge be combined with observational data to improve causal discovery beyond what either statistical methods or LLM knowledge alone can achieve?

Susanti & Färber investigate this question directly, testing whether LLMs can leverage observational data for causal discovery. REX (Renero et al.) approaches from the complementary direction, using explainable AI techniques to enhance traditional causal discovery, creating a bridge between the interpretability of LLM reasoning and the rigor of statistical causal inference.

LLMs as Causal Reasoners

Susanti & Färber design a systematic evaluation of LLMs' causal discovery capability under three conditions:

Knowledge-only: The LLM is given variable names and asked to infer causal relationships based purely on its pre-trained knowledge. No data is provided. This tests the LLM's domain knowledge.

Data-only: The LLM is given observational data (correlation matrices, summary statistics, or raw data samples) without meaningful variable names. This tests the LLM's ability to perform statistical causal inference from data.

Knowledge + Data: The LLM receives both meaningful variable names and observational data. This tests the synergy between domain knowledge and statistical evidence.
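The three conditions differ only in what the prompt exposes. A hypothetical sketch of how such prompts could be assembled (the wording, variable names, and statistics below are illustrative, not the authors' actual prompts):

```python
def build_prompt(condition, names, stat_lines):
    """Assemble a causal-discovery prompt for one evaluation condition.

    names:      meaningful variable names (the domain-knowledge signal)
    stat_lines: preformatted summary statistics (the data signal); in a
                real setup these would reference the actual variable names
    """
    task = "List the causal edges among the variables as 'A -> B'."
    if condition == "knowledge_only":
        # meaningful names, no data
        return f"Variables: {', '.join(names)}.\n{task}"
    if condition == "data_only":
        # data, but anonymized names carry no domain knowledge
        anon = [f"V{i}" for i in range(len(names))]
        return (f"Variables: {', '.join(anon)}.\n"
                + "\n".join(stat_lines) + f"\n{task}")
    if condition == "knowledge_and_data":
        # both signals at once
        return (f"Variables: {', '.join(names)}.\n"
                + "\n".join(stat_lines) + f"\n{task}")
    raise ValueError(f"unknown condition: {condition}")

names = ["smoking", "tar_deposits", "lung_cancer"]
stats = ["corr(V0, V1) = 0.85", "corr(V1, V2) = 0.74", "corr(V0, V2) = 0.63"]
print(build_prompt("data_only", names, stats))
```

The point of the anonymization in the data-only condition is to make the LLM's pre-trained associations useless, isolating whatever statistical inference ability it has.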

The findings reveal a consistent pattern:

  • LLMs perform surprisingly well in the knowledge-only condition for well-studied domains, accurately identifying causal relationships that appear frequently in scientific literature
  • LLMs perform poorly in the data-only condition; they are not effective statistical causal inference engines
  • The knowledge + data condition shows only modest improvement over knowledge-only, suggesting that LLMs struggle to integrate statistical evidence with domain knowledge

This finding is both encouraging (LLMs encode useful causal knowledge) and sobering (they cannot replace statistical causal methods for data-driven inference). The practical implication: LLMs are useful for generating prior knowledge about causal structures, which can then be integrated with statistical methods as informative priors or structural constraints.
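One concrete way to use LLM output as an informative prior, sketched under strong simplifications (the scores and edge names are hypothetical; a real system would use a proper likelihood such as BIC): add a bonus for candidate edges the LLM endorsed and a penalty for edges it did not, weighted by a prior-strength parameter.

```python
def graph_score(edges, data_fit, llm_edges, lam=1.0):
    """Toy composite score: data fit plus an LLM-derived edge prior.

    edges:     candidate graph as a list of (cause, effect) pairs
    data_fit:  how well the graph explains the data (e.g. a log-likelihood)
    llm_edges: edges the LLM asserted from domain knowledge
    lam:       prior strength -- needs careful calibration in practice
    """
    agree = sum(1 for e in edges if e in llm_edges)
    disagree = len(edges) - agree
    return data_fit + lam * (agree - disagree)

# Two Markov-equivalent orientations often receive identical data fit;
# the LLM prior can break the tie toward the plausible direction.
llm_edges = {("smoking", "cancer")}
forward = graph_score([("smoking", "cancer")], data_fit=-10.0, llm_edges=llm_edges)
backward = graph_score([("cancer", "smoking")], data_fit=-10.0, llm_edges=llm_edges)
print(forward > backward)  # True
```

The calibration question raised later in this post is exactly the choice of `lam`: too large and a hallucinated edge overrides the data, too small and the prior is inert.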

REX: Explainability Meets Causality

Renero et al.'s REX takes a different approach. Rather than asking LLMs to perform causal discovery, REX uses explainable AI (XAI) techniques (SHAP values, feature importance, partial dependence) to extract causal information from trained machine learning models.

The insight: a well-trained predictive model implicitly captures causal information in its learned structure. If a model accurately predicts Y from X₁, X₂, ..., Xₚ, the model's feature importances and interaction patterns reflect (approximately) the causal influences of each Xᵢ on Y. XAI techniques make these implicit causal patterns explicit and extractable as a causal graph.
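A minimal illustration of the underlying idea, using permutation importance on a toy "fitted" model (all names and numbers are illustrative; REX itself builds on SHAP and related techniques): shuffling a feature the model relies on degrades its predictions, while shuffling an irrelevant feature does not.

```python
import random

def model(x1, x2, x3):
    # stand-in for a trained predictor; x3 plays no role
    return 2.0 * x1 - 1.0 * x2

def mse(pred, true):
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

random.seed(0)
rows = [(random.random(), random.random(), random.random()) for _ in range(200)]
y = [model(*r) for r in rows]

def permutation_importance(idx):
    # shuffle one feature column and measure the increase in error
    col = [r[idx] for r in rows]
    random.shuffle(col)
    perturbed = [tuple(c if j == idx else v for j, v in enumerate(r))
                 for c, r in zip(col, rows)]
    return mse([model(*r) for r in perturbed], y)

imps = [permutation_importance(i) for i in range(3)]
# importance tracks the coefficients; the unused feature scores zero
print(imps[0] > imps[1] > imps[2], imps[2] == 0.0)
```

Note the caveat the post returns to below: a feature can be predictively important without being causally influential (e.g. a proxy for the true cause), so importance scores are candidate causes, not conclusions.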

REX integrates multiple XAI methods to produce robust causal estimates:

  • SHAP values identify which features influence predictions (candidate causes)
  • Partial dependence plots reveal the direction of influence (positive/negative)
  • Feature interaction effects identify causal mediation and moderation

The combination of multiple XAI signals, aggregated through a consensus mechanism, produces causal graphs that are more robust than any single XAI method alone.
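The consensus step can be sketched as a simple vote over edge candidates (a simplified stand-in for REX's actual aggregation; the edge names are illustrative):

```python
from collections import Counter

def consensus_edges(signal_edge_sets, threshold=2):
    # keep an edge only if enough XAI signals independently propose it
    votes = Counter(edge for edges in signal_edge_sets for edge in edges)
    return {edge for edge, n in votes.items() if n >= threshold}

shap_edges        = {("X1", "Y"), ("X2", "Y")}  # from SHAP values
pdp_edges         = {("X1", "Y"), ("X3", "Y")}  # from partial dependence
interaction_edges = {("X1", "Y"), ("X2", "Y")}  # from interaction effects

print(sorted(consensus_edges([shap_edges, pdp_edges, interaction_edges])))
# [('X1', 'Y'), ('X2', 'Y')] -- the one-vote ('X3', 'Y') edge is dropped
```

Requiring agreement across methods is what buys robustness: an artifact of any single XAI technique rarely survives the vote.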

Claims and Evidence

| Claim | Evidence | Verdict |
|---|---|---|
| LLMs encode useful causal domain knowledge | Susanti & Färber: good performance in the knowledge-only condition | ✅ Supported |
| LLMs can perform statistical causal inference from data | Poor performance in the data-only condition | ❌ Not supported |
| XAI techniques extract causal information from ML models | REX demonstrates this on standard causal benchmarks | ✅ Supported |
| LLM + data integration improves over LLM knowledge alone | Modest improvement documented | ⚠️ Limited improvement |
| These methods match dedicated causal discovery algorithms | Performance gaps remain on challenging benchmarks | ⚠️ Complementary, not a replacement |

Open Questions

  • Domain specificity: LLMs' causal knowledge reflects their training data. For novel or under-studied causal relationships (new drug interactions, emerging economic mechanisms), LLM knowledge may be absent or incorrect. How do we identify the boundaries of LLM causal knowledge?
  • Hallucinated causation: LLMs may assert causal relationships that are plausible but incorrect, confusing correlation patterns in their training data with genuine causation. How do we distinguish genuine causal knowledge from hallucinated causation?
  • Integration framework: What is the optimal way to combine LLM causal priors with statistical causal methods? Bayesian frameworks that use LLM outputs as informative priors are promising but require careful calibration of prior strength.
  • Causal XAI validity: Under what conditions do XAI-derived causal estimates match true causal effects? The relationship between predictive feature importance and causal influence is complex and not always positive.
What This Means for Your Research

For causal inference researchers, LLMs and XAI provide complementary information sources for causal discovery. LLMs contribute domain knowledge; XAI contributes data-driven pattern extraction. Neither replaces dedicated causal methods, but both can augment them, particularly in the common scenario where domain knowledge is available but incomplete.

For ML practitioners who use predictive models, REX demonstrates that your trained models contain causal information that can be extracted. This is valuable even when the primary goal was prediction: understanding why the model makes its predictions is both scientifically informative and practically useful for model debugging.

References (2)

[1] Susanti, Y., & Färber, M. (2025). Can LLMs Leverage Observational Data? Towards Data-Driven Causal Discovery with LLMs. arXiv:2504.10936.
[2] Renero, J., Ochoa, I., & Maestre, R. (2025). REX: Causal Discovery based on Machine Learning and Explainability techniques. Pattern Recognition.
