Education

Knowledge Graphs Meet Causal Inference in Education: Beyond Correlation-Driven Learning

Most AI education systems recommend what to learn next based on correlation. A new wave of research integrates knowledge graphs with causal inference to answer the harder question: why does this learning pathway work? The shift from prediction to explanation may transform how we design curricula.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Here is a question that should trouble every researcher in educational technology: when an adaptive learning platform recommends that a student study "linear equations" before "quadratic functions," does it know why this sequence works, or has it merely observed that students who followed this order scored higher on the final exam?

The distinction is not academic. A correlation-based system that recommends "linear equations first" because this sequence correlates with higher scores will fail catastrophically when deployed in a context where the correlation breaks: a different curriculum structure, a different student population, a different assessment instrument. A causal system that understands why linear equations causally enable quadratic reasoning will generalize, because the causal mechanism transfers even when the surface statistics do not.

This is the promise of integrating knowledge graphs with causal inference in education: moving from systems that predict learning outcomes to systems that explain learning mechanisms. And the early results, while preliminary, suggest that this shift may be as consequential for educational AI as the transition from behaviorist to cognitive models was for instructional design.

The Landscape: Two Technologies, One Convergence

Knowledge graphs in education represent curricula as structured networks of concepts (nodes) and prerequisite relationships (edges). Unlike flat taxonomies or linear syllabi, they capture the rich, non-linear dependency structure of knowledge: understanding "probability" requires not just "statistics" but also "set theory," "combinatorics," and (often overlooked) "proportional reasoning." Educational knowledge graphs have been built for mathematics (KnowEdu), computer science (CS Knowledge Graph), and medical education (MedEdKG), typically through expert annotation or automated extraction from textbooks and learning management system logs.
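To make the structure concrete, a prerequisite graph can be sketched as a simple adjacency mapping from each concept to its prerequisites; a topological sort then yields a study order that respects every edge. The dependencies below echo the "probability" example above but are otherwise illustrative, not drawn from KnowEdu or any of the cited systems.

```python
from graphlib import TopologicalSorter

# Prerequisite edges: concept -> set of concepts it depends on.
# Illustrative only, following the "probability" example in the text.
prerequisites = {
    "probability": {"statistics", "set theory",
                    "combinatorics", "proportional reasoning"},
    "statistics": {"proportional reasoning"},
    "combinatorics": {"set theory"},
    "set theory": set(),
    "proportional reasoning": set(),
}

# TopologicalSorter reads the mapping as "node -> predecessors", so
# static_order() emits prerequisites before the concepts that need them.
study_order = list(TopologicalSorter(prerequisites).static_order())
print(study_order)  # "probability" is guaranteed to come last
```

Note the non-linearity the text describes: "set theory" must precede both "combinatorics" and "probability", a dependency a flat syllabus cannot express.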

Causal inference is the statistical machinery for distinguishing correlation from causation in observational data. In education, the gold standard for causal claims is the randomized controlled trial (RCT): randomly assign students to different learning pathways and compare outcomes. But RCTs in education are expensive, ethically constrained (you cannot knowingly assign students to an inferior pathway), and often infeasible at the granularity needed (you cannot randomize the order of every concept in a curriculum). Causal inference methods such as instrumental variables, regression discontinuity, double machine learning, and do-calculus offer a path to causal claims from observational learning analytics data.
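A toy simulation shows why the observational shortcut fails and what adjustment buys you. Here a hidden "prior ability" variable drives both pathway choice and outcomes, so the naive difference in means overstates the pathway's benefit; stratifying on the confounder (the simplest form of backdoor adjustment) recovers it. All numbers are invented for illustration.

```python
import random

random.seed(0)
TRUE_EFFECT = 5.0   # real benefit of the recommended pathway, in points
CONFOUND = 10.0     # extra points high-ability students score anyway

students = []
for _ in range(20000):
    ability = random.random() < 0.5                      # hidden confounder
    # High-ability students are far more likely to pick the pathway.
    pathway = random.random() < (0.8 if ability else 0.2)
    score = TRUE_EFFECT * pathway + CONFOUND * ability + random.gauss(0, 1)
    students.append((ability, pathway, score))

def mean(xs):
    return sum(xs) / len(xs)

# Naive (correlational) estimate: raw difference in means across pathways.
naive = (mean([s for a, p, s in students if p])
         - mean([s for a, p, s in students if not p]))

# Backdoor adjustment: estimate the effect within each ability stratum,
# then average the strata weighted by their population share.
adjusted = 0.0
for level in (False, True):
    stratum = [(p, s) for a, p, s in students if a == level]
    effect = (mean([s for p, s in stratum if p])
              - mean([s for p, s in stratum if not p]))
    adjusted += effect * len(stratum) / len(students)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, truth: {TRUE_EFFECT}")
```

The naive estimate roughly doubles the true effect because high-ability students are over-represented on the recommended pathway; the stratified estimate lands near the truth.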

The convergence of these two technologies is recent and powerful. Sun (2025) explores this integration as a novel approach, demonstrating that knowledge graphs supply the structural assumptions (which concepts could causally affect which others) that causal inference algorithms require, while causal inference supplies the counterfactual reasoning (what would happen if a student skipped this concept?) that knowledge graphs cannot provide on their own.

Methods: How Causal Educational Knowledge Graphs Work

The technical pipeline, as described by Sun (2025) and Jin and Cui (2024), involves four stages:

Stage 1: Graph Construction. A knowledge graph is built from curriculum documents, expert input, and learning management system data. Nodes represent knowledge components (KCs): discrete, assessable units of understanding. Edges represent prerequisite relationships, co-requisite relationships, and analogical mappings.

Stage 2: Behavioral Enrichment. Student interaction data (time-on-task, error patterns, help-seeking behavior, assessment performance) is overlaid on the knowledge graph. This creates what Jin and Cui call a "User Behavioural Knowledge Graph" (UBKG): the curriculum structure annotated with empirical evidence of how real students traverse it.

Stage 3: Causal Discovery. Causal inference algorithms (typically PC algorithm, GES, or NOTEARS for structure learning, and do-calculus or double machine learning for effect estimation) are applied to the UBKG to identify which prerequisite relationships are genuinely causal and which are merely correlational artifacts. For example, the observation that "students who study set theory before probability score higher" could reflect a causal prerequisite relationship or a selection effect (stronger students choose to study set theory first). Causal discovery disentangles these.
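The disentangling in Stage 3 rests on conditional independence tests of the kind the PC algorithm runs. A minimal sketch of the set-theory example, under assumed synthetic data: when the "set theory first" choice is driven purely by ability (a selection effect, no causal edge), the marginal correlation with probability scores is strong, but it vanishes once we condition on ability, here approximated by restricting to a narrow ability band.

```python
import random

random.seed(1)

def corr(xs, ys):
    """Pearson correlation, computed directly from the definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Pure selection effect: ability drives both the choice to study set
# theory first AND the probability score; there is no direct edge.
data = []
for _ in range(20000):
    ability = random.gauss(0, 1)
    set_theory_first = 1 if random.gauss(ability, 1) > 0 else 0
    prob_score = 2 * ability + random.gauss(0, 1)   # no set_theory term
    data.append((ability, set_theory_first, prob_score))

marginal = corr([s for _, s, _ in data], [y for _, _, y in data])

# CI test in the spirit of the PC algorithm: hold the confounder fixed
# by restricting to a narrow ability band, then re-test the association.
band = [(s, y) for a, s, y in data if abs(a) < 0.1]
conditional = corr([s for s, _ in band], [y for _, y in band])

print(f"marginal r = {marginal:.2f}, conditional r = {conditional:.2f}")
```

A real implementation would use partial correlation or kernel-based CI tests across all strata; the band trick just makes the logic visible.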

Stage 4: Counterfactual Pathway Optimization. Given the causal graph, the system can answer counterfactual questions: "If this student had studied concept A before concept B, would their outcome on concept C have been different?" These counterfactual estimates enable the optimization of learning pathways that are not merely correlated with good outcomes but causally productive of them.
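Stage 4's counterfactual query follows the standard abduction-action-prediction recipe on a structural causal model. The two-equation linear SCM below, with invented coefficients, answers exactly the question in the text: given what we observed about this student, what would their outcome on C have been had they studied A first?

```python
# Toy linear SCM (coefficients are illustrative, not from the cited papers):
#   mastery_B = 2.0 * studied_A_first + u_B
#   outcome_C = 3.0 * mastery_B + u_C
def model(studied_a_first, u_b, u_c):
    mastery_b = 2.0 * studied_a_first + u_b
    outcome_c = 3.0 * mastery_b + u_c
    return mastery_b, outcome_c

# Observed student: skipped A (0), mastery_B = 1.0, outcome_C = 4.0.
obs_a, obs_b, obs_c = 0, 1.0, 4.0

# 1. Abduction: recover this student's exogenous noise from the observation.
u_b = obs_b - 2.0 * obs_a      # -> 1.0
u_c = obs_c - 3.0 * obs_b      # -> 1.0

# 2. Action: intervene, do(studied_A_first = 1).
# 3. Prediction: replay the model with the student's own noise terms.
cf_b, cf_c = model(1, u_b, u_c)
print(f"counterfactual outcome on C: {cf_c}")  # 3.0 * (2.0 + 1.0) + 1.0 = 10.0
```

The key point is step 1: the counterfactual is personalized, because the recovered noise terms carry everything idiosyncratic about this particular student.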

The Robustness Problem

Huang and Vidal (2026) address a critical limitation of this pipeline: knowledge graphs in the real world are incomplete, noisy, and subject to the Open World Assumption (the absence of an edge does not mean the absence of a relationship). Their "Joint Graph Learning" framework simultaneously infers missing graph structure and estimates causal effects, handling both missing data and interference effects (where one student's learning trajectory influences another's through peer interaction).

The technical contribution is substantial (they demonstrate robustness to significant proportions of missing edges in the knowledge graph), but the educational implication is even more important. It means that causal educational knowledge graphs do not require perfect curriculum models to be useful. They can work with the imperfect, partially-specified curriculum descriptions that characterize most real educational settings, especially in under-resourced contexts where detailed curriculum maps do not exist.

Claims and Evidence

| Claim | Evidence | Verdict |
| --- | --- | --- |
| Knowledge graph + causal inference improves learning pathway recommendations over correlation-based methods | Sun (2025): improved prediction over correlation-based approaches; single study | ⚠️ Uncertain (promising but unreplicated) |
| Causal discovery can identify genuine prerequisite relationships from observational data | Jin & Cui (2024): SAIERec system outperforms existing recommendation models on multiple datasets using counterfactual causal inference | ✅ Supported |
| Counterfactual pathway optimization generalizes across student populations | No cross-population validation studies published | ⚠️ Uncertain |
| Incomplete knowledge graphs can support robust causal inference | Huang & Vidal (2026): robust performance in incomplete and relationally complex KGs | ✅ Supported |
| Causal educational AI reduces fairness concerns compared to correlation-based systems | Theoretical argument (causal mechanisms are more universal than correlations); no empirical test | ⚠️ Uncertain |

The Fairness Connection

The intersection of causal inference and educational fairness deserves particular attention. As Chinta, Wang, and Yin (2024) document extensively, correlation-based educational AI systems encode and amplify historical biases: if students from disadvantaged backgrounds have historically been tracked into lower-level courses, a correlation-based system will learn to recommend lower-level content to similar students, perpetuating the cycle.

Causal inference offers a potential escape from this trap. A causal model can distinguish between the effect of the learning pathway on the outcome and the effect of background characteristics on both the pathway chosen and the outcome. By intervening on the pathway (the do-operator) while controlling for background, causal models can identify the learning sequence that would be optimal for each student regardless of their demographic category, a fundamentally different optimization target than "recommend the pathway that similar students have historically taken."
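The difference between the two optimization targets can be written out as arithmetic. With backdoor adjustment, the interventional quantity is P(success | do(pathway)) = Σ_b P(success | pathway, b) P(b): every background group is weighted by its population share, not by who historically took the pathway. The probability tables below are hypothetical, chosen so that historical tracking over-represents the higher-scoring group on the advanced pathway.

```python
# Hypothetical probability tables (invented for illustration).
p_background = {"group_1": 0.5, "group_2": 0.5}
p_adv_given_b = {"group_1": 0.8, "group_2": 0.2}   # historical tracking
p_success = {("adv", "group_1"): 0.90, ("adv", "group_2"): 0.85,
             ("basic", "group_1"): 0.60, ("basic", "group_2"): 0.55}

def p_success_given(pathway):
    """Correlational P(success | pathway): backgrounds weighted by who
    historically took that pathway."""
    took = {b: (w if pathway == "adv" else 1 - w)
            for b, w in p_adv_given_b.items()}
    norm = sum(p_background[b] * took[b] for b in p_background)
    return sum(p_success[(pathway, b)] * p_background[b] * took[b]
               for b in p_background) / norm

def p_success_do(pathway):
    """Interventional P(success | do(pathway)): backdoor adjustment,
    backgrounds weighted by their population share."""
    return sum(p_success[(pathway, b)] * p_background[b]
               for b in p_background)

print(p_success_given("adv"), p_success_do("adv"))
```

Because group_1 both scores higher and was tracked into the advanced pathway, the correlational estimate flatters that pathway; the do-estimate reports what it would deliver for the population as a whole.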

This is theoretically compelling. But the practical barriers are formidable. Causal inference requires assumptions (no unmeasured confounding, positivity, consistency) that are difficult to verify in educational settings. And the very concept of "demographic category" in causal models is contested: race, gender, and socioeconomic status are not interventions that can be randomly assigned, raising deep questions about what "the causal effect of being female on learning outcomes" even means.

Open Questions

  • Can causal educational knowledge graphs scale beyond STEM? Current implementations focus on mathematics and computer science, where prerequisite structures are relatively clear. Humanities, social sciences, and arts have more fluid, less hierarchical knowledge structures. Can causal discovery handle this ambiguity?
  • How do we validate causal claims without RCTs? Sensitivity analysis and natural experiments provide partial answers, but the field needs agreed-upon standards for when observational causal inference is "strong enough" to guide educational practice.
  • What is the role of student agency? Current systems optimize learning pathways for students. Can causal knowledge graphs instead be used to help students understand the causal structure of their own learning, empowering them to make informed choices about their educational trajectories?
  • How do peer effects interact with individual pathways? Education is fundamentally social. A learning pathway that is causally optimal for an isolated individual may be suboptimal in a classroom where peer explanation, collaborative problem-solving, and social motivation shape outcomes. Incorporating interference effects into causal educational models remains an open technical challenge.
  • Can we build self-updating causal knowledge graphs? As students learn, the curriculum evolves, and pedagogical research advances, the knowledge graph must adapt. Can causal discovery algorithms operate in a streaming, online mode, continuously refining their understanding of educational causal structure?
Implications

The integration of knowledge graphs and causal inference represents a paradigm shift in educational AI: from systems that learn patterns in student data to systems that learn mechanisms of student learning. This shift has implications beyond technical performance:

For curriculum designers: causal knowledge graphs provide an empirical tool for evaluating curriculum structure. If causal analysis reveals that a widely-assumed prerequisite relationship is actually correlational, curricula can be restructured to eliminate unnecessary bottlenecks.

For educational researchers: causal inference on knowledge graphs offers a middle path between the rigor of RCTs and the scalability of observational studies. It will not replace experimental research, but it can generate causal hypotheses that prioritize which experiments are worth running.

For students: a particularly promising possibility is making causal structure visible. Imagine a student who can see not just "what to study next" but why: the causal pathways through which mastering this concept will enable future learning. This metacognitive transparency could transform passive consumers of recommended content into active navigators of their own intellectual development.

The field is early. The evidence base is thin. But the conceptual advance, from correlation to causation in educational AI, is exactly the kind of paradigm shift that transforms what is possible.

References

[1] Sun, L. (2025). Integrating Knowledge Graphs and Causal Inference for AI-Driven Personalized Learning in Education. AI Education Science & Engineering, 1(1).
[2] Jin, S. & Cui, L. (2024). A Context-Aware Intelligent Educational Recommender System Incorporating User Behavioural Knowledge Graph and Causal Inference. Proc. IEEE ICCVIT 2024.
[3] Huang, H. & Vidal, M.-E. (2026). Joint Graph Learning for Robust Causal Inference over Knowledge Graphs. Proc. ACM Web Conference 2026.
[4] Chinta, S.V., Wang, Z., Yin, Z., Hoang, N., Gonzalez, M., Le Quy, T., & Zhang, W. (2024). FairAIED: Navigating Fairness, Bias, and Ethics in Educational AI Applications. arXiv:2407.18745.
