Predicting MOOC Dropout with Deep Learning: Solving the Wrong Problem?

Deep learning models can now predict MOOC dropout with over 90% accuracy. Yet completion rates remain stubbornly low. The key tension: the field has become highly effective at predicting failure without becoming comparably effective at preventing it. Five key papers reveal why prediction and intervention remain decoupled.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

The MOOC prediction literature has achieved something notable and, upon reflection, somewhat paradoxical. Over the past five years, researchers have developed increasingly sophisticated deep learning architectures (convolutional networks, recurrent networks, attention mechanisms, transformer models, and hybrid systems combining all of the above) that can predict with impressive accuracy which students will drop out of a Massive Open Online Course. The models are elegant. The feature engineering is inventive. The benchmark comparisons are rigorous. And yet, the dropout rates that motivated this entire research program have barely moved.

This is the central paradox of MOOC learning analytics in 2025: we can predict failure with exquisite precision, but we remain largely unable to prevent it. The question that the field has been reluctant to ask is whether prediction itself, divorced from actionable and effective intervention, constitutes a meaningful contribution to educational practice, or whether it has become an end in itself, a technically satisfying exercise that produces papers without producing learning.

The Systematic Evidence: What Five Years of Deep Learning Have Produced

Rizwan, Nee, and Garfan (2025) provide a comprehensive systematic literature review, synthesizing research from 2019 to 2024 on deep learning approaches to MOOC performance and engagement prediction. Published in IEEE Access and already cited over 30 times, this review maps the entire landscape of architectures, features, and evaluation methodologies that the field has explored.

Their synthesis reveals several consistent patterns across the literature:

Architectural convergence: The field has progressively moved from traditional machine learning (random forests, SVMs, logistic regression) through basic neural networks (MLPs) to sequential models (LSTMs, GRUs) and, most recently, to attention-based architectures (transformers, self-attention CNNs). Each generation of models has produced incremental improvements in prediction accuracy on benchmark datasets, typically moving from the low 80s to the low 90s in AUC-ROC.

Feature evolution: Early models relied on simple aggregate features: total login count, number of videos watched, quiz scores. Current models ingest behavioral sequences: the temporal ordering of clicks, the duration patterns of engagement sessions, the rhythm of forum participation. This shift from summary statistics to sequential data has been the single largest driver of prediction improvements.
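
To make the shift concrete, here is a minimal sketch of the two feature generations, assuming a hypothetical clickstream log whose column names are invented for illustration: the aggregates discard event ordering, while the sequences preserve the temporal structure that sequential models consume.

```python
import pandas as pd

# Hypothetical clickstream log: one row per student event.
events = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2025-01-06 09:00", "2025-01-06 09:20", "2025-01-13 21:00",
        "2025-01-06 10:00", "2025-01-20 23:50",
    ]),
    "event_type": ["video_play", "quiz_attempt", "video_play",
                   "video_play", "forum_post"],
})

# Early-generation features: per-student aggregates that discard ordering.
aggregates = events.groupby("student_id").agg(
    total_events=("event_type", "size"),
    videos_watched=("event_type", lambda s: (s == "video_play").sum()),
)

# Current-generation features: ordered event sequences, the input format
# that sequential models (LSTMs, transformers) consume.
sequences = (events.sort_values("timestamp")
                   .groupby("student_id")["event_type"]
                   .apply(list))

print(aggregates)
print(sequences)
```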

Evaluation narrowness: The overwhelming majority of studies evaluate prediction performance using standard classification metrics (accuracy, AUC, F1-score) on held-out test data from the same MOOC offering. Cross-course generalization, cross-platform transfer, and, most critically, the relationship between prediction accuracy and intervention effectiveness are rarely assessed.
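
Cross-course generalization is cheap to test when course identifiers are available. The sketch below, on synthetic data and not drawn from any of the reviewed studies, uses scikit-learn's GroupKFold with the course offering as the grouping variable, so every test fold contains only courses the model never saw during training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 10))           # synthetic behavioral features
y = rng.integers(0, 2, size=600)         # synthetic dropout labels
course = rng.integers(0, 6, size=600)    # which MOOC offering each student took

# Leave-courses-out evaluation: each test fold holds courses that are
# absent from training, unlike the usual within-course random split.
for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y, groups=course):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    auc = roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1])
    print(f"held-out courses {np.unique(course[test_idx])}: AUC = {auc:.2f}")
```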

Attention to the Right Things: Novel Architectures

Two recent papers illustrate the architectural frontier. Fazil, Rísquez, and Halpin (2024), in the Journal of Learning Analytics, introduce ASIST, an Attention-aware convolutional Stacked BiLSTM network. The architecture processes students' VLE (Virtual Learning Environment) interaction sequences through three stages: convolutional layers extract local behavioral patterns, stacked bidirectional LSTMs capture long-range temporal dependencies, and an attention mechanism learns to weight the most predictively relevant time periods and activity types.

The attention component is particularly revealing. By examining which features receive the highest attention weights, the model provides interpretable signals about which behavioral indicators are most predictive of student outcomes. The ablation analysis reveals that weekly event count has the greatest impact on ASIST's performance, while diurnal weekly interaction patterns have the least. The model achieves an AUC of 0.86 to 0.90 across three datasets, and early prediction from just the first seven weeks reaches an AUC of 0.83 to 0.89.
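
For readers who want to see the shape of such a model, here is a minimal PyTorch sketch of the three stages as described. It is not the authors' released code; every layer size and dimension is an illustrative assumption.

```python
import torch
import torch.nn as nn

class ASISTSketch(nn.Module):
    """Conv -> stacked BiLSTM -> attention, as described for ASIST.
    All sizes are illustrative assumptions, not the paper's configuration."""

    def __init__(self, n_activity_types=20, hidden=64):
        super().__init__()
        # Stage 1: convolution over time extracts local behavioral patterns.
        self.conv = nn.Conv1d(n_activity_types, hidden, kernel_size=3, padding=1)
        # Stage 2: stacked bidirectional LSTMs capture long-range dependencies.
        self.bilstm = nn.LSTM(hidden, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        # Stage 3: additive attention weights the most informative weeks.
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                       # x: (batch, weeks, activities)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.bilstm(h)                   # (batch, weeks, 2 * hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention over weeks
        context = (w * h).sum(dim=1)            # attention-pooled summary
        return torch.sigmoid(self.head(context)).squeeze(-1), w.squeeze(-1)

model = ASISTSketch()
risk, weights = model(torch.randn(4, 30, 20))   # 4 students, 30 weeks
print(risk.shape, weights.shape)                # per-student risk, weekly weights
```

The second output, the per-week attention weights, is what supports the kind of interpretability analysis described above.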

Liu, Xu, and Yang (2025) take a more mathematically ambitious approach, incorporating Lie group features into a dilated convolutional attention network. Lie groups (continuous symmetry groups from differential geometry) are used to represent the inherent symmetries in student behavioral sequences: the idea that certain behavioral patterns (e.g., "engage intensely then disappear for a week") have the same predictive meaning regardless of when in the course they occur. This temporal invariance, formally modeled through Lie group transformations, enables the model to generalize across courses with different temporal structures.
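
The Lie group machinery itself is beyond a short sketch, but the underlying intuition, that a behavioral motif should be scored the same wherever it occurs in the course, can be demonstrated with ordinary building blocks: a dilated convolution is shift-equivariant, and pooling over the time axis makes the result shift-invariant. The toy example below illustrates only that property; it is not the authors' method.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Dilated temporal conv + global max pooling over time: the pooled
# representation does not depend on *when* a motif occurs in the window.
encoder = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=3, dilation=2),  # receptive field of 5 steps
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),                     # pool away the time axis
)

motif = torch.tensor([1.0, 3.0, 0.5])            # "engage intensely, then stop"
early = torch.zeros(1, 1, 50)
late = torch.zeros(1, 1, 50)
early[0, 0, 5:8] = motif                         # motif near the course start
late[0, 0, 40:43] = motif                        # same motif near the end

print(torch.allclose(encoder(early), encoder(late)))  # True: shift-invariant
```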

The technical contribution is genuine, but it exemplifies the field's tendency toward increasing mathematical sophistication in pursuit of marginal prediction improvements, while the fundamental question ("What do we do with these predictions?") remains unaddressed.

The Digital Traces Approach: Clustering Before Predicting

Pecuchová and Drlík (2024) propose a methodologically distinct approach. Rather than training end-to-end deep learning models on raw behavioral data, they first apply clustering analysis to students' digital traces (login patterns, resource access sequences, assignment submission timing) to identify distinct behavioral archetypes. These clusters are then used as features for dropout prediction, creating a two-stage pipeline: understand behavioral types, then predict outcomes for each type.

This approach yields two benefits that pure deep learning models miss. First, the clusters are interpretable: an educator can understand what "Type 3 student: binge-watches lectures before deadlines, skips forums, submits assignments in final hour" means and can design targeted interventions for that behavioral profile. Second, the clustering reveals that dropout is not a monolithic phenomenon: different students drop out for different reasons at different times, and a single prediction model that treats all dropout as equivalent loses actionable information.

Comparing BIRCH, DBSCAN, and GMM algorithms, they find that BIRCH most effectively categorizes students by activity pattern. Their analysis identifies distinct student clusters, including two high-risk clusters with the highest dropout rates, and reveals meaningful behavioral differences between groups based on their LMS interaction patterns. Critically, the study confirms that early identification of at-risk students through temporal clustering analysis is feasible, and that different clusters may require different intervention strategies, a nuance that single-model "will drop out: yes/no" predictions obscure.
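
A minimal sketch of this two-stage pipeline, using scikit-learn's Birch implementation on synthetic features and labels (the actual study works on LMS trace data), might look like the following; per-cluster dropout rates make the high-risk archetypes visible.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))        # hypothetical aggregated trace features
y = rng.integers(0, 2, size=500)     # synthetic dropout labels

# Stage 1: group students into behavioral archetypes.
clusters = Birch(n_clusters=5).fit_predict(X)

# Stage 2: condition the dropout model on the archetype by appending a
# one-hot cluster label to the raw features.
cluster_onehot = OneHotEncoder(sparse_output=False).fit_transform(
    clusters.reshape(-1, 1))
model = GradientBoostingClassifier().fit(np.hstack([X, cluster_onehot]), y)

# Per-cluster dropout rates expose which archetypes are high risk.
for c in range(5):
    print(f"cluster {c}: dropout rate = {y[clusters == c].mean():.2f}")
```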

The Generative AI Disruption

Rodriguez-Ortiz, Santana-Mancilla, and Anido-Rifón (2025) contribute a systematic review of 101 empirical studies examining how machine learning and generative AI have been integrated into learning analytics in higher education. This PRISMA-compliant review reveals an emerging tension:

Generative AI is being deployed in learning analytics for two fundamentally different purposes: prediction (using GenAI to improve the accuracy of at-risk identification, through more sophisticated feature extraction or synthetic data augmentation) and intervention (using GenAI to deliver personalized nudges, adaptive feedback, and motivational messages to at-risk students).

The prediction applications are further along. The intervention applications are nascent but conceptually promising: if an LLM can generate personalized, contextually appropriate messages to at-risk students ("I notice you haven't accessed Module 4 yet. Students who found Module 3 challenging often benefit from reviewing the worked examples before moving on."), then the prediction-intervention gap could, in principle, be closed. But the review finds almost no rigorous evaluation of whether GenAI-generated interventions actually improve retention. The field is building intervention tools before establishing whether the interventions work.
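
To make the design constraints concrete, the sketch below is a hypothetical rule-based stand-in for such an intervention layer; a GenAI system would replace the template with an LLM call, but the properties the intervention literature emphasizes (specific, timely, actionable) are the same.

```python
def nudge(student_name, stalled_module, prior_module, days_inactive):
    """Hypothetical stand-in for an LLM-generated intervention message:
    specific (names the module), timely (keyed to inactivity), and
    actionable (suggests one concrete next step)."""
    if days_inactive < 7:
        return None  # still active: no nudge needed
    return (f"Hi {student_name}, I notice you haven't accessed "
            f"{stalled_module} yet. Students who found {prior_module} "
            f"challenging often benefit from reviewing the worked examples "
            f"before moving on.")

print(nudge("Ana", "Module 4", "Module 3", days_inactive=9))
```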

Claims and Evidence

| Claim | Evidence | Verdict |
| --- | --- | --- |
| Deep learning outperforms traditional ML for MOOC dropout prediction | Rizwan et al. (2025): consistent improvements in AUC across the review, particularly for sequential models | ✅ Supported |
| Attention-based models provide interpretable predictions | Fazil et al. (2024): attention maps reveal predictively relevant time periods; Liu et al. (2025): Lie group temporal invariance | ✅ Supported |
| Dropout is a heterogeneous phenomenon requiring differentiated intervention | Pecuchová & Drlík (2024): distinct behavioral archetypes with different dropout trajectories | ✅ Supported |
| Better prediction leads to better student outcomes | No study in any of these reviews demonstrates a causal link between prediction accuracy and student retention | ❌ Unsupported |
| GenAI interventions improve MOOC retention | Rodriguez-Ortiz et al. (2025): no rigorous evaluation found in 101-study review | ⚠️ Uncertain |

The Uncomfortable Truth: Prediction Is Not Intervention

The gap between prediction and intervention is not merely a research lag; it reflects a structural disconnection between the communities that build models and the communities that design learning experiences. Machine learning researchers optimize for AUC-ROC. Instructional designers optimize for learner experience. Platform engineers optimize for scalability. These communities publish in different venues, cite different literatures, and operate under different incentive structures.

A model that predicts dropout with 95% accuracy is useless if:

  • The MOOC platform has no mechanism to act on predictions in real time
  • The course design cannot be modified mid-offering based on prediction output
  • The interventions available (email nudges, pop-up notifications) are too weak to alter behavior
  • The reasons for dropout (job change, life event, misaligned expectations) are beyond the educational system's control

This last point deserves particular emphasis. Much MOOC dropout is not a failure of the educational experience but a rational response to changed circumstances. A professional who enrolled to learn Python, acquired sufficient skill after three modules, and stopped without completing the certificate has not "failed"; they have achieved their learning goal. Treating this learner as a "dropout" to be predicted and prevented reveals the implicit assumption of the prediction literature: that completion equals success and non-completion equals failure. This assumption is not only empirically questionable; it is pedagogically regressive, importing a credentialist logic into a medium whose original promise was to liberate learning from institutional gatekeeping.

Open Questions

  • Should we shift from dropout prediction to learning outcome prediction? Rather than predicting who will leave, can we predict who is learning, and design interventions for students who stay but stagnate?
  • What is the minimum effective intervention? The intervention literature suggests that the most effective nudges are specific, timely, and actionable. Can AI systems generate such interventions automatically, and at what quality threshold do they become effective?
  • How do we validate interventions ethically? Randomized trials of dropout interventions require a control group that receives no intervention: students who are identified as at-risk and deliberately not helped. Is this ethically defensible?
  • Can prediction models transfer across platforms? Models trained on Coursera data may not generalize to edX, FutureLearn, or K-MOOC. Cross-platform validation remains rare and results are discouraging.
  • What would a prediction-native MOOC look like? Rather than bolting prediction onto existing MOOC designs, what if course architecture was designed from the ground up to be responsive to real-time engagement analytics, with modular content, adaptive pacing, and built-in intervention points?

Implications

The field of MOOC learning analytics stands at a crossroads. One path leads to ever-more-sophisticated prediction models that squeeze marginal accuracy gains from increasingly complex architectures, a path that produces publications but not impact. The other path leads to the harder, messier work of designing, deploying, and rigorously evaluating interventions that translate predictions into improved learning outcomes.

The evidence reviewed here suggests that the second path requires not just better models but better institutional infrastructure: platforms that can act on predictions in real time, instructional designs that accommodate adaptive modification, and evaluation frameworks that measure learning rather than completion. Until these infrastructure gaps are addressed, even a highly accurate prediction model will remain an elegant solution to the wrong problem.

The researchers who will advance this field are not those who achieve the highest AUC, but those who close the loop between knowing who will fail and helping them succeed.

References

[1] Rizwan, S., Nee, C.K., & Garfan, S. (2025). Identifying the Factors Affecting Student Academic Performance and Engagement Prediction in MOOC Using Deep Learning: A Systematic Literature Review. IEEE Access, 13, 18952–18982.
[2] Fazil, M., Rísquez, A., & Halpin, C. (2024). A Novel Deep Learning Model for Student Performance Prediction Using Engagement Data. Journal of Learning Analytics, 11(2).
[3] Pecuchová, J., & Drlík, M. (2024). Enhancing the Early Student Dropout Prediction Model Through Clustering Analysis of Students' Digital Traces. IEEE Access, 12, 159336–159367.
[4] Rodriguez-Ortiz, M., Santana-Mancilla, P.C., & Anido-Rifón, L. (2025). Machine Learning and Generative AI in Learning Analytics for Higher Education: A Systematic Review of Models, Trends, and Challenges. Applied Sciences, 15(15), 8679.
[5] Liu, Y., Xu, C., & Yang, D. (2025). MOOC Dropout Prediction via a Dilated Convolutional Attention Network with Lie Group Features. Informatics, 12(4), 127.
