Trend Analysis · Education · Systematic Review

Digital Assessment in Higher Education: A Decade of Progress and Persistent Gaps

Digital assessment has evolved from simple online quizzes to AI-adaptive systems that claim to measure deep learning. A decade of systematic reviews reveals persistent gaps between what digital assessment promises and what it delivers, particularly in measuring the higher-order competencies that matter most.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

The decade from 2015 to 2025 has transformed how educational assessment is designed, delivered, and experienced. Online quizzes, automated essay scoring, adaptive testing platforms, e-portfolios, peer assessment systems, and, most recently, AI-powered formative assessment have moved from experimental periphery to institutional mainstream. The COVID-19 pandemic accelerated this transition by years, converting emergency remote assessment into permanent digital infrastructure.

But the question that assessment researchers have been asking with increasing urgency is whether digital assessment measures what educators care about. The technology can efficiently test factual recall, procedural skill, and pattern recognition. Whether it can assess critical thinking, creative problem-solving, ethical reasoning, and the capacity to integrate knowledge across domains (the competencies that higher education claims to develop) remains contested.

The Systematic Landscape

Zainuddin, Wasis, and Ekohariadi (2026) provide a comprehensive systematic literature review of technology-based assessment in learning, covering work published between 2015 and mid-2025. Using an SLR protocol and analyzing 105 empirical and conceptual studies identified through the Scopus database, the review maps the evolution of digital assessment across the decade.

The review identifies several trajectory-level patterns. Early in the decade (2015-2018), digital assessment research focused primarily on feasibility: can assessments be delivered online with acceptable reliability? The evidence from this period was generally positive: online delivery did not systematically reduce assessment quality for most question types.

The middle period (2019-2022), catalyzed by the pandemic, shifted focus to scalability and integrity: how do you assess at scale while preventing cheating? This period produced extensive research on proctoring technologies, randomized question pools, time-limited assessments, and honor codes, with mixed results on effectiveness.

The current period (2023-2025) centers on intelligence and adaptation: can AI make assessment personalized, formative, and capable of measuring complex competencies? This is where the gap between ambition and evidence is widest.

What Digital Assessment Measures Well

Tahir, Saputra, and Othman (2025) contribute a systematic review focused specifically on online assessment in higher education. Noting that the shift toward digital learning has accelerated the adoption of online assessment tools, the review examines what the accumulated evidence says about their effectiveness.

The review confirms that digital assessment works well for certain purposes:

  • Knowledge testing: Multiple-choice, short-answer, and fill-in-the-blank assessments translate effectively to digital formats with no loss of reliability.
  • Immediate feedback: Digital platforms can provide instant automated feedback that is impossible in paper-based assessment, supporting formative learning cycles.
  • Accessibility: Online assessment enables students with disabilities to use assistive technologies, students in remote locations to participate without travel, and institutions to offer flexible timing.
  • Learning analytics integration: Digital assessment generates data that can be analyzed in real time to identify struggling students, problematic questions, and curriculum gaps (a minimal sketch of this kind of item-level analysis follows this list).

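To make the learning-analytics point concrete, the following is a minimal sketch of the kind of item-level analysis a digital platform can run automatically over scored responses. The response matrix, thresholds, and flagging rules are illustrative assumptions, not taken from any of the reviewed platforms.

```python
# Item-level analysis over a scored response matrix
# (rows = students, columns = items, 1 = correct, 0 = incorrect).
# Data, thresholds, and flagging rules are illustrative assumptions.
from statistics import mean, pstdev

responses = {
    "alice": [1, 1, 0, 1],
    "bob":   [1, 0, 0, 1],
    "carol": [0, 0, 0, 1],
    "dana":  [1, 1, 1, 1],
}

def item_difficulty(scores):
    """Proportion of students answering the item correctly (higher = easier)."""
    return mean(scores)

def item_discrimination(scores, totals):
    """Crude discrimination index: correlation between item score and total score."""
    if pstdev(scores) == 0 or pstdev(totals) == 0:
        return 0.0
    cov = mean(s * t for s, t in zip(scores, totals)) - mean(scores) * mean(totals)
    return cov / (pstdev(scores) * pstdev(totals))

totals = [sum(row) for row in responses.values()]
n_items = len(next(iter(responses.values())))

for i in range(n_items):
    scores = [row[i] for row in responses.values()]
    diff = item_difficulty(scores)
    disc = item_discrimination(scores, totals)
    flag = "  <- review this item" if diff < 0.3 or disc < 0.2 else ""
    print(f"item {i + 1}: difficulty={diff:.2f}, discrimination={disc:.2f}{flag}")

# The same totals can flag students falling well below the class mean for early support.
struggling = [name for name, row in responses.items() if sum(row) < mean(totals) - 1]
```

Analysis like this is valuable precisely because it is cheap and immediate; it says nothing, however, about why a student or an item is struggling, which is where the limitations discussed in the next section begin.
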
What Digital Assessment Measures Poorly

The same reviews identify persistent limitations:

  • Higher-order thinking: Automated assessment of analysis, synthesis, evaluation, and creation remains unreliable. AI scoring systems can evaluate surface features of essays (coherence, vocabulary, structure) but struggle with the quality of argumentation, the depth of analysis, and the originality of insight (see the surface-feature sketch after this list).
  • Collaborative competencies: Group projects, peer interaction, and collaborative problem-solving resist standardized digital assessment because the outcomes depend on social processes that individual-level metrics cannot capture.
  • Practical skills: Laboratory work, clinical skills, fieldwork, and performance-based competencies require physical demonstration that digital environments can simulate but cannot fully replace.
  • Authenticity: The disconnect between assessment tasks (which are typically structured, time-limited, and individual) and professional practice (which is typically messy, extended, and collaborative) is amplified rather than reduced by digital delivery.

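As a concrete illustration of the higher-order-thinking gap, the sketch below extracts the kind of surface features an automated scorer can measure cheaply. The features, weights, and scoring rule are invented for illustration and do not describe any real essay-scoring product.

```python
# Toy surface-feature essay scorer. Features, weights, and the linear scoring
# rule are invented for illustration; nothing here measures argument quality,
# analytical depth, or originality.
import re

TRANSITIONS = {"however", "therefore", "moreover", "consequently", "furthermore"}

def surface_features(essay: str) -> dict:
    words = re.findall(r"[a-zA-Z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        "length": len(words),
        "type_token_ratio": len(set(words)) / max(len(words), 1),    # vocabulary variety
        "avg_sentence_length": len(words) / max(len(sentences), 1),  # syntactic complexity proxy
        "transition_density": sum(w in TRANSITIONS for w in words) / max(len(words), 1),
    }

def toy_score(essay: str) -> float:
    f = surface_features(essay)
    # Arbitrary weights: rewards longer, lexically varied, "well-connected" prose,
    # regardless of whether the reasoning it carries is sound.
    return (0.01 * f["length"] + 2.0 * f["type_token_ratio"]
            + 0.05 * f["avg_sentence_length"] + 50.0 * f["transition_density"])
```

Under rules of this kind, a fluent but vacuous essay can score as well as a rigorous one, which is precisely the gap the reviews describe between surface features and depth of argument.
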
Measuring Digital Literacy Itself

Suri, Festiyed, and Azhar (2025) address a meta-level challenge: how do you assess the digital literacy that students need to succeed in digital assessment environments? Their systematic review and bibliometric analysis examines digital literacy assessment instruments, competency dimensions, and challenges across educational levels.

Digital literacy includes technical proficiency, information evaluation, online collaboration, creativity, and ethical technology use. The review reveals that existing assessment instruments tend to overweight technical skills (can the student use the platform?) and underweight critical skills (can the student evaluate online information and navigate digital ethics?). This measurement bias is consequential: if institutions measure only what digital assessment instruments can easily test, they will systematically undervalue the competencies that digital citizens most need.

AI-Adaptive Formative Assessment

Sari, Wicaksana, and Rahman (2025) examine the emerging frontier: adaptive AI-driven formative assessment. The paper argues that, in an era of digital transformation and AI dominance, cognitive and social-emotional skills have become vital competencies from the earliest stages of education, and that adaptive instructional systems powered by AI offer a pathway to assess and develop these competencies in real time.

The AI-adaptive approach promises to solve a fundamental limitation of traditional assessment: fixed difficulty. In a conventional test, every student answers the same questions, which means the assessment is optimally informative only for students near the test's difficulty level. Adaptive testing adjusts difficulty in real time based on the student's responses, maintaining optimal information gain throughout the assessment.
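
To make the mechanics concrete, here is a minimal sketch of an adaptive loop under a one-parameter (Rasch) item response model. The item pool, the gradient-style ability update, and the fixed test length are simplifying assumptions for illustration, not a description of any system covered by the review.

```python
# Minimal adaptive-testing loop under a Rasch (1PL) item response model.
# Item pool, ability-update rule, and stopping criterion are simplifying
# assumptions for illustration only.
import math

def p_correct(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def item_information(ability: float, difficulty: float) -> float:
    """Fisher information of an item at the current ability estimate."""
    p = p_correct(ability, difficulty)
    return p * (1.0 - p)

def select_next_item(ability: float, pool: dict) -> str:
    """Pick the remaining item that is most informative right now."""
    return max(pool, key=lambda item: item_information(ability, pool[item]))

def update_ability(ability: float, difficulty: float, correct: bool, step: float = 0.5) -> float:
    """One gradient step on the log-likelihood of the observed response."""
    return ability + step * ((1.0 if correct else 0.0) - p_correct(ability, difficulty))

def run_adaptive_test(responder, pool: dict, max_items: int = 10) -> float:
    """Administer up to max_items adaptively and return the ability estimate."""
    ability, remaining = 0.0, dict(pool)      # start at the prior mean
    for _ in range(min(max_items, len(remaining))):
        item = select_next_item(ability, remaining)
        correct = responder(item)             # True/False from the student
        ability = update_ability(ability, remaining.pop(item), correct)
    return ability

# Example: items keyed by id with difficulty on the same logit scale as ability;
# the responder below is a stand-in for a real student.
pool = {"q1": -1.5, "q2": -0.5, "q3": 0.0, "q4": 0.8, "q5": 1.6}
estimate = run_adaptive_test(lambda item: pool[item] < 0.5, pool, max_items=4)
```

Because each step administers the item that is most informative at the current estimate, a biased estimate steers the selector toward systematically easier (or harder) items, which is the ceiling-effect concern raised in the next paragraph.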

However, adaptive assessment also introduces new concerns: algorithmic bias in difficulty adjustment (if the system underestimates a student's ability, it may present systematically easier items, creating a ceiling effect), opacity of scoring (students and teachers may not understand why the system assigned a particular score), and the assumption that learning can be meaningfully decomposed into discrete, hierarchically ordered skills that a branching algorithm can navigate.

Claims and Evidence

| Claim | Evidence | Verdict |
| --- | --- | --- |
| Digital assessment maintains reliability compared to paper-based formats | Tahir et al. (2025): generally supported for structured question types | ✅ Supported |
| Digital assessment can measure higher-order thinking effectively | Zainuddin et al. (2026): persistent gap between ambition and capability | ❌ Refuted (with current tools) |
| AI-adaptive assessment improves formative feedback | Sari et al. (2025): conceptually promising; large-scale evidence limited | ⚠️ Uncertain |
| Digital literacy assessment adequately captures critical digital competencies | Suri et al. (2025): overweights technical skills, underweights evaluative and ethical dimensions | ❌ Refuted |

Open Questions

  • Can generative AI transform assessment from product evaluation to process evaluation? If AI can analyze not just the final answer but the reasoning process (drafts, revisions, search patterns, time allocation), could it assess learning more authentically than traditional output-based assessment?
  • How should institutions balance assessment security with assessment authenticity? Proctored, locked-down assessments are secure but artificial. Open-book, take-home assessments are authentic but vulnerable to AI assistance. Is there a middle path?
  • What happens when the assessment tool becomes the learning environment? As adaptive assessment platforms increasingly function as learning environments (adjusting content based on performance), the distinction between assessment and instruction blurs. Is this integration beneficial or does it reduce assessment independence?
  • How do we ensure digital assessment equity across the global digital divide? Students with unreliable internet, older devices, or shared computing access are systematically disadvantaged by digital assessment. What design principles minimize this inequity?

Implications

A decade of digital assessment research converges on a practical conclusion: digital tools are well suited for assessing what is easy to assess (factual knowledge, procedural skill, structured problem-solving) and poorly suited for assessing what is hard to assess (critical thinking, creativity, ethical judgment, collaborative competence). This alignment between technological capability and assessment target is not coincidental; it reflects the fundamental limitation of computational assessment: computers are good at evaluating outputs that can be specified in advance, and poor at evaluating outputs whose value lies in their unpredictability.

The implication is not that digital assessment should be abandoned but that it should be used for what it does well, combined with human assessment for what it does not. The optimal assessment system for most educational contexts is hybrid: digital tools for efficiency, scalability, and data richness; human judgment for complexity, nuance, and the evaluation of genuinely creative work.

References

[1] Zainuddin, A., Wasis, & Ekohariadi. (2026). A Systematic Review of Digital Assessment Trends in Education from 2015-2025. Multidisciplinary Reviews, 2026, 433.
[2] Tahir, M., Saputra, S., & Othman, S. (2025). Online Assessment in Higher Education: A Systematic Literature Review. Multidisciplinary Reviews, 2026, 024.
[3] Suri, N.A., Festiyed, & Azhar, M. (2025). Measuring What Matters: A Systematic Review and VOSviewer-Based Bibliometric Approach to Digital Literacy Assessment. Research in Learning Technology, 33, 3413.
[4] Sari, N.K., Wicaksana, M.F., & Rahman, M.A. (2025). Adaptive AI-Driven Formative Assessment in Early Childhood Education: A Systematic Review and Meta-Analysis on Cultivating Social-Emotional Learning and Early Moderatio. Child Education Journal, 7(3), 8412.
