GenAI and Assessment: The End of the Essay Exam or Its Renaissance?
Generative AI has rendered traditional assessment obsolete almost overnight: the evidence suggests AI-generated work is already indistinguishable from student work under most rubrics. The real question is not how to detect AI use but how to redesign assessment for a world where AI is ubiquitous.
By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.
In the spring of 2023, universities worldwide scrambled to ban ChatGPT. By fall 2024, many of those same institutions had reversed course, integrating generative AI into their curricula. By 2025, the question has shifted entirely: from "How do we prevent students from using AI?" to "How do we assess learning in a world where AI is an ambient capability?"
This shift is not a capitulation. It reflects a dawning recognition that the assessment practices most threatened by generative AI (take-home essays, literature reviews, code-from-scratch assignments, and unsupervised online exams) were already pedagogically suspect before AI arrived. They assessed product rather than process, rewarded recall rather than reasoning, and measured output rather than understanding. Generative AI did not break assessment. It exposed fractures that were already there.
The Empirical Landscape
Kofinas, Tsay, and Pike (2025) provide a rigorous empirical assessment of AI's impact on authentic assessments in higher education. Working across two UK-based universities, they submitted AI-generated responses to the same assessments that students completed, without informing the marking teams. The results are striking:
- AI-generated work received grades comparable to student submissions across multiple assessment types.
- Markers, in general, were not able to distinguish assessments that had GenAI input from assessments that did not, even on tasks designed as "authentic" assessments resistant to automation.
The implication is worth noting: assessments that cannot distinguish AI output from student output are not assessing what they claim to assess. If an AI can produce a "good" case study analysis without understanding the case, then the assessment measures writing quality and surface-level analytical structure, not genuine business understanding.
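One way to make the "cannot distinguish" claim precise is statistical: markers' rate of correctly flagging AI-input work should be tested against chance. A minimal sketch of that reasoning with hypothetical numbers (stdlib Python only; not the paper's actual data or analysis):

```python
from math import comb

def binomial_p_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: probability of any outcome
    no more likely than observing k successes in n trials."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    threshold = pmf[k]
    # Sum the probabilities of all outcomes at most as likely as the observed one.
    return sum(q for q in pmf if q <= threshold + 1e-12)

# Hypothetical example: markers correctly flag 16 of 30 submissions.
# 16/30 ~ 53% gives p ~ 0.86, i.e. indistinguishable from coin-flipping,
# which is what "markers could not tell AI work apart" looks like in data.
p_val = binomial_p_two_sided(16, 30)
print(round(p_val, 3))
```

The point of the sketch is that "could not distinguish" is a testable null hypothesis, not just an impression: only identification rates well above 50% would count as evidence of genuine discrimination.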
Usher (2025) extends this analysis by comparing three feedback sources (instructor, peer, and AI) in a study of 76 undergraduate students. This study has become a reference point for assessment redesign. The findings indicate that:
- AI chatbots consistently assigned higher grades than human assessors, suggesting a leniency bias.
- AI chatbot feedback generally provided higher-quality feedback compared to peers, offering detailed insights and specific guidance for improvement, though it occasionally included irrelevant or contradictory information.
- However, peer feedback was more personalized and context-sensitive than chatbot feedback.
- The findings highlight the importance of human judgment, suggesting that integrating chatbot-based assessments with traditional methods can leverage their complementary strengths.
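The leniency-bias finding is, at bottom, a paired comparison of grades assigned to the same work. A minimal sketch of that computation (hypothetical scores for illustration, stdlib Python only; not Usher's actual data or method):

```python
from statistics import mean

# Hypothetical grades for the same five submissions from each source.
grades = {
    "instructor": [72, 65, 80, 58, 70],
    "peer":       [74, 66, 78, 60, 72],
    "ai_chatbot": [81, 74, 88, 70, 79],
}

baseline = grades["instructor"]

# Mean signed difference from the instructor baseline: a positive value
# indicates systematically higher grading, i.e. a leniency bias.
for source in ("peer", "ai_chatbot"):
    bias = mean(g - b for g, b in zip(grades[source], baseline))
    print(f"{source}: mean bias {bias:+.1f} points vs instructor")
```

Because the comparison is paired (same submissions, different graders), a consistent positive offset reflects the grader, not the work, which is why a chatbot that scores every script a few points high can still look "consistent" while being miscalibrated.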
Framework for AI-Era Assessment
Ilieva, Yankova, and Ruseva (2025) propose a comprehensive redesign framework. Their three-branch, multi-level model is structured around the responsibilities of three key stakeholders:
Branch 1: Instructors. Teaching staff design adaptive, AI-informed assessment tasks and provide feedback that accounts for AI capabilities. This means crafting assessments where AI tools can be used transparently and where the assessment criteria evaluate higher-order thinking rather than output generation.
Branch 2: Students. Learners engage with AI tools transparently, with clear guidelines on acceptable use and expectations for demonstrating their own understanding alongside AI-assisted work.
Branch 3: Control Authorities. Institutional bodies ensure accountability through compliance standards, policies, and audits, creating the governance infrastructure that makes AI-integrated assessment trustworthy.
The framework's strength is its holistic approach: rather than treating AI in assessment as purely an instructor problem or a policing problem, it distributes responsibility across the entire educational ecosystem. This suggests that the most promising assessment approaches in the AI era may combine elements of resistance (tasks AI cannot complete, such as oral examinations and live problem-solving), integration (transparent AI use with metacognitive reflection), and transformation (assessment that evaluates students' ability to critically evaluate AI output rather than produce original content).
The SOUR Exam Crisis
Newton and Draper (2025) document a concerning development: the extensive use of Summative Online Unsupervised Remote (SOUR) examinations in UK higher education. Using Freedom of Information requests across UK universities, they find that SOUR exams remain widely used as a significant assessment component, despite mounting evidence of high levels of cheating, and that generative AI has made detection increasingly difficult.
The paper identifies a governance failure: university quality assurance committees approved SOUR exams during the COVID emergency as temporary measures, but institutional inertia, cost savings, and student preference have made them permanent. Quality assurance frameworks that were designed to evaluate in-person assessment have not adapted to evaluate the integrity of unsupervised remote assessment, creating what Newton and Draper call an "integrity vacuum."
Claims and Evidence
| Claim | Evidence | Verdict |
|---|---|---|
| AI-generated work is indistinguishable from student work in standard assessments | Kofinas et al. (2025): markers generally unable to distinguish AI-input from non-AI assessments | ✅ Supported |
| AI feedback is comparable to instructor feedback in accuracy | Usher (2025): AI chatbots assign higher grades than human assessors; consistency differs from accuracy | ⚠️ Uncertain |
| AI-proof assessments are scalable | Ilieva et al. (2025): viable but require examiner time proportional to cohort size | ❌ Refuted |
| AI-transparent assessment develops higher-order skills | Theoretical argument supported by pilot studies; no large-scale RCT | ⚠️ Uncertain |
| SOUR exams maintain academic integrity | Newton & Draper (2025): high cheating levels; AI makes detection increasingly difficult | ❌ Refuted |
Open Questions
Is AI detection a dead end? Current AI detection tools achieve 70–85% accuracy with high false positive rates. As language models improve, the accuracy gap will widen. Should universities abandon detection entirely and focus exclusively on assessment redesign?
What assessment skills become more valuable in the AI era? If AI can produce competent first drafts, the premium shifts to evaluation, synthesis, judgment, and the ability to identify what is missing: precisely the skills that Bloom's taxonomy places at its apex.
How do we assess process when only product is visible? AI-transparent assessment requires insight into how students interact with AI, but current LLMs do not provide auditable interaction logs in a standard format. Should assessment platforms mandate interaction logging?
What happens to students who lack AI access? If assessment assumes AI use, students without reliable internet, current devices, or paid API access are disadvantaged. AI-integrated assessment may create a new digital divide.
Can we measure learning that AI cannot replicate? Embodied knowledge, ethical judgment, relational skills, creative vision: these may be the assessment frontier. But they are also the hardest to assess reliably.
Implications
The evidence points to an unavoidable conclusion: the traditional assessment toolkit of higher education (essays, exams, reports) is no longer fit for purpose in a world where AI can produce competent versions of all of them. This is not a temporary disruption; it is a permanent shift in the epistemological foundations of assessment.
The institutions that will thrive are those that treat this moment not as a threat but as an invitation to redesign assessment around what humans do that AI cannot: exercise judgment under genuine uncertainty, integrate knowledge across disciplinary boundaries, create rather than reproduce, and take ethical responsibility for the consequences of their decisions.
A particularly promising assessment approach in the AI era may be among the simplest: sit across from a student, give them a novel problem, and ask them to think out loud. No AI can fake the live, embodied demonstration of understanding that occurs in real-time intellectual dialogue. The irony is that the oldest form of assessmentโthe Socratic oral examinationโmay prove to be the most AI-resistant.
References
[1] Kofinas, A.K., Tsay, C., & Pike, D. (2025). The Impact of Generative AI on Academic Integrity of Authentic Assessments Within a Higher Education Context. British Journal of Educational Technology, 56(5).
[2] Usher, M. (2025). Generative AI vs. Instructor vs. Peer Assessments: A Comparison of Grading and Feedback in Higher Education. Assessment & Evaluation in Higher Education, 50(4).
[3] Ilieva, G., Yankova, T., Ruseva, M., & Kabaivanov, S. (2025). A Framework for Generative AI-Driven Assessment in Higher Education. Information, 16(6), 472.
[4] Newton, P. & Draper, M. (2025). Widespread Use of Summative Online Unsupervised Remote Examinations in UK Higher Education: Ethical and Quality Assurance Implications. Quality in Higher Education, 31(1).
[5] Li, Y. & Xie, M. (2025). Navigating International Challenges of Quality Assurance in Higher Education: A Synergy of Gen-AI and Human-Made Solutions. Chinese Frontiers of Social Psychology and Sociology.