AI Tutors in Engineering Education: Domain Expertise vs. General Intelligence

Engineering education demands precision that general-purpose LLMs cannot reliably deliver. A wave of domain-specific AI tutors—from geotechnical engineering to biomechanics—reveals both the promise and the peril of teaching students disciplines where wrong answers can collapse bridges.

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

In most academic disciplines, a tutor who is occasionally wrong is merely unhelpful. In engineering, a tutor who is occasionally wrong is potentially dangerous. When a first-year student learning structural analysis receives confidently incorrect guidance on load-bearing calculations, the pedagogical harm extends beyond a failed exam—it plants misconceptions that, uncorrected, could eventually manifest in professional practice where steel beams, patient prosthetics, and electrical systems leave no room for hallucination.

This fundamental constraint—that engineering education operates in domains where precision is not a desirable feature but an existential requirement—shapes the entire landscape of AI tutoring in engineering. And it explains why notable advances in this space have come not from deploying general-purpose LLMs like ChatGPT but from building domain-specific AI tutoring systems that trade breadth for reliability.

The Evidence Base: What Works and What Doesn't

Frankford et al. (2024) provide an empirical evaluation. Their exploratory case study embedded an LLM-based tutor into a software engineering course and used surveys and assessments to evaluate the integration. The study identified both advantages and challenges of LLM-based tutoring in programming education.

The findings reveal a mixed picture. On the positive side, the AI tutor provided timely feedback and scalable support. But the study also identified significant challenges:

Conceptual transfer limitations: Students could reproduce solutions the AI had demonstrated but showed limited transfer of underlying reasoning to novel problems.
Generic response quality: AI-generated feedback often lacked the specificity that domain-expert instructors provide, particularly for advanced design decisions.
Dependency concerns: Students reported concerns about potential learning progress inhibition—a worry that reliance on AI feedback might reduce independent problem-solving capacity.

This pattern of useful procedural support, weak conceptual transfer, and emerging dependency concerns recurs, in different forms, in the domain-specific deployments discussed below.

Domain-Specific Architectures: The EngiBot Approach

Rodrigues, Pinto, and Gonçalves (2025) present EngiBot, a purpose-built AI tutoring system for engineering education that addresses the general-purpose LLM's limitations through two core subsystems:

Document Processing Pipeline: Rather than relying on open web retrieval, EngiBot extracts structured knowledge from technical PDF materials such as lecture notes and problem sets, constructing a course-specific knowledge base enriched with metadata and semantic structure. This domain-constrained approach aims to ensure precise and relevant response generation.

Natural Language Understanding Module: A hybrid approach combining rule-based intent classification and Large Language Models interprets student queries, enabling context-aware responses tailored to engineering domains.
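
The hybrid idea described above can be illustrated with a minimal sketch: cheap, auditable rules are tried first, and a language model is consulted only when no rule matches. The intent labels, the regular expressions, and the `stub_llm` stand-in are assumptions made for illustration; the source does not describe EngiBot's actual rule set or model interface.

```python
import re
from typing import Callable, Tuple

# Hypothetical intent labels and patterns, invented for this sketch.
RULES = [
    ("definition", re.compile(r"\b(what is|define|meaning of)\b", re.IGNORECASE)),
    ("calculation", re.compile(r"\b(calculate|compute|solve|how much)\b", re.IGNORECASE)),
    ("course_logistics", re.compile(r"\b(deadline|exam date|syllabus)\b", re.IGNORECASE)),
]


def classify_intent(query: str, llm_fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Return (intent, source). Rules are tried in order; the LLM is used only on a miss."""
    for intent, pattern in RULES:
        if pattern.search(query):
            return intent, "rule"
    return llm_fallback(query), "llm"


def stub_llm(query: str) -> str:
    """Stand-in for a real model call (assumption); always answers 'open_question'."""
    return "open_question"


if __name__ == "__main__":
    for q in ["What is shear stress?",
              "Calculate the bending moment at midspan",
              "Why did my beam design fail review?"]:
        print(q, "->", classify_intent(q, stub_llm))
```

Returning the source alongside the intent makes it easy to log how often the model is needed at all, which is one way to keep the less predictable component under observation.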

Quantitative evaluation reports effective performance in both knowledge extraction and question-answer retrieval, which suggests the system's potential as a support tool in engineering education. The domain-constrained architecture represents a deliberate trade-off: reduced breadth for increased reliability in safety-critical domains.
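
To make the breadth-for-reliability trade-off concrete, here is a rough sketch of retrieval over a course-specific knowledge base that declines to answer when nothing in the material matches. The `Chunk` fields, the word-overlap scoring, and the `min_overlap` threshold are all assumptions chosen for brevity; a real system would likely use embeddings and stop-word handling, and the source does not say how EngiBot scores matches.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Chunk:
    text: str
    source: str  # hypothetical metadata: file the chunk was extracted from
    topic: str   # hypothetical metadata: course topic label


def tokenize(text: str) -> Set[str]:
    """Lower-case words with surrounding punctuation stripped."""
    return {w.strip(".,?!:;()").lower() for w in text.split()} - {""}


def retrieve(query: str, kb: List[Chunk], min_overlap: int = 2) -> Optional[Chunk]:
    """Return the best-matching chunk, or None when the course material
    does not overlap enough with the query (the tutor should then decline)."""
    q = tokenize(query)
    best, best_score = None, 0
    for chunk in kb:
        score = len(q & tokenize(chunk.text))
        if score > best_score:
            best, best_score = chunk, score
    return best if best_score >= min_overlap else None


KB = [
    Chunk("Bending stress equals moment times distance divided by second moment of area.",
          "lecture03.pdf", "beams"),
    Chunk("Ohm's law relates voltage, current and resistance.",
          "lecture07.pdf", "circuits"),
]

if __name__ == "__main__":
    print(retrieve("How is bending stress related to moment?", KB))
    print(retrieve("Who won the football match yesterday?", KB))
```

The second query returns `None` instead of a best guess. That refusal is the cost of the constrained design, and also the point of it.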

The Geotechnical Challenge: When Soil Is Not a Textbook Problem

Tophel, Chen, and Hettiyadura (2025) take the domain-specificity argument further by testing LLM tutors in geotechnical engineering—a discipline where correct answers depend on site-specific conditions (soil type, water table, seismic zone) that no training corpus fully captures. Their comparative study evaluates multiple LLM APIs on undergraduate geotechnical problems and reveals a consistent hierarchy:

Factual recall (definitions, classification systems): LLMs performed reasonably well, suggesting that declarative knowledge is well-represented in training corpora.
Calculation-based problems (bearing capacity, settlement analysis): General-purpose LLMs showed notably lower accuracy, while fine-tuned domain models performed better—though still imperfectly.
Design judgment (choosing foundation type given ambiguous site data): All models performed poorly, with general-purpose LLMs often generating responses that were internally consistent but based on incorrect assumptions about soil behavior.
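
For the calculation tier, one way to reduce reliance on a model's arithmetic is to check numeric answers against a deterministic reference. The sketch below uses the classical bearing capacity equation for a strip footing, with the Prandtl/Reissner expressions for N_c and N_q and Vesic's expression for N_gamma. It ignores shape, depth, and inclination factors and the water table, and it is not taken from the study; the function names and the 5% tolerance are assumptions.

```python
import math
from typing import Tuple


def bearing_capacity_factors(phi_deg: float) -> Tuple[float, float, float]:
    """Return (Nc, Nq, Ngamma) for a friction angle phi > 0, given in degrees."""
    phi = math.radians(phi_deg)
    nq = math.exp(math.pi * math.tan(phi)) * math.tan(math.pi / 4 + phi / 2) ** 2
    nc = (nq - 1) / math.tan(phi)
    ngamma = 2 * (nq + 1) * math.tan(phi)  # Vesic's expression; other authors differ
    return nc, nq, ngamma


def ultimate_bearing_capacity(c: float, gamma: float, depth: float,
                              width: float, phi_deg: float) -> float:
    """Strip footing, no correction factors.
    c in kPa, gamma in kN/m^3, depth and width in m; result in kPa."""
    nc, nq, ngamma = bearing_capacity_factors(phi_deg)
    q = gamma * depth  # overburden pressure at founding level
    return c * nc + q * nq + 0.5 * gamma * width * ngamma


def check_student_answer(student_kpa: float, reference_kpa: float,
                         rel_tol: float = 0.05) -> bool:
    """True when the answer is within rel_tol of the reference (tolerance is an assumption)."""
    return math.isclose(student_kpa, reference_kpa, rel_tol=rel_tol)


if __name__ == "__main__":
    ref = ultimate_bearing_capacity(c=0.0, gamma=18.0, depth=1.0, width=2.0, phi_deg=30.0)
    print(round(ref, 1), check_student_answer(730.0, ref))
```

A checker like this only covers the well-posed part of the problem. Choosing the friction angle, deciding whether a strip-footing idealization applies, and accounting for groundwater are exactly the judgment steps where the study found all models weak.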

The implication is that the tasks that matter most in engineering education, those requiring judgment under uncertainty, are the tasks where AI tutors remain least reliable.

JULIUS: Teaching Resilience, Not Just Answers

Martinez, Chong, and Maya (2025) introduce an alternative philosophy with JULIUS, an AI tutor designed not to make students better programmers but to make them more resilient programmers. JULIUS operates on the premise that the primary failure mode of engineering students is not lack of knowledge but lack of emotional regulation when confronting difficult problems.

When a student expresses frustration ("I've been stuck on this for hours"), JULIUS does not immediately provide a hint. Instead, it engages in metacognitive coaching: "What specifically is confusing? Can you identify where your understanding breaks down?" Only after the student articulates their confusion does JULIUS offer targeted assistance.
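
The withholding behaviour described above amounts to a small piece of dialogue state: a hint is released only after the student has been asked to articulate the confusion and has replied. The sketch below is a hypothetical illustration of that gating, not JULIUS's implementation; the keyword cues, the `Session` class, and the reply wording other than the quoted coaching question are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical cue list; the source does not say how frustration is detected.
FRUSTRATION_CUES = ("stuck", "frustrated", "give up", "for hours")

COACHING_PROMPT = ("What specifically is confusing? "
                   "Can you identify where your understanding breaks down?")


@dataclass
class Session:
    awaiting_articulation: bool = False
    transcript: List[Tuple[str, str]] = field(default_factory=list)


def respond(session: Session, message: str) -> str:
    """Return the tutor's reply and update the session state."""
    text = message.lower()
    if session.awaiting_articulation:
        # The student has now described the difficulty, so a targeted hint is allowed.
        session.awaiting_articulation = False
        reply = "HINT: Let's look at the part you described: " + message
    elif any(cue in text for cue in FRUSTRATION_CUES):
        # Withhold the hint and ask the student to locate the difficulty first.
        session.awaiting_articulation = True
        reply = COACHING_PROMPT
    else:
        reply = "Tell me what you have tried so far."
    session.transcript.append((message, reply))
    return reply


if __name__ == "__main__":
    s = Session()
    print(respond(s, "I've been stuck on this for hours"))
    print(respond(s, "I don't get why my loop never ends"))
```

A production tutor would need a better test of whether the student's reply actually articulates anything. This sketch accepts any follow-up message as sufficient.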

The design is grounded in connectivism and Challenge-Based Learning (CBL), promoting autonomy, reducing anxiety, and supporting real-world problem-solving through structured group dynamics. By withholding immediate solutions, JULIUS preserves the student's sense of autonomy; by providing targeted (not complete) assistance, it builds competence; by engaging in empathetic dialogue, it supports emotional resilience.

Results from a mixed-methods study in a Fundamentals of Programming course show that JULIUS users demonstrated significant improvements in conceptual understanding, logical reasoning, motivation, and collaboration based on pre- and post-test comparisons, with qualitative analysis confirming enhanced emotional well-being and metacognition.

Claims and Evidence

Claim	Evidence	Verdict
AI tutoring improves engineering students' procedural skills	Frankford et al. (2024): timely feedback and scalability advantages observed	✅ Supported
AI tutoring improves engineering students' conceptual understanding	Frankford et al. (2024): generic responses and transfer limitations noted	⚠️ Uncertain
Domain-specific AI tutors outperform general-purpose LLMs	Tophel et al. (2025): fine-tuned models outperform general-purpose LLMs on domain calculations	✅ Supported
Domain-constrained knowledge bases improve tutoring reliability	EngiBot (Rodrigues et al., 2025): effective performance in knowledge extraction and QA retrieval demonstrated	⚠️ Uncertain (single system, no comparative study)
AI tutors can teach engineering judgment	Tophel et al. (2025): all models performed poorly on design judgment tasks requiring contextual reasoning	❌ Not supported by current evidence

Open Questions

Should AI tutors in engineering be regulated like engineering software? If students use AI-generated solutions in professional practice, does the AI tutor bear a form of professional liability? Current legal frameworks have no answer.

Can we teach judgment through counterfactual simulation? Rather than answering "What foundation should I use?", could an AI tutor simulate the consequences of each choice—"If you choose a shallow foundation on this soil, here is what happens to settlement over 50 years"?
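
As a rough illustration of what such a consequence simulation might compute, the sketch below applies Terzaghi's one-dimensional consolidation theory to show how settlement approaches its final value over time. It assumes a uniform initial excess pore pressure, and the final settlement, coefficient of consolidation, and drainage path are hypothetical inputs. Real settlement prediction involves far more than this, so the block is a teaching sketch and not a design tool.

```python
import math
from typing import Iterable, List, Tuple


def degree_of_consolidation(tv: float, terms: int = 200) -> float:
    """Average degree of consolidation U (0..1) for time factor Tv,
    from the series solution of Terzaghi's 1-D consolidation equation."""
    total = 0.0
    for m in range(terms):
        big_m = math.pi * (2 * m + 1) / 2
        total += (2 / big_m ** 2) * math.exp(-(big_m ** 2) * tv)
    return 1.0 - total


def settlement_over_time(final_settlement_mm: float, cv_m2_per_year: float,
                         drainage_path_m: float,
                         years: Iterable[float]) -> List[Tuple[float, float]]:
    """Return [(year, settlement_mm)] using Tv = cv * t / H^2."""
    rows = []
    for t in years:
        tv = cv_m2_per_year * t / drainage_path_m ** 2
        rows.append((t, final_settlement_mm * degree_of_consolidation(tv)))
    return rows


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the functions.
    for year, s in settlement_over_time(120.0, 1.0, 4.0, [1, 5, 10, 25, 50]):
        print(f"year {year:>2}: {s:6.1f} mm")
```

A tutor built around this idea would run the same calculation for each candidate foundation and let the student compare the curves, instead of naming a single recommended option.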

How do we measure the hidden curriculum? Engineering education transmits not just technical knowledge but professional identity, ethical reasoning, and risk awareness. Can AI tutors contribute to these outcomes, or do they inherently reduce engineering education to a technical skill?

What is the right level of domain specificity? EngiBot's curated knowledge base trades generality for reliability. At what point does domain constraining become domain limiting—preventing students from making the cross-disciplinary connections that drive engineering innovation?

Implications

The message from this literature is clear but uncomfortable: AI tutors in engineering education work best for the tasks that matter least (procedural skill) and work worst for the tasks that matter most (design judgment). This does not mean they are useless—procedural fluency is a genuine bottleneck in engineering education, and relieving it frees instructor time for the judgment-intensive work that AI cannot yet support.

But it does mean that the narrative of AI tutoring as a replacement for human engineering instructors is premature by at least a decade. The most promising path is complementary: AI handles drill, practice, and procedural scaffolding while human instructors focus on design thinking, ethical reasoning, and the cultivation of professional judgment that no training corpus can encode.

References

[1] Frankford, E., Sauerwein, C., Bassner, P., Krusche, S., & Breu, R. (2024). AI-Tutoring in Software Engineering Education: Experiences with Large Language Models in Programming Assessments. Proc. IEEE/ACM ICSE-SEET 2024.

[2] Rodrigues, B., Pinto, R., & Gonçalves, G. (2025). EngiBot: An AI-Based Tutoring System for Personalized Learning in Engineering Education. Proc. IEEE ICELIE 2025.

[3] Tophel, A., Chen, L., & Hettiyadura, U. (2025). Towards an AI Tutor for Undergraduate Geotechnical Engineering: A Comparative Study. Information Retrieval Journal.

[4] Martinez, J.R., Chong, M., & Maya, S. (2025). Enhancing Algorithmic Thinking and Emotional Resilience in Programming Education Through AI Powered Virtual Tutoring. Proc. IEEE FIE 2025.

[5] Yan, H., Lu, Q., & Wang, X. (2025). Build AI Assistants Using Large Language Models and Agents to Enhance Biomechanics Education. arXiv:2511.15752.
