Trend Analysis · Linguistics & NLP

Pragmatics in Conversational AI: Can Chatbots Understand What We Really Mean?

Pragmatic competence, the ability to understand what speakers mean beyond what they literally say, remains one of the deepest challenges for conversational AI. Recent work evaluates chatbots against Gricean maxims and implicature theory.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

When a dinner guest says "It's getting late," they typically mean "I want to leave," not merely that the clock shows a late hour. This gap between what is said and what is meant (the domain of pragmatics) represents perhaps the most fundamental challenge for conversational AI systems. While large language models have achieved impressive performance on tasks requiring syntactic and semantic competence, pragmatic competence (understanding implicature, indirect speech acts, presupposition, and conversational context) remains a frontier where AI systems regularly fail in ways that range from awkward to harmful. Grice's Cooperative Principle and its maxims (Quantity, Quality, Relation, Manner), along with speech act theory, provide the theoretical framework for evaluating whether AI systems truly participate in conversation or merely simulate participation.

Why It Matters

Conversational AI systems are deployed in contexts where pragmatic failure has real consequences. A healthcare chatbot that takes "I'm fine" literally when a patient is being stoic could miss critical symptoms. A customer service bot that responds to "Can you transfer me to a human?" by answering "Yes, I can" without actually transferring violates the pragmatics of indirect requests. An emotional companion chatbot that fails to detect conversational escalation through increasingly distressed implicatures could exacerbate mental health crises. As conversational AI moves from information retrieval to genuine interaction, pragmatic competence becomes not optional but essential.

For linguistics, AI systems provide a unique test bed for pragmatic theory. If a system that processes only textual patterns can approximate pragmatic behavior, this constrains theories about what pragmatic competence requires. If it cannot, the specific failure modes reveal which aspects of pragmatic processing are irreducible to pattern matching and require genuine social cognition.

The Science

Evaluating Chatbots Against Speech Act Theory

Aziz (2025) provides a systematic evaluation of whether AI chatbots follow the principles of Speech Act Theory and Grice's Cooperative Principle. The study analyzes AI-generated conversations for compliance with each Gricean maxim and for appropriate performance of illocutionary acts (asserting, requesting, promising, apologizing). The findings reveal a consistent pattern: chatbots generally respect the maxims of Quality (they avoid stating things they do not have evidence for) and Manner (they are reasonably clear), but frequently violate Quantity (providing too much or too little information) and Relation (including irrelevant elaborations). For speech acts, chatbots perform direct speech acts competently but struggle with indirect speech acts where the surface form diverges from the intended function, such as "Could you close the window?" functioning as a request rather than a question about ability.

Conversational Implicature in Human-AI Interaction

Salman and Matrood (2025) examine how conversational implicature, the meaning that is implied but not explicitly stated, functions in human-AI interactions. Their analysis reveals that AI systems face particular difficulty with three types of implicature: scalar implicature (where "some students passed" implies "not all students passed"), particularized conversational implicature (meaning derived from specific context), and ironic implicature (where the implied meaning is opposite to the literal meaning). The study identifies a fundamental asymmetry: human users naturally produce implicatures when talking to AI, expecting the same pragmatic processing they receive from human interlocutors, but AI systems process these utterances primarily at the literal level. This asymmetry is a major source of miscommunication in human-AI dialogue.

Computational Modeling of Scalar Implicature

Li et al. (2024) develop a formal computational model of scalar implicature using Bayesian methods, implementing a small dialogue system that can derive scalar implicatures from first principles. Their approach treats scalar implicature as a probabilistic inference problem: given that a speaker chose a weaker term (e.g., "some") when a stronger term was available (e.g., "all"), the listener infers that the stronger term does not apply. The Bayesian framework quantifies this inference by modeling the speaker's choice as a function of the state of the world and communicative goals. While the system operates in a constrained domain, it demonstrates that principled computational pragmatics is achievable and produces more accurate interpretations than purely literal processing.
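The core Bayesian inference can be sketched in the Rational Speech Acts style: a pragmatic listener inverts a speaker model that prefers maximally informative utterances. The domain below (three students, three utterances) and the rationality parameter `alpha` are illustrative assumptions for this sketch, not the actual model of Li et al. (2024).

```python
import math

# Worlds: number of students who passed, out of 3.
WORLDS = [0, 1, 2, 3]
UTTERANCES = ["none", "some", "all"]

def literal(utterance, world):
    """Truth-conditional (literal) semantics of each utterance."""
    return {
        "none": world == 0,
        "some": world >= 1,   # literally compatible with the "all" world
        "all":  world == 3,
    }[utterance]

def normalize(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

def listener_literal(utterance):
    """L0: uniform prior over worlds, filtered by literal truth."""
    return normalize({w: float(literal(utterance, w)) for w in WORLDS})

def speaker(world, alpha=4.0):
    """S1: prefers utterances under which L0 assigns this world high probability."""
    scores = {u: math.exp(alpha * math.log(listener_literal(u)[world]))
              for u in UTTERANCES if literal(u, world)}
    return normalize(scores)

def listener_pragmatic(utterance):
    """L1: Bayesian inversion of the speaker model."""
    return normalize({w: speaker(w).get(utterance, 0.0) for w in WORLDS})

lit = listener_literal("some")
prag = listener_pragmatic("some")
# Literally, "some" is compatible with world 3 ("all passed"); pragmatically,
# the listener downweights world 3, since the speaker would have said "all".
print(lit[3], prag[3])
```

The inference falls out of the model structure rather than a hand-coded rule: "some" loses probability mass on the "all" world exactly because a rational speaker in that world had a strictly better utterance available.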

Sentiment in Implicature Processing

Li and Xu (2025) connect pragmatics to sentiment analysis by developing a computational pragmatics approach to detecting sentiment in conversational implicatures. Their key insight is that the sentiment of an utterance often resides in its implicature rather than its literal content: "That's an interesting proposal" can be genuinely positive or devastatingly dismissive depending on conversational context. The study formalizes the relationship between response sentiment and implicature type, showing that sentiment classification accuracy improves significantly when pragmatic context is modeled explicitly rather than relying solely on lexical sentiment indicators. This work bridges two NLP subfields, sentiment analysis and computational pragmatics, that have developed largely independently.
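The idea that sentiment lives in the implicature can be illustrated with a toy sketch (this is not Li and Xu's actual model; the cue names and weights are invented for illustration): a lexical score for the same utterance is pushed positive or negative by the next conversational move.

```python
# Hypothetical lexicon and discourse cues, for illustration only.
LEXICAL_SENTIMENT = {"interesting": 0.4, "great": 0.8, "terrible": -0.8}

def lexical_score(utterance):
    """Context-free sentiment: sum of word-level scores."""
    words = utterance.lower().replace(".", "").split()
    return sum(LEXICAL_SENTIMENT.get(w, 0.0) for w in words)

def pragmatic_score(utterance, followup):
    """Adjust the lexical score using the speaker's next conversational move."""
    base = lexical_score(utterance)
    if followup == "elaboration_request":   # "Tell me more" -> genuine interest
        return base + 0.3
    if followup == "topic_change":          # "Anyway..." -> dismissive implicature
        return -abs(base) - 0.3             # apparent praise reads as negative
    return base

utterance = "That's an interesting proposal."
print(pragmatic_score(utterance, "elaboration_request"))  # positive
print(pragmatic_score(utterance, "topic_change"))         # negative
```

Even this crude sketch shows why purely lexical classifiers misfire on implicated sentiment: the word-level score is identical in both contexts, and only the discourse move disambiguates it.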

Pragmatic Competence in Current AI Systems

| Pragmatic Phenomenon | AI Capability | Failure Mode | Required Advance |
|---|---|---|---|
| Direct speech acts | Strong | Rare failures | Largely solved for common types |
| Indirect speech acts | Moderate | Literal interpretation of requests/questions | Context-dependent intent recognition |
| Scalar implicature | Low-moderate | Missing "some ≠ all" inferences | Formal pragmatic reasoning |
| Particularized implicature | Low | Context-blind processing | Rich situation modeling |
| Irony and sarcasm | Low | Literal interpretation | Stance and social context modeling |
| Presupposition | Moderate | Fails to accommodate or challenge | Common ground tracking |
| Politeness strategies | Moderate | Overly direct or formulaic | Cultural pragmatic competence |

What To Watch

The most promising direction is the integration of pragmatic theory into LLM training and evaluation, rather than hoping that pragmatic competence emerges as a byproduct of scale. Benchmark suites that test specific pragmatic phenomena (the BIG-Bench pragmatics tasks, the Pragmatic Understanding benchmarks) are enabling systematic measurement of progress. The development of theory-of-mind capabilities in AI, enabling systems to model what their interlocutor knows, believes, and intends, is a prerequisite for genuine pragmatic competence, as implicature computation fundamentally requires reasoning about the speaker's mental state. Whether current transformer architectures can support this kind of reasoning, or whether new architectures are needed, remains one of AI's most important open questions.


References (4)

[1] Aziz, A.A. (2025). AI and Pragmatics: Do Chatbots Follow Speech Acts & Maxims? Wasit J. for Humanities, 21(3).
[2] Salman, Y. & Matrood, D. (2025). Conversational Implicature in Human-AI Interactions. FGR, 1(3).
[3] Li, X. & Xu, K. (2025). Sentiment Analysis of Conversational Implicature: A Computational Pragmatics Approach. Applied Artificial Intelligence, 39.
[4] Li, X., Yin, X., & Xu, K. (2024). A Model of Conversational Scalar Implicature in Computational Pragmatics. Proc. PRML 2024, IEEE.
