Trend Analysis · Linguistics & NLP

Pragmatics in Conversational AI: Can Chatbots Understand What We Really Mean?

Pragmatic competence, the ability to understand what speakers mean beyond what they literally say, remains one of the deepest challenges for conversational AI. Recent work evaluates chatbots against Gricean maxims and implicature theory.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

When a dinner guest says "It's getting late," they typically mean "I want to leave," not merely that the clock shows a late hour. This gap between what is said and what is meant (the domain of pragmatics) represents perhaps the most fundamental challenge for conversational AI systems. While large language models have achieved impressive performance on tasks requiring syntactic and semantic competence, pragmatic competence (understanding implicature, indirect speech acts, presupposition, and conversational context) remains a frontier where AI systems regularly fail in ways that range from awkward to harmful. Grice's Cooperative Principle and its maxims (Quantity, Quality, Relation, Manner), along with speech act theory, provide the theoretical framework for evaluating whether AI systems truly participate in conversation or merely simulate participation.

Why It Matters

Conversational AI systems are deployed in contexts where pragmatic failure has real consequences. A healthcare chatbot that takes "I'm fine" literally when a patient is being stoic could miss critical symptoms. A customer service bot that responds to "Can you transfer me to a human?" by answering "Yes, I can" without actually transferring violates the pragmatics of indirect requests. An emotional companion chatbot that fails to detect conversational escalation through increasingly distressed implicatures could exacerbate mental health crises. As conversational AI moves from information retrieval to genuine interaction, pragmatic competence becomes not optional but essential.

For linguistics, AI systems provide a unique test bed for pragmatic theory. If a system that processes only textual patterns can approximate pragmatic behavior, this constrains theories about what pragmatic competence requires. If it cannot, the specific failure modes reveal which aspects of pragmatic processing are irreducible to pattern matching and require genuine social cognition.

The Science

Evaluating Chatbots Against Speech Act Theory

Aziz (2025) provides a systematic evaluation of whether AI chatbots follow the principles of Speech Act Theory and Grice's Cooperative Principle. The study analyzes AI-generated conversations for compliance with each Gricean maxim and for appropriate performance of illocutionary acts (asserting, requesting, promising, apologizing). The findings reveal a consistent pattern: chatbots generally respect the maxims of Quality (they avoid stating things they do not have evidence for) and Manner (they are reasonably clear), but frequently violate Quantity (providing too much or too little information) and Relation (including irrelevant elaborations). For speech acts, chatbots perform direct speech acts competently but struggle with indirect speech acts where the surface form diverges from the intended function, such as "Could you close the window?" functioning as a request rather than a question about ability.

Conversational Implicature in Human-AI Interaction

Salman and Matrood (2025) examine how conversational implicature, the meaning that is implied but not explicitly stated, functions in human-AI interactions. Their analysis reveals that AI systems face particular difficulty with three types of implicature: scalar implicature (where "some students passed" implies "not all students passed"), particularized conversational implicature (meaning derived from specific context), and ironic implicature (where the implied meaning is opposite to the literal meaning). The study identifies a fundamental asymmetry: human users naturally produce implicatures when talking to AI, expecting the same pragmatic processing they receive from human interlocutors, but AI systems process these utterances primarily at the literal level. This asymmetry is a major source of miscommunication in human-AI dialogue.

Computational Modeling of Scalar Implicature

Li et al. (2024) develop a formal computational model of scalar implicature using Bayesian methods, implementing a small dialogue system that can derive scalar implicatures from first principles. Their approach treats scalar implicature as a probabilistic inference problem: given that a speaker chose a weaker term (e.g., "some") when a stronger term was available (e.g., "all"), the listener infers that the stronger term does not apply. The Bayesian framework quantifies this inference by modeling the speaker's choice as a function of the state of the world and communicative goals. While the system operates in a constrained domain, it demonstrates that principled computational pragmatics is achievable and produces more accurate interpretations than purely literal processing.
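The core Bayesian inference can be sketched in the Rational Speech Acts style: a pragmatic listener inverts a speaker model that prefers maximally informative utterances. The domain below (three students, three utterances) and the rationality parameter `alpha` are illustrative assumptions for this sketch, not the actual model of Li et al. (2024).

```python
import math

# Worlds: number of students who passed, out of 3.
WORLDS = [0, 1, 2, 3]
UTTERANCES = ["none", "some", "all"]

def literal(utterance, world):
    """Truth-conditional (literal) semantics of each utterance."""
    return {
        "none": world == 0,
        "some": world >= 1,   # literally compatible with the "all" world
        "all":  world == 3,
    }[utterance]

def normalize(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

def listener_literal(utterance):
    """L0: uniform prior over worlds, filtered by literal truth."""
    return normalize({w: float(literal(utterance, w)) for w in WORLDS})

def speaker(world, alpha=4.0):
    """S1: prefers utterances under which L0 assigns this world high probability."""
    scores = {u: math.exp(alpha * math.log(listener_literal(u)[world]))
              for u in UTTERANCES if literal(u, world)}
    return normalize(scores)

def listener_pragmatic(utterance):
    """L1: Bayesian inversion of the speaker model."""
    return normalize({w: speaker(w).get(utterance, 0.0) for w in WORLDS})

lit = listener_literal("some")
prag = listener_pragmatic("some")
# Literally, "some" is compatible with world 3 ("all passed"); pragmatically,
# the listener downweights world 3, since the speaker would have said "all".
print(lit[3], prag[3])
```

The inference falls out of the model structure rather than a hand-coded rule: "some" loses probability mass on the "all" world exactly because a rational speaker in that world had a strictly better utterance available.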

Sentiment in Implicature Processing

Li and Xu (2025) connect pragmatics to sentiment analysis by developing a computational pragmatics approach to detecting sentiment in conversational implicatures. Their key insight is that the sentiment of an utterance often resides in its implicature rather than its literal content: "That's an interesting proposal" can be genuinely positive or devastatingly dismissive depending on conversational context. The study formalizes the relationship between response sentiment and implicature type, showing that sentiment classification accuracy improves significantly when pragmatic context is modeled explicitly rather than relying solely on lexical sentiment indicators. This work bridges two NLP subfields, sentiment analysis and computational pragmatics, that have developed largely independently.
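The idea that sentiment lives in the implicature can be illustrated with a toy sketch (this is not Li and Xu's actual model; the cue names and weights are invented for illustration): a lexical score for the same utterance is pushed positive or negative by the next conversational move.

```python
# Hypothetical lexicon and discourse cues, for illustration only.
LEXICAL_SENTIMENT = {"interesting": 0.4, "great": 0.8, "terrible": -0.8}

def lexical_score(utterance):
    """Context-free sentiment: sum of word-level scores."""
    words = utterance.lower().replace(".", "").split()
    return sum(LEXICAL_SENTIMENT.get(w, 0.0) for w in words)

def pragmatic_score(utterance, followup):
    """Adjust the lexical score using the speaker's next conversational move."""
    base = lexical_score(utterance)
    if followup == "elaboration_request":   # "Tell me more" -> genuine interest
        return base + 0.3
    if followup == "topic_change":          # "Anyway..." -> dismissive implicature
        return -abs(base) - 0.3             # apparent praise reads as negative
    return base

utterance = "That's an interesting proposal."
print(pragmatic_score(utterance, "elaboration_request"))  # positive
print(pragmatic_score(utterance, "topic_change"))         # negative
```

Even this crude sketch shows why purely lexical classifiers misfire on implicated sentiment: the word-level score is identical in both contexts, and only the discourse move disambiguates it.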

Pragmatic Competence in Current AI Systems

| Pragmatic Phenomenon | AI Capability | Failure Mode | Required Advance |
|---|---|---|---|
| Direct speech acts | Strong | Rare failures | Largely solved for common types |
| Indirect speech acts | Moderate | Literal interpretation of requests/questions | Context-dependent intent recognition |
| Scalar implicature | Low-moderate | Missing "some ≠ all" inferences | Formal pragmatic reasoning |
| Particularized implicature | Low | Context-blind processing | Rich situation modeling |
| Irony and sarcasm | Low | Literal interpretation | Stance and social context modeling |
| Presupposition | Moderate | Fails to accommodate or challenge | Common ground tracking |
| Politeness strategies | Moderate | Overly direct or formulaic | Cultural pragmatic competence |

What To Watch

The most promising direction is the integration of pragmatic theory into LLM training and evaluation, rather than hoping that pragmatic competence emerges as a byproduct of scale. Benchmark suites that test specific pragmatic phenomena (the BIG-Bench pragmatics tasks, the Pragmatic Understanding benchmarks) are enabling systematic measurement of progress. The development of theory-of-mind capabilities in AI, enabling systems to model what their interlocutor knows, believes, and intends, is a prerequisite for genuine pragmatic competence, as implicature computation fundamentally requires reasoning about the speaker's mental state. Whether current transformer architectures can support this kind of reasoning, or whether new architectures are needed, remains one of AI's most important open questions.


References (4)

[1] Aziz, A.A. (2025). AI and Pragmatics: Do Chatbots Follow Speech Acts & Maxims? Wasit J. for Humanities, 21(3).
[2] Salman, Y. & Matrood, D. (2025). Conversational Implicature in Human-AI Interactions. FGR, 1(3).
[3] Li, X. & Xu, K. (2025). Sentiment Analysis of Conversational Implicature: A Computational Pragmatics Approach. Applied Artificial Intelligence, 39.
[4] Li, X., Yin, X., & Xu, K. (2024). A Model of Conversational Scalar Implicature in Computational Pragmatics. Proc. PRML 2024, IEEE.
