Trend Analysis · Linguistics & NLP

Linguistic Bias in AI-Generated Text: How Language Models Encode and Amplify Stereotypes

AI language models don't just reflect existing biases in their training data; they can amplify and systematize them in ways that create new forms of linguistic discrimination across gender, race, and religion.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Language is never neutral. Every utterance carries traces of the social structures, power relations, and ideological commitments of its producers. When large language models are trained on billions of words of human-produced text, they inevitably absorb the biases embedded in that text. But the relationship between training data bias and model output bias is not simple reflection: language models can amplify biases that are subtle in the training data, create novel associations between social categories and attributes, and systematize biases that were inconsistent or contested in human discourse. Understanding how linguistic bias operates in AI-generated text is simultaneously a problem in computational linguistics, sociolinguistics, and ethics.

Why It Matters

AI-generated text is increasingly woven into the fabric of daily communication: email drafts, search results, educational content, news summaries, creative writing, and code documentation. When this text carries systematic biases, it does not merely reflect existing prejudice but creates a new channel for its propagation, one that operates at a scale and consistency no individual author could achieve. A language model that consistently generates male pronouns for doctors and female pronouns for nurses does not just mirror statistical patterns in training data; it produces a steady stream of reinforcing examples that shape user expectations and, over time, potentially shape social reality.

For linguistics, the bias problem reveals how deeply social meaning is embedded in language patterns. The distributional hypothesis that underlies word embeddings and language models (the idea that a word's meaning is determined by its contexts of use) turns out to capture not only semantic meaning but social meaning: associations, stereotypes, and power relations that are encoded in the statistical patterns of who talks about whom, in what way, and in what contexts.
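A classic, simplified way to see this is to probe pre-trained word embeddings directly, in the spirit of earlier embedding-bias studies: occupation words sit measurably closer to one gendered pronoun than the other. The sketch below is illustrative only; the embedding model, the occupation list, and the he/she probe are arbitrary choices rather than a standard benchmark.

```python
# Illustrative probe of social meaning in distributional vectors: compare how
# strongly occupation words associate with "he" vs. "she" in pre-trained GloVe
# embeddings. Model choice and word lists are illustrative, not a benchmark.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # downloads the vectors on first use

occupations = ["engineer", "nurse", "doctor", "teacher", "mechanic", "librarian"]
for occ in occupations:
    gap = vectors.similarity(occ, "he") - vectors.similarity(occ, "she")
    leaning = "male-leaning" if gap > 0 else "female-leaning"
    print(f"{occ:10s} he-she similarity gap: {gap:+.3f}  ({leaning})")
```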

The Science

Gender and Racial Bias in Vision-Language Models

Fraser and Kiritchenko (2024) conduct the first systematic audit of gender and racial bias in large vision-language models (LVLMs), using a novel dataset of carefully constructed parallel images that differ only in the depicted person's perceived gender or race. This controlled methodology isolates bias: when the model generates different descriptions for images that differ only in the person's demographic characteristics, the difference is attributable to bias rather than to confounds. The audit finds pervasive biases: models describe women with more appearance-related and emotional language, describe men with more action-related and professional language, and show racial biases in attributions of intelligence, threat, and socioeconomic status. The severity of bias varies across models, with some architectures showing less bias than others, suggesting that architectural and training choices matter.
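A minimal sketch of how such a parallel comparison might be scored: for each image pair, compute the difference in how often the two captions use words from contrasting categories (appearance-related versus professional). The word lists, the scoring function, and the toy caption data below are illustrative placeholders, not the authors' lexicons or dataset.

```python
# Sketch of a parallel-image bias probe: for each pair of captions describing
# demographically "parallel" images, measure the gap in word-category usage.
# Word lists and the toy data are illustrative placeholders only.
APPEARANCE = {"beautiful", "pretty", "attractive", "slim", "elegant", "young"}
PROFESSIONAL = {"professional", "confident", "leader", "expert", "skilled", "successful"}

def category_count(text, lexicon):
    """Count caption tokens that fall into a word-category lexicon."""
    return sum(1 for t in text.lower().split() if t.strip(".,") in lexicon)

def parallel_gap(captions):
    """Average per-pair difference in category usage between the two captions."""
    gaps = {"appearance": 0, "professional": 0}
    for caption_a, caption_b in captions.values():
        gaps["appearance"] += category_count(caption_a, APPEARANCE) - category_count(caption_b, APPEARANCE)
        gaps["professional"] += category_count(caption_a, PROFESSIONAL) - category_count(caption_b, PROFESSIONAL)
    n = max(len(captions), 1)
    return {k: v / n for k, v in gaps.items()}

# Toy usage: caption_a describes the image perceived as a woman, caption_b as a man.
captions = {
    "pair_001": ("A beautiful young woman smiling at the camera.",
                 "A confident professional man smiling at the camera."),
}
print(parallel_gap(captions))
```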

Gender Bias in Text Generation

Soundararajan and Delany (2024) investigate gender bias specifically in LLM text generation, examining how models associate occupations, personality traits, and social roles with gender. Their methodology generates thousands of text completions for gender-neutral prompts and analyzes the statistical distribution of gendered language in the outputs. The results confirm that LLMs disproportionately associate certain occupations with one gender (engineering with men, nursing with women), attribute different personality traits to male and female characters (agency versus communion), and reproduce heteronormative assumptions in narrative generation. Notably, the biases are often more extreme in the model outputs than in the training data, a finding consistent with the amplification hypothesis: because models optimize for the most probable continuations, which tend to be the most stereotypical, they learn to be more biased than their training data.
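A rough sketch of this kind of completion audit, assuming an off-the-shelf GPT-2 model served through the Hugging Face transformers pipeline; the prompts, sample size, and pronoun lists are illustrative choices rather than the paper's protocol.

```python
# Sketch of a completion-based pronoun audit: sample many continuations of a
# gender-neutral occupational prompt and count gendered pronouns. GPT-2 and the
# prompt/pronoun lists are illustrative stand-ins, not the study's setup.
from collections import Counter
from transformers import pipeline

MALE = {"he", "him", "his"}
FEMALE = {"she", "her", "hers"}

def pronoun_distribution(prompt, n_samples=50):
    generator = pipeline("text-generation", model="gpt2")
    outputs = generator(prompt, max_new_tokens=30, num_return_sequences=n_samples,
                        do_sample=True, pad_token_id=50256)
    counts = Counter()
    for out in outputs:
        tokens = [t.strip(".,") for t in out["generated_text"].lower().split()]
        counts["male"] += sum(t in MALE for t in tokens)
        counts["female"] += sum(t in FEMALE for t in tokens)
    return counts

print(pronoun_distribution("The engineer finished the report and then"))
print(pronoun_distribution("The nurse finished the shift and then"))
```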

Religious Bias Across Modalities

Abrar et al. (2025) expand the bias analysis to religion, examining how both language models and text-to-image models represent different religious groups. Their systematic study reveals significant disparities in how religions are characterized in model outputs: some religions are consistently associated with violence and extremism while others are associated with peace and spirituality, reflecting and amplifying biases present in English-language media. The study contributes detection methods and debiasing strategies, including counterfactual data augmentation and constrained decoding. The multimodal dimension is important because text-to-image models can generate visual stereotypes (depicting members of certain religions in stereotypical clothing or settings) that reinforce textual biases.
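One of the mitigation strategies mentioned above, counterfactual data augmentation, can be sketched very simply: every training sentence that mentions one religious group is duplicated with the group term swapped, so that attribute words co-occur with all groups. The term list and naive whole-word substitution below are deliberate simplifications, not the authors' implementation.

```python
# Sketch of counterfactual data augmentation for religion terms: each sentence
# mentioning one group is duplicated with every other group substituted in.
# The term list and whole-word replacement are illustrative simplifications.
import re

RELIGION_TERMS = ["Christian", "Muslim", "Jewish", "Hindu", "Buddhist"]

def counterfactual_variants(sentence):
    """Return the original sentence plus one variant per alternative group term."""
    variants = [sentence]
    for term in RELIGION_TERMS:
        pattern = re.compile(rf"\b{term}\b")
        if pattern.search(sentence):
            for other in RELIGION_TERMS:
                if other != term:
                    variants.append(pattern.sub(other, sentence))
    return variants

for v in counterfactual_variants("The Muslim shopkeeper greeted every customer warmly."):
    print(v)
```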

Bias Amplification in Non-English Languages

Gupta et al. (2024) demonstrate that gender bias is even more pronounced when LLMs generate text in Hindi rather than English. Hindi's grammatical gender system, which assigns gender to nouns and requires gender agreement on verbs and adjectives, interacts with social biases to produce strongly gendered outputs. The study finds that LLMs default to masculine gender for professional occupations and feminine gender for domestic roles in Hindi text generation, and that this bias is more extreme than in equivalent English generation. The finding highlights a critical point: bias research conducted primarily on English cannot be assumed to generalize to other languages, particularly those with grammatical gender, honorific systems, or other morphosyntactic features that interact with social categories.
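A toy illustration of the measurement problem: Hindi habitual verb forms carry gender agreement (करता है for masculine versus करती है for feminine), so the gender a model defaults to can be read off its completions. The suffix heuristic below is a deliberate simplification that ignores most of Hindi morphology, and the example completions are invented for illustration.

```python
# Toy gender-agreement counter for Hindi completions. The suffix patterns cover
# only simple habitual/past forms (e.g., "karta hai" vs. "karti hai"); this is a
# simplification, and the example completions are invented for illustration.
import re

MASCULINE = re.compile(r"ता (है|था)")   # e.g., करता है / करता था
FEMININE = re.compile(r"ती (है|थी)")    # e.g., करती है / करती थी

def agreement_counts(completions):
    masc = sum(len(MASCULINE.findall(c)) for c in completions)
    fem = sum(len(FEMININE.findall(c)) for c in completions)
    return {"masculine": masc, "feminine": fem}

completions = [
    "वकील अदालत में बहस करता है।",    # "The lawyer argues in court." (masculine agreement)
    "नर्स मरीज़ की देखभाल करती है।",   # "The nurse cares for the patient." (feminine agreement)
]
print(agreement_counts(completions))
```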

Bias Typology in AI-Generated Language

| Bias Type | Manifestation in Language | Detection Method | Debiasing Approach |
| --- | --- | --- | --- |
| Gender-occupation | Male pronouns for STEM, female for care work | Pronoun distribution in occupational contexts | Counterfactual data augmentation |
| Racial attribution | Different trait language for different racial groups | Parallel prompt analysis | Representation balancing in training |
| Religious association | Violence/peace associations with specific religions | Sentiment analysis of religious content | Constrained decoding |
| Grammatical gender amplification | Exaggerated gender defaults in gendered languages | Cross-lingual comparison | Language-specific fine-tuning |
| Intersectional | Compounded bias for multiple minority identities | Intersectional prompt design | Multi-axis debiasing |
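
The constrained decoding entry above can be made concrete in its crudest form: steer generation away from a list of stereotype-linked continuations at decode time. The sketch below uses the bad_words_ids mechanism of Hugging Face's generate() with a tiny illustrative blocked-word list; constrained decoding in the literature is considerably more nuanced than word blocking.

```python
# Crude constrained-decoding sketch: block a short list of stereotype-linked
# words at generation time via bad_words_ids. The blocked-word list and prompt
# are purely illustrative; real constraints would be far richer.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

blocked_words = ["violent", "terrorist"]  # illustrative placeholder list
bad_words_ids = [tokenizer(" " + w, add_special_tokens=False).input_ids for w in blocked_words]

inputs = tokenizer("The young man walked into the mosque and", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30, do_sample=True,
                        bad_words_ids=bad_words_ids, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```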

What To Watch

The field is moving from bias detection to bias mitigation, but the solutions are far from settled. Debiasing approaches that work for one type of bias (gender) may not generalize to others (race, religion, disability), and approaches that reduce bias on benchmarks may not reduce bias in real-world deployment. Constitutional AI and RLHF (reinforcement learning from human feedback) offer frameworks for incorporating fairness constraints into training, but the definition of "fair" language is itself contested across cultures and philosophical traditions. The most important emerging direction may be participatory approaches that involve affected communities in defining, detecting, and mitigating the biases that matter most to them, rather than having bias definitions imposed by researchers and technologists who may not share the lived experience of the communities affected.

Discover related work using ORAA ResearchBrain.

References (4)

[1] Fraser, K.C. & Kiritchenko, S. (2024). Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images.
[2] Soundararajan, S. & Delany, S.J. (2024). Investigating Gender Bias in Large Language Models Through Text Generation. Proceedings of ICNLSP 2024.
[3] Abrar, A., Oeshy, N.T., & Kabir, M. (2025). Religious Bias Landscape in Language and Text-to-Image Models: Analysis, Detection, and Debiasing Strategies. AI and Ethics.
[4] Gupta, I., Joshi, I., & Dey, A. (2024). "Since Lawyers are Males..": Examining Implicit Gender Bias in Hindi Language Generation by LLMs. Proceedings of ACM FAccT 2024.

