Hate Speech Detection Across Languages and Cultures: The Multilingual Challenge

Hate speech is linguistically and culturally situated, making cross-lingual detection one of NLP's hardest problems. Recent work spans LLM-based approaches, semi-supervised learning, and low-resource language adaptation.

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Hate speech detection is not merely a classification problem; it is a deeply linguistic one. What constitutes hate speech varies across languages, cultures, legal systems, and historical contexts. A term that is neutral in one language may be a severe slur in another. Irony, code words, and dog whistles add layers of indirection that take pragmatic competence to decode. English-language hate speech detection has achieved reasonable accuracy on benchmark datasets, but extending these capabilities across the world's languages and cultural contexts remains an open and urgent challenge. The urgency comes from a mismatch: social media platforms operate globally, while content moderation resources are concentrated in a handful of languages.

Why It Matters

Online hate speech has real-world consequences: it correlates with hate crimes, contributes to radicalization, and creates hostile environments that silence marginalized communities. Yet moderation infrastructure is radically uneven. English content benefits from sophisticated detection systems and large moderation teams, while most of the world's languages have minimal or no automated hate speech detection. The result is an inversion: the communities most vulnerable to hate speech, often minority-language communities, are the least protected by moderation technology.

From a linguistic perspective, hate speech detection runs straight into some of the field's hardest problems: pragmatic inference, cultural presupposition, implicit meaning, and the relationship between linguistic form and social function. A system that reliably detected hate speech across languages and cultures would have to solve problems that remain open in theoretical pragmatics.

The Science

Cross-Lingual Transfer via Domain-Specific Embeddings

Arango Monnar et al. (2024) show that standard cross-lingual transfer approaches, which map languages into shared embedding spaces, lose critical information when applied to hate speech. Their solution replaces general-purpose word embeddings with domain-specific embeddings trained on hate speech corpora. These capture semantic relationships that general-purpose embeddings dilute, such as the association between group identifiers and derogatory terms. In their cross-lingual experiments, this domain specialization significantly improves transfer between language pairs, particularly for languages that share cultural contexts of hate speech (e.g., languages spoken in regions with shared intergroup conflicts).
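The summary does not say how the two languages' embedding spaces are aligned. One standard way to map monolingual embeddings into a shared space is orthogonal Procrustes over a seed bilingual dictionary. The sketch below assumes that technique (it is not necessarily the authors' method) and uses synthetic vectors in place of real domain-specific embeddings; `procrustes_align` and `nearest_target` are hypothetical helper names.

```python
import numpy as np

# Assumed technique: orthogonal Procrustes mapping between two embedding spaces.
# Illustrative only; not taken from Arango Monnar et al. (2024).


def procrustes_align(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||src @ W - tgt||_F.

    Row i of `src` and row i of `tgt` hold the embeddings of one seed
    translation pair (source-language word, target-language word).
    """
    # Closed-form solution: if src^T tgt = U S V^T, then W = U V^T.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def nearest_target(query: np.ndarray, tgt_vocab: np.ndarray) -> int:
    """Index of the target-language vector with the highest cosine similarity."""
    sims = (tgt_vocab @ query) / (
        np.linalg.norm(tgt_vocab, axis=1) * np.linalg.norm(query)
    )
    return int(np.argmax(sims))


# Synthetic stand-in for two embedding spaces: the "source" space is an exact
# rotation of the "target" space, so the learned W should undo that rotation.
rng = np.random.default_rng(0)
tgt = rng.normal(size=(50, 8))                      # 50 seed pairs, 8-dim vectors
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
src = tgt @ rotation.T
W = procrustes_align(src, tgt)
print(nearest_target(src[3] @ W, tgt))  # maps source word 3 back to target word 3
```

In a domain-specific setup, the rows of `src` and `tgt` would come from embeddings trained on hate speech corpora in each language, so the mapping is fit on the vocabulary a hate speech classifier actually relies on.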

LLM-Based Multilingual Detection

Usman et al. (2025) apply large language models to multilingual hate speech detection on social media, building on the implicit knowledge of many languages and cultural contexts that LLMs acquire from diverse multilingual training corpora. Their approach fine-tunes LLMs on hate speech datasets from multiple languages simultaneously, allowing the model to transfer hate speech patterns across languages. The results show improvements over monolingual baselines, particularly for under-resourced languages that benefit from transfer from high-resource languages. However, the authors note persistent difficulties with implicit hate speech, sarcasm, and culture-specific references, which even LLMs struggle to handle correctly without explicit cultural knowledge.
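Joint fine-tuning on several languages raises a practical question the summary does not address: how to mix datasets of very different sizes. A common option in multilingual training is exponentially smoothed (temperature) sampling, which draws language l with probability proportional to (n_l / N)^alpha. The sketch below assumes this scheme, which is not confirmed as part of Usman et al.'s method; the helper names and toy corpora are hypothetical.

```python
import random

# Assumption: temperature-style language sampling is shown for illustration;
# it is not taken from Usman et al. (2025).


def language_sampling_probs(sizes: dict, alpha: float = 0.3) -> dict:
    """Exponentially smoothed sampling: p_l proportional to (n_l / N) ** alpha.

    alpha = 1.0 samples languages in proportion to their data size;
    alpha < 1.0 flattens the distribution, upsampling low-resource languages.
    """
    total = sum(sizes.values())
    weights = {lang: (n / total) ** alpha for lang, n in sizes.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


def sample_mixed_batch(datasets: dict, probs: dict, batch_size: int,
                       rng: random.Random) -> list:
    """Draw one fine-tuning batch that mixes languages according to `probs`."""
    langs = list(probs)
    chosen = rng.choices(langs, weights=[probs[lang] for lang in langs], k=batch_size)
    return [(lang, rng.choice(datasets[lang])) for lang in chosen]


# Toy corpora: placeholder (text, label) pairs, not real data.
datasets = {
    "en": [(f"en post {i}", i % 2) for i in range(10_000)],
    "es": [(f"es post {i}", i % 2) for i in range(2_000)],
    "ur": [(f"ur post {i}", i % 2) for i in range(500)],
}
probs = language_sampling_probs({lang: len(d) for lang, d in datasets.items()})
batch = sample_mixed_batch(datasets, probs, batch_size=16, rng=random.Random(0))
print({lang: round(p, 3) for lang, p in probs.items()})
```

With alpha = 0.3, the small placeholder corpus ("ur") makes up 4% of the pooled examples but about 20% of sampled ones, so a low-resource language is seen far more often during joint fine-tuning than its raw size would allow.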

Semi-Supervised Approaches for Data Scarcity

Mnassri et al. (2024) tackle the labeled-data problem with a semi-supervised generative adversarial approach. Labeled hate speech data is expensive to create and requires annotators with cultural competence in each target language. Their GAN-based method leverages large amounts of unlabeled multilingual social media text to improve detection with only a small number of labeled examples. In a semi-supervised GAN, the generator produces synthetic samples, and the discriminator is trained on two tasks at once: classifying labeled examples as hateful or not, and telling real inputs apart from generated ones. Because the second task needs no labels, unlabeled real text in each language also contributes to training, pushing the model to learn what real posts in that language look like. The approach is particularly effective for languages where hate speech datasets are small or nonexistent, offering a path to expanding coverage without proportionally expanding annotation effort.
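The summary does not give the architecture, so the sketch below shows the generic semi-supervised GAN objective (a discriminator with k real classes plus one extra "generated" class, as in Salimans et al., 2016) rather than Mnassri et al.'s own implementation. It computes only the discriminator loss from logits; the text encoder, the generator, and the generator's loss are omitted, and the function name is hypothetical.

```python
import numpy as np

# Generic k+1-class semi-supervised GAN discriminator loss (illustrative sketch,
# not the code of Mnassri et al., 2024).


def log_softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def ssgan_discriminator_loss(labeled_logits, labels, unlabeled_logits, fake_logits):
    """Each logits array has shape (batch, k + 1); column k is the "generated" class.

    `labels` holds integer class ids in [0, k) for the labeled real examples.
    """
    k = labeled_logits.shape[1] - 1
    # Supervised term: cross-entropy over the k real classes only,
    # i.e. -log p(y | x, y is a real class).
    sup_logp = log_softmax(labeled_logits[:, :k])
    sup = -sup_logp[np.arange(len(labels)), labels].mean()
    # Unsupervised term, real side: every real input (labeled or not) should
    # receive a low probability of being generated -> -log(1 - p_fake).
    real = np.concatenate([labeled_logits, unlabeled_logits])
    p_fake_real = np.exp(log_softmax(real)[:, k])
    unsup_real = -np.log1p(-p_fake_real).mean()
    # Unsupervised term, generated side: generator outputs should be flagged as fake.
    unsup_fake = -log_softmax(fake_logits)[:, k].mean()
    return sup + unsup_real + unsup_fake


# Uninformative discriminator (all logits zero) with k = 2 classes (not hate / hate).
loss = ssgan_discriminator_loss(
    np.zeros((2, 3)), np.array([0, 1]), np.zeros((3, 3)), np.zeros((2, 3))
)
print(round(float(loss), 4))  # ln 2 + ln 1.5 + ln 3 = ln 9
```

The unlabeled batch enters only through the real-versus-generated term, which is how unlabeled text contributes to training without any annotation.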

Low-Resource Indian Languages

Ghosh and Senapati (2024) provide a comprehensive analysis of hate speech detection in low-resource Indian languages using both monolingual and multilingual transformer models with cross-lingual experiments. India's linguistic diversity, with 22 scheduled languages and hundreds of additional languages, makes it a critical test case for multilingual hate speech detection. Their experiments reveal that multilingual models like XLM-RoBERTa outperform monolingual models for most Indian languages, but the improvement is uneven: languages with more training data and closer typological relationships to high-resource languages benefit more. Languages with unique scripts, complex morphology, or very limited digital presence see smaller gains, highlighting the limits of cross-lingual transfer.
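The point that the multilingual advantage is uneven only becomes visible when results are broken down per language rather than pooled. A minimal sketch of such a breakdown, assuming binary labels (1 = hateful) and, for each test post, one prediction from a monolingual model and one from a multilingual model; `binary_f1` and `per_language_gain` are hypothetical helpers, and the data are invented.

```python
from collections import defaultdict

# Hypothetical evaluation helpers; the toy predictions below are invented and
# are not results from Ghosh and Senapati (2024).


def binary_f1(gold: list, pred: list, positive: int = 1) -> float:
    """F1 for the positive (hateful) class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_language_gain(records) -> dict:
    """records: iterable of (language, gold, monolingual_pred, multilingual_pred).

    Returns {language: (mono_f1, multi_f1, gain)}.
    """
    grouped = defaultdict(lambda: ([], [], []))
    for lang, gold, mono, multi in records:
        golds, monos, multis = grouped[lang]
        golds.append(gold)
        monos.append(mono)
        multis.append(multi)
    report = {}
    for lang, (golds, monos, multis) in grouped.items():
        mono_f1, multi_f1 = binary_f1(golds, monos), binary_f1(golds, multis)
        report[lang] = (mono_f1, multi_f1, multi_f1 - mono_f1)
    return report


# Two hypothetical languages with invented gold labels and predictions.
records = (
    [("lang_a", g, m, x) for g, m, x in zip([1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0])]
    + [("lang_b", g, m, x) for g, m, x in zip([1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0])]
)
for lang, (mono, multi, gain) in per_language_gain(records).items():
    print(f"{lang}: mono={mono:.2f} multi={multi:.2f} gain={gain:+.2f}")
```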

Cross-Lingual Hate Speech Detection Performance

Approach	High-resource languages (F1)	Low-resource languages (F1)	Key strength	Key weakness
Monolingual fine-tuned	85-92%	55-70%	Language-specific precision	No cross-lingual transfer
Cross-lingual embeddings	78-85%	65-78%	Zero-shot transfer	Loses cultural specificity
LLM-based multilingual	82-90%	68-80%	Implicit cultural knowledge	Computational cost
Semi-supervised GAN	80-88%	70-82%	Minimal labeled data needed	Training instability
Domain-specific transfer	83-89%	72-83%	Captures hate-specific semantics	Requires domain corpora
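The table does not say whether its F1 ranges refer to the hateful class alone or to an average over classes, and on imbalanced hate speech data the two can differ substantially. For reference, the standard per-class and macro-averaged definitions are:

```latex
P_c = \frac{TP_c}{TP_c + FP_c}, \qquad
R_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F1_c = \frac{2\,P_c\,R_c}{P_c + R_c}

\text{Macro-}F1 = \frac{1}{|C|} \sum_{c \in C} F1_c
```

Binary F1 usually reports $F1_c$ for the hateful class only, while macro-F1 gives equal weight to every class in $C$, including the typically much larger non-hateful class.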

What To Watch

The convergence of LLMs with retrieval-augmented generation (RAG) may address the cultural knowledge gap: systems that can retrieve cultural context from knowledge bases when processing potentially hateful content in unfamiliar languages. Multimodal hate speech detection, incorporating images, memes, and emojis alongside text, is increasingly necessary as online hate speech migrates to visual and multimodal formats to evade text-based detection. Perhaps most important is the shift toward participatory design, where affected communities are involved in defining what constitutes hate speech in their linguistic and cultural context, rather than having definitions imposed by technologists working primarily in English.
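The retrieval idea can be sketched as a two-step pipeline: look up cultural notes relevant to a post, then place them in the prompt given to a classifier LLM. Everything here is hypothetical (the `ContextNote` knowledge-base format, the `retrieve_notes` and `build_prompt` helpers, and the placeholder terms); keyword overlap stands in for a real retriever, and the actual LLM call is left out.

```python
from dataclasses import dataclass

# Hypothetical RAG-style sketch: knowledge-base format, helpers and terms are
# illustrative placeholders, not an existing system.


@dataclass(frozen=True)
class ContextNote:
    """One entry in a hypothetical cultural-context knowledge base."""
    language: str
    keywords: frozenset
    note: str


def retrieve_notes(post: str, language: str, kb: list, top_k: int = 2) -> list:
    """Rank notes for `language` by keyword overlap with the post.

    Whitespace tokenization and exact keyword matching stand in for a real
    retriever (e.g. dense embeddings); punctuation is not handled.
    """
    tokens = set(post.lower().split())
    scored = [(len(tokens & n.keywords), n) for n in kb if n.language == language]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [n for _, n in scored[:top_k]]


def build_prompt(post: str, language: str, kb: list) -> str:
    """Prepend retrieved cultural notes to a classification instruction."""
    notes = retrieve_notes(post, language, kb)
    context = "\n".join(f"- {n.note}" for n in notes) or "- (no cultural notes found)"
    return (
        "Cultural context retrieved for this post:\n"
        f"{context}\n\n"
        "Using this context, label the post HATEFUL or NOT_HATEFUL.\n"
        f"Post ({language}): {post}"
    )


# Placeholder knowledge base; "term_x" and "term_y" stand in for real terms.
kb = [
    ContextNote("xx", frozenset({"term_x"}), "In region R, 'term_x' is a coded reference to group G."),
    ContextNote("xx", frozenset({"term_y"}), "'term_y' is a neutral everyday word."),
    ContextNote("yy", frozenset({"term_x"}), "In language yy, 'term_x' has no derogatory use."),
]
print(build_prompt("they are all term_x again", "xx", kb))
```

Keeping retrieval separate from classification means the knowledge base can be written and corrected by speakers of each language without retraining the model.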

References (4)

[1] Arango Monnar, A., Perez Rojas, J., & Polete Labra, B. (2024). Cross-lingual hate speech detection using domain-specific word embeddings. PLoS ONE, 19(7).

[2] Usman, M., Ahmad, M., & Sidorov, G. (2025). A Large Language Model-Based Approach for Multilingual Hate Speech Detection on Social Media. Computers, 14(7), 279.

[3] Mnassri, K., Farahbakhsh, R., & Crespi, N. (2024). Multilingual Hate Speech Detection: A Semi-Supervised Generative Adversarial Approach. Entropy, 26(4), 344.

[4] Ghosh, K., & Senapati, A. (2024). Hate speech detection in low-resourced Indian languages: An analysis of transformer-based monolingual and multilingual models with cross-lingual experiments. Natural Language Processing.
