Aligning Babel: A Taxonomy of How LLMs Achieve Multilingual Competence

A comprehensive survey proposes a taxonomy of alignment strategies that enable large language models to achieve cross-lingual competence—revealing that multilingual capability is not a single phenomenon but a constellation of distinct engineering and linguistic challenges.

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Ask GPT-4 a question in Finnish and it responds fluently. Ask it the same question in Yoruba and the quality degrades noticeably. This asymmetry is not a minor inconvenience—it reflects a structural challenge that sits at the intersection of computational linguistics, data economics, and language policy. How do large language models achieve competence across multiple languages, and why does that competence distribute so unevenly across the world's linguistic diversity?

The Research Landscape

A survey published in Patterns (Cell Press, 2025) takes on this question by proposing a taxonomy of alignment strategies for multilingual LLMs. Rather than treating multilingualism as a single capability that models either have or lack, the survey maps the distinct strategies through which cross-lingual competence is engineered, revealing a landscape that is far more structured—and more contested—than casual observers might assume.

The survey's taxonomic approach is its core intellectual contribution. "Alignment" in the multilingual context refers to the mechanisms by which a model's internal representations are brought into correspondence across languages, so that "democracy" in English, "democracia" in Spanish, and "민주주의" in Korean map to nearby regions of a shared representation space. This alignment can occur at multiple stages: during pretraining (through multilingual corpora), during fine-tuning (through cross-lingual instruction data), through architectural choices (shared vs. language-specific parameters), or through post-hoc techniques (translation-based augmentation, cross-lingual retrieval).
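One way to make the idea of corresponding representations concrete is to compare embedding vectors for translations of the same concept. The sketch below is illustrative only: the three-dimensional vectors are invented for the example and do not come from any model, and `alignment_score` is a hypothetical helper, not a metric attributed to the survey. It computes mean pairwise cosine similarity, which is one common proxy for alignment among several.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented toy embeddings (assumption): real models use hundreds or
# thousands of dimensions, and these values are not measurements.
toy_embeddings = {
    ("en", "democracy"): [0.9, 0.1, 0.2],
    ("es", "democracia"): [0.8, 0.2, 0.1],
    ("ko", "민주주의"): [0.7, 0.3, 0.2],
    ("en", "banana"): [0.1, 0.9, 0.1],
}

def alignment_score(embeddings, keys):
    """Mean pairwise cosine similarity over the given (language, word) keys."""
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    return sum(cosine(embeddings[a], embeddings[b]) for a, b in pairs) / len(pairs)

translations = [("en", "democracy"), ("es", "democracia"), ("ko", "민주주의")]
aligned = alignment_score(toy_embeddings, translations)
unrelated = cosine(toy_embeddings[("en", "democracy")],
                   toy_embeddings[("en", "banana")])

# In a well-aligned space, translations score higher than unrelated words.
print(f"translations: {aligned:.3f}, unrelated pair: {unrelated:.3f}")
```

With real models the vectors would come from a hidden layer or a sentence encoder, and the choice of layer and pooling can change the picture considerably.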

By organizing these approaches into a taxonomy, the survey enables researchers to see which alignment strategies have been most studied, which have been most effective, and—perhaps most importantly—which combinations remain unexplored. This is the kind of contribution that accelerates a field by making its own structure visible to itself.

The survey also confronts a challenge that is as much sociolinguistic as it is technical: the performance gap between high-resource languages (English, Chinese, Spanish) and low-resource languages (most of the world's languages). This gap is not merely a data problem—it reflects historical patterns of digital inclusion, economic power, and the political economies of language technology development. Any taxonomy of multilingual alignment strategies must grapple with whether a given strategy narrows or widens this gap.

Critical Analysis

Claim	Evidence Basis	Verdict
A taxonomy of alignment strategies can organize multilingual LLM research	Survey proposes such a taxonomy based on comprehensive literature review	✅ Supported
Cross-lingual capability involves distinct engineering challenges	Multiple alignment strategies identified, operating at different stages	✅ Supported
Maintaining performance across diverse languages is a core challenge	Identified as a key finding of the survey	✅ Supported
The proposed taxonomy is comprehensive	Comprehensiveness depends on search methodology and temporal coverage	⚠️ Likely comprehensive at time of writing, but field evolves rapidly

Several aspects of this survey merit careful consideration. First, taxonomies are analytical tools, not natural kinds. The categories the survey proposes are choices—they emphasize certain distinctions and blur others. Alternative taxonomies might organize the same literature differently, and the most productive use of this taxonomy may be as a starting point for debate rather than a final word.

Second, the relationship between alignment strategy and downstream performance is not straightforward. A strategy that produces strong cross-lingual alignment on benchmarks may fail on specific language pairs or specific tasks. The survey's value depends partly on whether it addresses this gap between alignment quality and task performance.
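A simple way to probe this gap is to check whether languages that rank high on an alignment measure also rank high on a downstream task. The sketch below computes a Spearman rank correlation in plain Python. The language labels and all scores are invented placeholders, not results from the survey or from any model, and the implementation assumes no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation for equal-length sequences without ties."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Invented placeholder scores for four hypothetical languages A to D.
alignment_scores = [0.91, 0.88, 0.85, 0.80]
task_accuracy = [0.75, 0.52, 0.70, 0.61]

rho = spearman(alignment_scores, task_accuracy)
print(f"rank correlation between alignment and task accuracy: {rho:.2f}")
```

A correlation well below 1 on real data would be one sign that alignment quality alone does not predict task performance, although a handful of languages is far too few to draw conclusions from.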

Third, multilingual LLM research pursues two goals at once that can conflict: cross-lingual transfer (using knowledge from high-resource languages to improve performance on low-resource languages) and language-specific fidelity (respecting the unique syntactic, morphological, and pragmatic features of each language). Alignment strategies optimized for cross-lingual transfer may impose English-centric structural assumptions on languages with very different typological properties. How the survey handles this tension matters to anyone working on low-resource language technology.

Open Questions

Typological coverage: Does the taxonomy account for typological diversity—agglutinative languages like Turkish, tonal languages like Mandarin, polysynthetic languages like Inuktitut—or does it implicitly assume a European language structure?
Evaluation metrics: How should multilingual competence be measured? BLEU scores and accuracy on translated benchmarks may not capture whether a model truly "understands" a language or merely performs pattern matching on translated inputs.
Scaling dynamics: Do alignment strategies that work for models with tens of billions of parameters also work at smaller scales? This matters for deployment in resource-constrained environments where many low-resource language communities operate.
Cultural alignment: Beyond linguistic alignment, does the survey address cultural alignment—the question of whether a model's responses are culturally appropriate in addition to being linguistically correct?
Temporal stability: As new languages and dialects gain digital presence, how robust are current alignment strategies to the continuous evolution of the linguistic landscape?
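The evaluation question above can be made concrete with a per-language breakdown rather than a single pooled score. The following sketch assumes a hypothetical benchmark that yields (language code, correct or not) records; the function names and the toy records are illustrative inventions, not measurements. It reports each language's accuracy and its gap to a reference language.

```python
from collections import defaultdict

def accuracy_by_language(records):
    """records: iterable of (language_code, is_correct) pairs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, ok in records:
        total[lang] += 1
        correct[lang] += int(ok)
    return {lang: correct[lang] / total[lang] for lang in total}

def gap_to_reference(accuracy, reference="en"):
    """Accuracy shortfall of each language relative to the reference language."""
    ref = accuracy[reference]
    return {lang: ref - acc for lang, acc in accuracy.items() if lang != reference}

# Invented toy records (assumption): four items per language.
toy_records = (
    [("en", True)] * 4
    + [("fi", True)] * 3 + [("fi", False)]
    + [("yo", True)] + [("yo", False)] * 3
)

accuracy = accuracy_by_language(toy_records)
gaps = gap_to_reference(accuracy)
for lang, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{lang}: accuracy {accuracy[lang]:.2f}, gap to en {gap:.2f}")
```

Reporting gaps per language keeps low-resource results from being averaged away, though it inherits the limits of the benchmark itself, including any artifacts introduced by translating the test items.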

Closing

The deepest insight this survey offers may be methodological rather than empirical: that multilingual capability is not a single problem but a family of related problems, each requiring different solutions and different evaluation criteria. For NLP researchers, the taxonomy provides a map of where the field has been and where the gaps lie. For language communities whose languages remain underserved by current models, the taxonomy may reveal which alignment strategies hold the most promise for closing the performance gap—and which risk entrenching it. The full survey merits close reading by anyone building or evaluating multilingual systems.

References

[1] Survey of multilingual large language models: alignment strategies for cross-lingual capabilities. (2025). Patterns.