Of the roughly 6,000 languages spoken worldwide, large language models perform well in only about 20. Three recent papers attack this digital divide from different angles: comprehensive benchmarking across 64 African languages, language identification spanning 1,665 languages, and tokenizer optimization for 22 Indian languages. Together, they reveal how deep the gap truly is and where the most promising interventions lie.