Teaching a Dying Script: Pedagogical Strategies for the Manchu Language

Manchu, once the administrative language of the Qing Dynasty, now has fewer than 20 fluent speakers. A rare empirical case study documents how a teacher develops orthographic knowledge in new learners, while NLP tools and AI hold promise for documentation but face practical limits.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Manchu was the administrative language of one of history's largest empires. The Qing Dynasty (1644–1912) governed China through Manchu, and the imperial archives contain millions of documents in the Manchu script—a vertical alphabet adapted from Mongolian. Today, the language is critically endangered: UNESCO estimates fewer than 20 fluent speakers remain, most elderly. The Manchu script, which encodes a rich literary and administrative tradition, is at risk of becoming unreadable within a generation.

This makes Manchu a compelling case study for two intertwined questions: how do you teach a language with almost no living community of speakers, and how can technology help preserve what remains?

The Research Landscape

Pedagogy for a Language Without a Community

Li, Murphy, and Nag (2025) provide a rare empirical case study of Manchu language pedagogy, published in the International Journal of Applied Linguistics. The study uses thick description to document a teacher's strategies for developing orthographic knowledge—the ability to read and write Manchu script—in new learners who have no prior exposure to the language.

The pedagogical challenges are distinctive:

No immersion environment. Most language teaching benefits from exposure outside the classroom—media, conversation, signage. For Manchu, the classroom is the only exposure. This means the teacher must create all the contextual scaffolding that a living language community would normally provide.

A unique script. The Manchu alphabet is written vertically, top to bottom in columns that run from left to right, and uses a modified Mongolian script with additional diacritical marks. For learners accustomed to horizontal scripts (Chinese, English), even basic reading requires retraining visual scanning patterns.

Pedagogical isolation. With so few fluent speakers, there is no established community of Manchu language teachers. The teacher in this study developed strategies largely independently, drawing on general principles of literacy instruction adapted to Manchu's specific properties.

The strategies documented include:

  • Phonological awareness training: Teaching learners to segment Manchu words into syllables before introducing the script, building an auditory foundation that the visual script can map onto.
  • Character-component analysis: Breaking Manchu characters into recurring components (stems, suffixes, diacritical marks) and teaching these as a generative system rather than rote-memorized forms.
  • Contextualized reading: Using historical documents (imperial edicts, personal letters) as reading material from early stages, connecting orthographic instruction to cultural motivation.
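
The first of these strategies, segmenting words into syllables before the script is introduced, can be illustrated with a small sketch. It is not taken from Li et al. It assumes romanized input in a Möllendorff-style transliteration, treats every letter as one sound, and applies a deliberately simplified rule: adjacent vowels form one nucleus, and a single consonant before a vowel opens the next syllable. The function name syllabify and the word list are illustrative choices.

```python
import unicodedata

# Assumption: the six vowel letters of a Möllendorff-style romanization.
VOWELS = set("aeiouū")

def syllabify(word: str) -> list[str]:
    """Split a romanized word into syllables with a simplified (C)V(C) rule."""
    # Normalize so that "ū" is one precomposed character, not u + macron.
    word = unicodedata.normalize("NFC", word.lower())

    # Collect vowel nuclei as (start, end) index pairs; vowel runs count as one.
    nuclei = []
    i = 0
    while i < len(word):
        if word[i] in VOWELS:
            j = i
            while j < len(word) and word[j] in VOWELS:
                j += 1
            nuclei.append((i, j))
            i = j
        else:
            i += 1

    if not nuclei:
        return [word] if word else []

    # The consonant right before each later nucleus starts a new syllable;
    # any other consonants stay behind as the coda of the previous one.
    syllables, start = [], 0
    for next_start, _ in nuclei[1:]:
        boundary = next_start - 1
        syllables.append(word[start:boundary])
        start = boundary
    syllables.append(word[start:])
    return syllables

for w in ["manju", "gisun", "niyalma", "inenggi"]:
    print(w, "->", "-".join(syllabify(w)))
```

Digraphs and loanword spellings would need extra handling, so any real teaching aid built this way would have to be checked by someone who knows the language.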

AI for Endangered Language Documentation and Teaching

Wang (2024), with 2 citations, surveys the broader role of AI in endangered language work, covering both documentation (creating records of the language) and pedagogy (teaching it to new learners). The analysis identifies several AI applications:

  • Automatic speech recognition for transcription of oral recordings—particularly valuable for languages where fluent speakers are elderly and recording time is limited.
  • Optical character recognition for digitizing handwritten manuscripts—directly relevant to Manchu, where imperial archives are largely handwritten.
  • Language learning applications that use spaced repetition and AI-driven feedback to teach vocabulary and grammar.

However, Wang notes a persistent challenge: AI tools require training data, and endangered languages by definition have very little. The catch-22 of endangered language AI (you need data to build tools, and you need tools to create data) remains largely unsolved. Wang argues for a staged approach: use human linguists to create small, high-quality datasets, then use those datasets to train AI tools that accelerate further data creation.
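
The staged approach can be pictured as a loop. The sketch below is a toy illustration, not Wang's method or any published pipeline: the "model" is a plain most-frequent-tag lookup, the ORACLE dictionary stands in for a human linguist, and the romanized words and coarse tags are hand-made examples rather than corpus data.

```python
from collections import Counter, defaultdict

def train(labeled):
    """Build a token -> most frequent tag lookup from (token, tag) pairs."""
    counts = defaultdict(Counter)
    for token, tag in labeled:
        counts[token][tag] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def pre_annotate(model, tokens):
    """Split tokens into machine suggestions and tokens a human must label."""
    suggested, from_scratch = [], []
    for tok in tokens:
        if tok in model:
            suggested.append((tok, model[tok]))
        else:
            from_scratch.append(tok)
    return suggested, from_scratch

# Hypothetical stand-in for the linguist's judgments (toy tags, not corpus data).
ORACLE = {
    "manju": "NOUN", "gisun": "NOUN", "bithe": "NOUN", "niyalma": "NOUN",
    "be": "PART", "tacimbi": "VERB",
}

# Stage 1: a small, human-made seed set.
labeled = [("manju", "NOUN"), ("gisun", "NOUN"), ("be", "PART")]

# Stage 2: each new batch is pre-annotated, then checked by the human.
batches = [
    ["manju", "bithe", "be", "tacimbi"],
    ["niyalma", "bithe", "be", "tacimbi"],
    ["manju", "niyalma", "gisun", "be", "tacimbi"],
]

scratch_share = []
for batch in batches:
    model = train(labeled)
    suggested, from_scratch = pre_annotate(model, batch)
    # The human still reviews every suggestion; the oracle confirms or corrects.
    checked = [(tok, ORACLE[tok]) for tok, _ in suggested]
    new = [(tok, ORACLE[tok]) for tok in from_scratch]
    labeled += checked + new
    scratch_share.append(len(from_scratch) / len(batch))

print(scratch_share)  # share of each batch labeled from scratch: [0.5, 0.25, 0.0]
```

In this toy run the share of tokens that must be labeled from scratch falls with each batch, which is the acceleration the staged approach is after. Real text, with unseen words arriving constantly, would shrink that share far more slowly.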

NLP Tools for Manchu

Lee, Byun, and Seo (2024) present the most concrete technical contribution: NLP tools for Named Entity Recognition (NER) and Part-of-Speech (POS) tagging in Manchu. Testing three architectures (BiLSTM-CRF, BERT, mBERT), they find that fine-tuned BERT outperforms both alternatives for POS tagging, while performance differences are smaller for NER.

The practical implication is that even with very limited training data (~50,000 tokens), useful NLP tools can be built for endangered languages. But "useful" needs qualification: the tools accelerate human annotation work but do not replace it. A Manchu scholar still needs to validate every output.
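
Comparisons between taggers like the one above are usually scored with token-level accuracy for POS tagging and span-level F1 for NER. This summary does not say which metrics Lee et al. report, so the sketch below assumes those conventional ones, with BIO-style entity tags and made-up gold and predicted labels.

```python
def accuracy(gold, pred):
    """Fraction of positions where the predicted tag equals the gold tag."""
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def bio_spans(tags):
    """Collect (start, end, type) entity spans from a BIO tag sequence."""
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

def span_f1(gold_tags, pred_tags):
    """Entity-level F1: a span counts only if boundaries and type both match."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up labels for one five-token sentence (illustration only).
gold_ner = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred_ner = ["B-PER", "O", "O", "B-LOC", "O"]

print(accuracy(gold_ner, pred_ner))  # 0.8: four of five tags match
print(span_f1(gold_ner, pred_ner))   # 0.5: the truncated PER span gets no credit
```

The gap between the two numbers is one reason a human check remains necessary: a tagger can get most tokens right while still cutting names in the wrong place.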

The Theory-Practice Gap

Gessler and von der Wense (2024), with 4 citations, provide the crucial context: despite two decades of NLP tools being built for endangered languages, most documentary work still proceeds without them. The barriers are not primarily technical but workflow-related: the tools do not integrate with the software that field linguists actually use, and the learning curve for NLP tools exceeds what most linguists are willing to invest.

Critical Analysis: Claims and Evidence

Claim | Evidence | Verdict
Manchu orthographic instruction requires distinctive pedagogical strategies | Li et al.'s thick description case study | ✅ Supported: vertical script, no immersion, isolation documented
AI can accelerate endangered language documentation | Wang's survey of AI applications | ⚠️ Uncertain: technically feasible but catch-22 of training data persists
NLP tools for Manchu are feasible with limited training data | Lee et al.'s NER and POS tagging experiments | ✅ Supported: modest performance with ~50K tokens
NLP tools are underutilized in actual fieldwork | Gessler & von der Wense's analysis | ✅ Supported: workflow integration is the bottleneck

Open Questions

  • Intergenerational transmission: Can classroom instruction replace community transmission? Or does language revitalization ultimately require rebuilding a speech community?
  • Script vs. language: Is preserving the Manchu script (as a written system that can be read) sufficient if the spoken language disappears? What is lost when a language becomes read-only?
  • Ethical questions: Who should learn Manchu—anyone interested, or only Manchu-descended communities? Who governs how the language is taught and represented?
  • Scalability: The pedagogical strategies documented by Li et al. are intensive and personalized. Can they be scaled through technology?

What This Means for Your Research

For applied linguists, the Manchu case study demonstrates that endangered language pedagogy is not simply "language teaching with fewer resources." It requires fundamentally different strategies when there is no community of speakers to provide immersion.

For NLP researchers, Manchu represents both a challenge and an opportunity: extremely low resources, but a relatively well-documented writing system with extensive historical archives.

References

[1] Li, B., Murphy, V.A., & Nag, S. (2025). Exploring Pedagogical Strategies for Developing Orthographic Knowledge: A Case Study of the Critically Endangered Manchu Language. International Journal of Applied Linguistics.
[2] Wang, L. (2024). Artificial intelligence's role in the realm of endangered languages: Documentation and teaching. Applied and Computational Engineering, 48.
[3] Lee, S., Byun, G., & Seo, J. (2024). ManNER & ManPOS: Pioneering NLP for Endangered Manchu Language.
[4] Gessler, L., & von der Wense, K. (2024). NLP for Language Documentation: Two Reasons for the Gap between Theory and Practice. Proc. AmericasNLP 2024.
