Linguistics & NLP

LLMs as Discourse Analysts: What Social Media Mining Reveals About Public Opinion

Large language models are increasingly used to analyze public discourse on social media platforms. Studies of Chinese Weibo and Xiaohongshu show how LLM-assisted analysis can uncover sentiment patterns, amplification dynamics, and cultural attitudes at scale, while also encoding biases of its own.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Social media platforms generate vast quantities of text that capture public attitudes, emotional reactions, and discursive patterns in near real-time. Traditional content analysis (human coders reading and categorizing posts) cannot keep pace with this volume. Large language models offer an alternative: automated classification of sentiment, topic, and discourse features across millions of posts. But using LLMs as analytical instruments raises methodological questions: how reliable are their classifications? What biases do they introduce? And what can they reveal that human analysis cannot?

The Research Landscape

Geopolitical Discourse and LLM Bias

Rogers and Zhang (2024), with 15 citations, provide the most methodologically rigorous study, analyzing discourse about the Russia-Ukraine war across Chinese social media platforms (Weibo and Douyin) using both manual and LLM-assisted classification.

Their most striking finding is methodological: LLM classification systematically coded more posts as "neutral" than human coders did. Posts that human analysts identified as subtly pro-Russian (through ironic framing, what-aboutism, or selective emphasis) were classified as "neutral" by the LLM. This "bias toward neutrality" reflects the LLM's training: safety fine-tuning encourages models to avoid taking sides on politically sensitive topics, creating a systematic undercount of non-neutral sentiment.

The substantive finding is equally significant: Chinese social media discourse about the war showed "mass amplification of Russian state positions", not through explicit pro-Russia statements (which are relatively rare) but through selective topic emphasis, framing effects, and narrative repetition. This amplification is largely invisible to LLM classification because it operates at the discourse level (how topics are framed) rather than the sentiment level (how individual posts are classified).

The implication for computational discourse analysis is clear: LLMs can classify individual posts but struggle with discourse-level phenomena that emerge from patterns across posts.
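One practical response is to validate LLM labels against a human-coded subsample and measure the neutrality bias directly. A minimal sketch, where the function name and toy labels are illustrative rather than taken from the paper:

```python
def neutrality_bias_rate(human_labels, llm_labels):
    """Fraction of posts humans coded as non-neutral that the LLM coded
    as neutral. Labels are strings: "pro", "anti", or "neutral".
    A toy validation sketch, not the authors' actual pipeline."""
    non_neutral = [(h, m) for h, m in zip(human_labels, llm_labels)
                   if h != "neutral"]
    if not non_neutral:
        return 0.0
    flipped = sum(1 for _, m in non_neutral if m == "neutral")
    return flipped / len(non_neutral)

# Toy example: 3 of the 4 human-coded non-neutral posts come back "neutral".
human = ["pro", "neutral", "anti", "pro", "pro"]
llm   = ["neutral", "neutral", "neutral", "pro", "neutral"]
print(neutrality_bias_rate(human, llm))  # 0.75
```

A rate well above zero on a validated subsample signals that aggregate sentiment counts from the LLM need correction before interpretation.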

Food Safety Sentiment Evolution

Ma and Zheng (2024), with 12 citations, apply text mining and sentiment analysis to food safety incidents on Weibo, tracking how public sentiment evolves over time during food safety crises. Their contribution is the temporal dimension: not just what people feel, but how those feelings change as events unfold.

The analysis reveals a consistent pattern:

  • Initial shock (hours 0-24): High negative sentiment, dominated by fear and anger.
  • Blame attribution (days 1-3): Sentiment shifts from generalized anxiety to directed angerโ€”targeting specific companies, regulatory bodies, or government agencies.
  • Normalization (days 3-14): Sentiment gradually returns to baseline as media attention fades, but a residual distrust persists in subsequent discussions of related topics.
  • Reactivation: Future food safety incidents reactivate the accumulated distrust, producing stronger initial reactions than the objective severity of the new incident would predict.
This temporal pattern has practical implications for crisis communication: the window for effective response is narrow (the first 24-48 hours), and failure to respond during that window allows blame attribution to solidify into durable public narratives.
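The phase boundaries above lend themselves to a simple bucketing analysis. A sketch in Python, assuming each post carries an hours-since-incident offset and a sentiment score in [-1, 1]; the cutoffs follow the summary above, not the paper's exact definitions:

```python
def crisis_phase(hours_since_incident):
    """Map elapsed time to the sentiment phase described above."""
    if hours_since_incident < 24:
        return "initial shock"
    if hours_since_incident < 72:      # days 1-3
        return "blame attribution"
    if hours_since_incident < 336:     # days 3-14
        return "normalization"
    return "baseline"

def mean_sentiment_by_phase(posts):
    """posts: iterable of (hours_since_incident, sentiment_score) pairs.
    Returns {phase: mean sentiment score}."""
    sums, counts = {}, {}
    for hours, score in posts:
        phase = crisis_phase(hours)
        sums[phase] = sums.get(phase, 0.0) + score
        counts[phase] = counts.get(phase, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}

posts = [(2, -0.9), (10, -0.8), (30, -0.6), (200, -0.1)]
print(mean_sentiment_by_phase(posts))
```

Plotting the per-phase means over several incidents is one way to check whether the shock-blame-normalization shape actually recurs in a new dataset.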

Rural Landscape Sentiment

Zhang, Jin, and Liu (2024), with 5 citations, demonstrate a different application: analyzing public sentiment toward rural landscapes on Weibo using deep learning models. The study goes beyond simple sentiment classification to identify specific dimensions of landscape appreciation (aesthetic, ecological, nostalgic, economic) and their relative prevalence in public discourse.

The finding that "nostalgia" is the dominant sentiment dimension in rural landscape discussions (more prevalent than aesthetic appreciation or economic valuation) has implications for rural planning: public support for rural preservation may be driven more by emotional attachment to an idealized past than by ecological or economic arguments.

Marriage Discourse

Ye and Gao (2025) apply LLM-assisted content analysis to 219,358 marriage-related posts from Weibo and Xiaohongshu, examining how declining marriage rates in China are discussed on social media. The analysis identifies moral foundations underlying marriage discourse (care, fairness, loyalty, authority, sanctity) and how these foundations differ between platforms and demographic groups.
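To make the moral-foundations coding concrete, here is a deliberately crude keyword-based stand-in for the LLM-assisted labeling. The cue lexicon is invented for illustration; it is not the authors' instrument:

```python
# Hypothetical mini-lexicon, invented for this sketch.
FOUNDATION_CUES = {
    "care":      {"wellbeing", "support", "protect"},
    "fairness":  {"equal", "fair", "rights"},
    "loyalty":   {"family", "duty", "tradition"},
    "authority": {"parents", "obey", "respect"},
    "sanctity":  {"pure", "sacred", "moral"},
}

def tag_foundations(text):
    """Return the moral foundations whose cue words appear in a post.
    Keyword matching is a crude stand-in for LLM-assisted coding."""
    tokens = set(text.lower().split())
    return sorted(f for f, cues in FOUNDATION_CUES.items() if tokens & cues)

print(tag_foundations("Marriage is a family duty and about equal rights"))
# ['fairness', 'loyalty']
```

The gap between a lexicon like this and an LLM coder is exactly where validation matters: the LLM catches paraphrase and irony that keywords miss, but introduces the classification biases discussed above.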

Critical Analysis: Claims and Evidence

| Claim | Evidence | Verdict |
|---|---|---|
| LLMs show systematic bias toward neutrality in political discourse classification | Rogers & Zhang's manual vs. automated comparison | ✅ Supported: safety fine-tuning creates undercounting of non-neutral sentiment |
| Social media sentiment during crises follows predictable temporal patterns | Ma & Zheng's food safety crisis analysis | ✅ Supported: consistent across multiple incidents |
| Nostalgia dominates public discourse about rural landscapes | Zhang et al.'s sentiment dimension analysis | ✅ Supported: nostalgia > aesthetics > economics in Weibo data |
| LLMs capture discourse-level phenomena (framing, amplification) | Rogers & Zhang's analysis | ❌ Refuted: LLMs classify posts but miss discourse-level patterns |

Open Questions

  • The neutrality bias: If LLMs systematically undercount non-neutral sentiment, how should researchers calibrate their classifications? Human validation on representative subsamples is necessary but expensive.
  • Platform effects: Different platforms (Weibo vs. Xiaohongshu vs. Douyin) have different content moderation policies, user demographics, and algorithmic curation. How should cross-platform analyses account for these differences?
  • Multilingual discourse: Most social media discourse analysis uses monolingual models. How should code-switching (common in multilingual societies) be handled?
  • Ethics of mass analysis: Analyzing millions of social media posts without individual consent raises privacy and ethical questions, even when posts are publicly available.
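On the calibration question, one standard option is misclassification correction: estimate the confusion matrix P(LLM label | true label) on the human-validated subsample, then solve a linear system to recover corrected label prevalences on the full corpus. A sketch with made-up numbers; the confusion rates are illustrative, not Rogers & Zhang's:

```python
import numpy as np

labels = ["pro", "neutral", "anti"]

# Rows: true label, cols: LLM label -- estimated on the validated subsample.
confusion = np.array([
    [0.6, 0.4, 0.0],   # 40% of human-coded "pro" posts come back "neutral"
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],   # half of "anti" posts come back "neutral"
])

observed = np.array([0.12, 0.73, 0.15])  # LLM label shares on the full corpus

# observed_j = sum_i true_i * confusion[i, j], so solve confusion.T @ x = observed.
corrected = np.linalg.solve(confusion.T, observed)
print(dict(zip(labels, corrected.round(2))))
```

This only works when the confusion matrix is well-estimated and invertible; with small validation samples, the corrected prevalences inherit large standard errors, which is why the subsample must be representative and not just cheap.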
What This Means for Your Research

For computational linguists, Rogers and Zhang's finding about neutrality bias is methodologically important: LLM classifications should be validated against human judgments, especially for politically sensitive content.

For crisis communication researchers, Ma and Zheng's temporal pattern provides an empirically grounded framework for response timing.

Explore related work through ORAA ResearchBrain.

References (4)

[1] Rogers, R. & Zhang, X. (2024). The Russia–Ukraine War in Chinese Social Media: LLM Analysis Yields a Bias Toward Neutrality. Social Media + Society.
[2] Ma, B. & Zheng, R. (2024). Exploring Food Safety Emergency Incidents on Sina Weibo: Using Text Mining and Sentiment Evolution. Journal of Food Protection, 100418.
[3] Zhang, J., Jin, G., & Liu, Y. (2024). Attention and sentiment of Chinese public toward rural landscape based on Sina Weibo. Scientific Reports.
[4] Ye, F. & Gao, X. (2025). Marriage Discourse on Chinese Social Media: An LLM-assisted Analysis. [Preprint].
