Linguistics & NLP

LLMs as Discourse Analysts: What Social Media Mining Reveals About Public Opinion

Large language models are increasingly used to analyze public discourse on social media platforms. Studies of Chinese Weibo and Xiaohongshu show how LLM-assisted analysis can uncover sentiment patterns, amplification dynamics, and cultural attitudes at scale, while also encoding biases of its own.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Social media platforms generate vast quantities of text that capture public attitudes, emotional reactions, and discursive patterns in near real-time. Traditional content analysis (human coders reading and categorizing posts) cannot keep pace with this volume. Large language models offer an alternative: automated classification of sentiment, topic, and discourse features across millions of posts. But using LLMs as analytical instruments raises methodological questions: how reliable are their classifications? What biases do they introduce? And what can they reveal that human analysis cannot?

The Research Landscape

Geopolitical Discourse and LLM Bias

Rogers and Zhang (2024), with 15 citations, provide the most methodologically rigorous study, analyzing discourse about the Russia-Ukraine war across Chinese social media platforms (Weibo and Douyin) using both manual and LLM-assisted classification.

Their most striking finding is methodological: LLM classification systematically coded more posts as "neutral" than human coders did. Posts that human analysts identified as subtly pro-Russian (through ironic framing, what-aboutism, or selective emphasis) were classified as "neutral" by the LLM. This "bias toward neutrality" reflects the LLM's training: safety fine-tuning encourages models to avoid taking sides on politically sensitive topics, creating a systematic undercount of non-neutral sentiment.

The substantive finding is equally significant: Chinese social media discourse about the war showed "mass amplification of Russian state positions", not through explicit pro-Russia statements (which are relatively rare) but through selective topic emphasis, framing effects, and narrative repetition. This amplification is largely invisible to LLM classification because it operates at the discourse level (how topics are framed) rather than the sentiment level (how individual posts are classified).

The implication for computational discourse analysis is clear: LLMs can classify individual posts but struggle with discourse-level phenomena that emerge from patterns across posts.
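One practical response is to validate LLM labels against a human-coded subsample and measure the neutrality bias directly. A minimal sketch, where the function name and toy labels are illustrative rather than taken from the paper:

```python
def neutrality_bias_rate(human_labels, llm_labels):
    """Fraction of posts humans coded as non-neutral that the LLM coded
    as neutral. Labels are strings: "pro", "anti", or "neutral".
    A toy validation sketch, not the authors' actual pipeline."""
    non_neutral = [(h, m) for h, m in zip(human_labels, llm_labels)
                   if h != "neutral"]
    if not non_neutral:
        return 0.0
    flipped = sum(1 for _, m in non_neutral if m == "neutral")
    return flipped / len(non_neutral)

# Toy example: 3 of the 4 human-coded non-neutral posts come back "neutral".
human = ["pro", "neutral", "anti", "pro", "pro"]
llm   = ["neutral", "neutral", "neutral", "pro", "neutral"]
print(neutrality_bias_rate(human, llm))  # 0.75
```

A rate well above zero on a validated subsample signals that aggregate sentiment counts from the LLM need correction before interpretation.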

Food Safety Sentiment Evolution

Ma and Zheng (2024), with 12 citations, apply text mining and sentiment analysis to food safety incidents on Weibo, tracking how public sentiment evolves over time during food safety crises. Their contribution is the temporal dimension: not just what people feel, but how those feelings change as events unfold.

The analysis reveals a consistent pattern:

  • Initial shock (hours 0-24): High negative sentiment, dominated by fear and anger.
  • Blame attribution (days 1-3): Sentiment shifts from generalized anxiety to directed angerโ€”targeting specific companies, regulatory bodies, or government agencies.
  • Normalization (days 3-14): Sentiment gradually returns to baseline as media attention fades, but a residual distrust persists in subsequent discussions of related topics.
  • Reactivation: Future food safety incidents reactivate the accumulated distrust, producing stronger initial reactions than the objective severity of the new incident would predict.
This temporal pattern has practical implications for crisis communication: the window for effective response is narrow (the first 24-48 hours), and failure to respond during that window allows blame attribution to solidify into durable public narratives.
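The phase boundaries above lend themselves to a simple bucketing analysis. A sketch in Python, assuming each post carries an hours-since-incident offset and a sentiment score in [-1, 1]; the cutoffs follow the summary above, not the paper's exact definitions:

```python
def crisis_phase(hours_since_incident):
    """Map elapsed time to the sentiment phase described above."""
    if hours_since_incident < 24:
        return "initial shock"
    if hours_since_incident < 72:      # days 1-3
        return "blame attribution"
    if hours_since_incident < 336:     # days 3-14
        return "normalization"
    return "baseline"

def mean_sentiment_by_phase(posts):
    """posts: iterable of (hours_since_incident, sentiment_score) pairs.
    Returns {phase: mean sentiment score}."""
    sums, counts = {}, {}
    for hours, score in posts:
        phase = crisis_phase(hours)
        sums[phase] = sums.get(phase, 0.0) + score
        counts[phase] = counts.get(phase, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}

posts = [(2, -0.9), (10, -0.8), (30, -0.6), (200, -0.1)]
print(mean_sentiment_by_phase(posts))
```

Plotting the per-phase means over several incidents is one way to check whether the shock-blame-normalization shape actually recurs in a new dataset.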

Rural Landscape Sentiment

Zhang, Jin, and Liu (2024), with 5 citations, demonstrate a different application: analyzing public sentiment toward rural landscapes on Weibo using deep learning models. The study goes beyond simple sentiment classification to identify specific dimensions of landscape appreciation (aesthetic, ecological, nostalgic, economic) and their relative prevalence in public discourse.

The finding that "nostalgia" is the dominant sentiment dimension in rural landscape discussions (more prevalent than aesthetic appreciation or economic valuation) has implications for rural planning: public support for rural preservation may be driven more by emotional attachment to an idealized past than by ecological or economic arguments.

Marriage Discourse

Ye and Gao (2025) apply LLM-assisted content analysis to 219,358 marriage-related posts from Weibo and Xiaohongshu, examining how declining marriage rates in China are discussed on social media. The analysis identifies moral foundations underlying marriage discourse (care, fairness, loyalty, authority, sanctity) and how these foundations differ between platforms and demographic groups.
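To make the moral-foundations coding concrete, here is a deliberately crude keyword-based stand-in for the LLM-assisted labeling. The cue lexicon is invented for illustration; it is not the authors' instrument:

```python
# Hypothetical mini-lexicon, invented for this sketch.
FOUNDATION_CUES = {
    "care":      {"wellbeing", "support", "protect"},
    "fairness":  {"equal", "fair", "rights"},
    "loyalty":   {"family", "duty", "tradition"},
    "authority": {"parents", "obey", "respect"},
    "sanctity":  {"pure", "sacred", "moral"},
}

def tag_foundations(text):
    """Return the moral foundations whose cue words appear in a post.
    Keyword matching is a crude stand-in for LLM-assisted coding."""
    tokens = set(text.lower().split())
    return sorted(f for f, cues in FOUNDATION_CUES.items() if tokens & cues)

print(tag_foundations("Marriage is a family duty and about equal rights"))
# ['fairness', 'loyalty']
```

The gap between a lexicon like this and an LLM coder is exactly where validation matters: the LLM catches paraphrase and irony that keywords miss, but introduces the classification biases discussed above.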

Critical Analysis: Claims and Evidence

| Claim | Evidence | Verdict |
|---|---|---|
| LLMs show systematic bias toward neutrality in political discourse classification | Rogers & Zhang's manual vs. automated comparison | ✅ Supported: safety fine-tuning creates undercounting of non-neutral sentiment |
| Social media sentiment during crises follows predictable temporal patterns | Ma & Zheng's food safety crisis analysis | ✅ Supported: consistent across multiple incidents |
| Nostalgia dominates public discourse about rural landscapes | Zhang et al.'s sentiment dimension analysis | ✅ Supported: nostalgia > aesthetics > economics in Weibo data |
| LLMs capture discourse-level phenomena (framing, amplification) | Rogers & Zhang's analysis | ❌ Refuted: LLMs classify posts but miss discourse-level patterns |

Open Questions

  • The neutrality bias: If LLMs systematically undercount non-neutral sentiment, how should researchers calibrate their classifications? Human validation on representative subsamples is necessary but expensive.
  • Platform effects: Different platforms (Weibo vs. Xiaohongshu vs. Douyin) have different content moderation policies, user demographics, and algorithmic curation. How should cross-platform analyses account for these differences?
  • Multilingual discourse: Most social media discourse analysis uses monolingual models. How should code-switching (common in multilingual societies) be handled?
  • Ethics of mass analysis: Analyzing millions of social media posts without individual consent raises privacy and ethical questions, even when posts are publicly available.
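On the calibration question, one standard option is misclassification correction: estimate the confusion matrix P(LLM label | true label) on the human-validated subsample, then solve a linear system to recover corrected label prevalences on the full corpus. A sketch with made-up numbers; the confusion rates are illustrative, not Rogers & Zhang's:

```python
import numpy as np

labels = ["pro", "neutral", "anti"]

# Rows: true label, cols: LLM label -- estimated on the validated subsample.
confusion = np.array([
    [0.6, 0.4, 0.0],   # 40% of human-coded "pro" posts come back "neutral"
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],   # half of "anti" posts come back "neutral"
])

observed = np.array([0.12, 0.73, 0.15])  # LLM label shares on the full corpus

# observed_j = sum_i true_i * confusion[i, j], so solve confusion.T @ x = observed.
corrected = np.linalg.solve(confusion.T, observed)
print(dict(zip(labels, corrected.round(2))))
```

This only works when the confusion matrix is well-estimated and invertible; with small validation samples, the corrected prevalences inherit large standard errors, which is why the subsample must be representative and not just cheap.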
What This Means for Your Research

For computational linguists, Rogers and Zhang's finding about neutrality bias is methodologically important: LLM classifications should be validated against human judgments, especially for politically sensitive content.

For crisis communication researchers, Ma and Zheng's temporal pattern provides an empirically grounded framework for response timing.

Explore related work through ORAA ResearchBrain.

References (4)

[1] Rogers, R. & Zhang, X. (2024). The Russia–Ukraine War in Chinese Social Media: LLM Analysis Yields a Bias Toward Neutrality. Social Media + Society.
[2] Ma, B. & Zheng, R. (2024). Exploring Food Safety Emergency Incidents on Sina Weibo: Using Text Mining and Sentiment Evolution. Journal of Food Protection, 100418.
[3] Zhang, J., Jin, G., & Liu, Y. (2024). Attention and sentiment of Chinese public toward rural landscape based on Sina Weibo. Scientific Reports.
[4] Ye, F. & Gao, X. (2025). Marriage Discourse on Chinese Social Media: An LLM-assisted Analysis. [Preprint].
