Language Models Trained on Media Diets Can Predict Public Opinion (2303.16779v1)

Published 28 Mar 2023 in cs.CL and cs.LG

Abstract: Public opinion reflects and shapes societal behavior, but the traditional survey-based tools to measure it are limited. We introduce a novel approach to probe media diet models -- LLMs adapted to online news, TV broadcast, or radio show content -- that can emulate the opinions of subpopulations that have consumed a set of media. To validate this method, we use as ground truth the opinions expressed in U.S. nationally representative surveys on COVID-19 and consumer confidence. Our studies indicate that this approach is (1) predictive of human judgements found in survey response distributions and robust to phrasing and channels of media exposure, (2) more accurate at modeling people who follow media more closely, and (3) aligned with literature on which types of opinions are affected by media consumption. Probing LLMs provides a powerful new method for investigating media effects, has practical applications in supplementing polls and forecasting public opinion, and suggests a need for further study of the surprising fidelity with which neural LLMs can predict human responses.

Authors (4)

Eric Chu
Jacob Andreas
Stephen Ansolabehere
Deb Roy

Summary

The paper introduces a novel method that fine-tunes language models using media diets to simulate and predict public opinion dynamics.
It demonstrates that media diet scores from models like BERT correlate significantly with survey responses (0.458), rising to 0.523 when combined with a measure of how closely respondents follow the news.
The approach offers a scalable supplement to traditional polls and a new way to study how media influence public sentiment.

Predicting Public Opinion Using Media-Trained LLMs

This paper introduces an approach for predicting public opinion with language models fine-tuned on "media diets," that is, the content of specific media outlets. The adapted models emulate the opinions of subpopulations exposed to those media sources. The research uses both a pre-trained model, BERT, and simpler n-gram models to capture the relationship between media consumption and public opinion as recorded in survey responses.

The paper presents findings in two domains: COVID-19 public health issues and consumer confidence. In both, media diet model predictions tracked the opinions recorded in nationally representative surveys. Specifically, LLMs adapted to an outlet's content from a particular period approximated the survey response distributions of people who follow that outlet. For instance, models adapted to CNN, Fox News, or NPR content could predict the opinions that each outlet's audience reported in COVID-19 surveys.

Numerical Results and Findings

The correlation between media diet model scores and survey responses in the COVID-19 context was observed to be 0.458, a statistically significant result.
Adding an attention feature (how closely respondents were following news coverage of COVID-19) improved predictive accuracy, raising the correlation of the combined model to 0.523.
The approach was predictive across media channels (online news, TV, and radio), which supports its robustness to the channel of media exposure.
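As a rough illustration of how a correlation like those above can be computed, the sketch below calculates a Pearson correlation between model scores and survey response shares. The function name pearson_r and all numeric values are made up for illustration; they are not data from the paper, and this summary does not confirm that Pearson's r is the exact statistic the authors report.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: one entry per (survey question, outlet) pair.
model_scores = [0.10, 0.35, 0.20, 0.60, 0.45]   # scores from a media diet model
survey_shares = [0.15, 0.30, 0.35, 0.55, 0.40]  # share of respondents choosing the answer

r = pearson_r(model_scores, survey_shares)
print(f"correlation: {r:.3f}")
```

A value near 1 would mean the model scores rise and fall with the survey shares; the paper's reported values of 0.458 and 0.523 indicate a moderate positive relationship.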

Methodological Contributions

The research emphasizes several important methodological contributions, such as:

The use of pre-trained LLMs like BERT rather than simpler models, which results in higher accuracy.
The introduction of a "synonym-grouping method" for aggregating probabilities associated with synonyms of potential survey responses, leading to improved prediction accuracy.
An analysis of heterogeneous media effects, in which exposure to specific outlets had different effects on opinion formation across demographic groups.
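The synonym-grouping idea listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name group_synonym_probs, the word lists, and the probabilities are all hypothetical, and the final renormalization over answer options is an assumption, since this summary does not specify how the paper aggregates the probabilities.

```python
def group_synonym_probs(token_probs, synonym_groups):
    """Sum per-word probabilities within each answer's synonym group,
    then renormalize so the answer options sum to 1.

    token_probs: dict mapping a word to the probability a language model
        assigns it in the blank of a survey-style prompt.
    synonym_groups: dict mapping each answer label to its list of synonyms.
    """
    totals = {
        label: sum(token_probs.get(word, 0.0) for word in words)
        for label, words in synonym_groups.items()
    }
    normalizer = sum(totals.values())
    if normalizer == 0:
        raise ValueError("no probability mass on any answer option")
    return {label: total / normalizer for label, total in totals.items()}


# Hypothetical model output for a prompt such as
# "Requiring masks in public places is ___."
token_probs = {
    "necessary": 0.20, "essential": 0.10, "needed": 0.05,
    "unnecessary": 0.10, "pointless": 0.05,
    "the": 0.50,  # unrelated word, ignored by the grouping
}
synonym_groups = {
    "necessary": ["necessary", "essential", "needed"],
    "unnecessary": ["unnecessary", "pointless"],
}

answer_probs = group_synonym_probs(token_probs, synonym_groups)
print(answer_probs)
```

Grouping matters because a model may spread probability across several words with the same meaning; scoring only one exact word per answer would understate support for that answer.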

Theoretical and Practical Implications

From a theoretical perspective, the findings reaffirm that media content shapes public opinion and extend existing evidence on how media influence societal beliefs. The models provide a new tool for quantifying the effects of media bias and for examining theories of selective exposure, including how media can both educate and mislead the public.

Practically, these models offer a scalable way to supplement traditional polls, which are resource-intensive and suffer from declining response rates. This computational approach can provide continuously updated estimates of public opinion, which could be particularly useful for tracking rapidly changing sentiment during crises such as pandemics or economic downturns.

Future Directions

The paper opens several avenues for future research in AI and public opinion dynamics. These could include extending the model to incorporate social media, exploring more complex network effects, and improving explanations for why certain predictions are made. Furthermore, assessing the ethical considerations of deploying such AI models for public opinion analysis will be crucial, highlighting the need for transparency and accountability in AI systems influencing democratic processes.

Overall, the paper gives a detailed account of using LLMs to simulate public opinion and shows their potential for understanding and forecasting societal trends rooted in media consumption. The work is relevant to policymakers and researchers who need to assess current opinion and anticipate future shifts.
