
The Geography of Happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place (1302.3299v3)

Published 14 Feb 2013 in physics.soc-ph and cs.SI

Abstract: We conduct a detailed investigation of correlations between real-time expressions of individuals made across the United States and a wide range of emotional, geographic, demographic, and health characteristics. We do so by combining (1) a massive, geo-tagged data set comprising over 80 million words generated over the course of several recent years on the social network service Twitter and (2) annually-surveyed characteristics of all 50 states and close to 400 urban populations. Among many results, we generate taxonomies of states and cities based on their similarities in word use; estimate the happiness levels of states and cities; correlate highly-resolved demographic characteristics with happiness levels; and connect word choice and message length with urban characteristics such as education levels and obesity rates. Our results show how social media may potentially be used to estimate real-time levels and changes in population-level measures such as obesity rates.

Citations (484)

Summary

  • The paper demonstrates how sentiment analysis of over 10 million geo-tagged tweets quantifies happiness across US states and cities.
  • The paper reveals that happiness levels, derived from word frequency analysis, correlate positively with wealth and education and negatively with obesity.
  • The paper underscores the utility of social media data as a real-time complement to traditional surveys for informing urban planning and public health policies.

The Geography of Happiness: Connecting Twitter Sentiment, Demographics, and Urban Characteristics

Introduction

The paper investigates correlations between real-time expressions on Twitter and demographic, emotional, and geographic attributes across the United States. Using a dataset of geo-tagged tweets from 2011, the authors quantify happiness levels across states and urban areas with a word-based sentiment analysis. The work shows how word use relates to societal well-being and offers a candidate method for measuring happiness at large scale.

Methodology

The researchers used a corpus of over 10 million geo-tagged tweets. They assigned happiness scores to frequently used words with the LabMT word list, then computed the average happiness of a text from the scores and frequencies of the words it contains, which allowed a word frequency analysis at city and state levels. The approach deliberately ignores the context in which a word appears, trading nuance for computational efficiency and neutrality.
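The averaging step can be sketched as a frequency-weighted mean of word scores. The snippet below is a minimal illustration, not the authors' code: the function name, the `neutral_band` option for skipping near-neutral words, and the tiny score table are all assumptions, and the scores are made-up placeholders rather than actual LabMT values.

```python
from collections import Counter

# Hypothetical toy scores on a 1-9 scale. These are placeholders for
# illustration only and are NOT values taken from the LabMT word list.
TOY_SCORES = {"love": 8.4, "happy": 8.3, "the": 5.0, "never": 3.3, "hate": 2.3}


def average_happiness(text, scores, neutral_band=None):
    """Frequency-weighted mean score of the scored words in `text`.

    Words missing from `scores` are ignored. If `neutral_band` is given as
    (low, high), words scoring strictly between the two bounds are skipped
    as well (an assumed option, not something the summary specifies).
    Returns None when no scored word remains.
    """
    counts = Counter(text.lower().split())
    total = 0.0
    weight = 0
    for word, freq in counts.items():
        score = scores.get(word)
        if score is None:
            continue  # word has no happiness score
        if neutral_band is not None and neutral_band[0] < score < neutral_band[1]:
            continue  # near-neutral word, excluded from the average
        total += score * freq
        weight += freq
    return total / weight if weight else None


if __name__ == "__main__":
    print(average_happiness("love love hate", TOY_SCORES))
    print(average_happiness("the love", TOY_SCORES, neutral_band=(4.0, 6.0)))
```

Because the score depends only on word counts, the same function applies unchanged to a single tweet or to all tweets from a city concatenated together.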

Findings

Initial results reveal minimal variation in happiness scores across states, with Hawaii emerging as the happiest and Louisiana as the saddest. The paper identified correlations between these happiness levels and state-level characteristics, comparing them with existing surveys such as the Gallup well-being index and the US Peace Index. Notably, word frequency distributions allowed for the clustering of states based on linguistic similarities.
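One way to compare the word frequency distributions of two regions is sketched below. This summary does not state which similarity measure the authors used, so the Jensen-Shannon divergence here is a swapped-in, commonly used choice, and the function names are hypothetical.

```python
import math
from collections import Counter


def word_distribution(text):
    """Normalized word frequencies for a whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions that
    share no words. This is an assumed stand-in for whatever similarity
    measure the paper actually uses.
    """
    jsd = 0.0
    for word in set(p) | set(q):
        pw = p.get(word, 0.0)
        qw = q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture probability of this word
        if pw > 0:
            jsd += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            jsd += 0.5 * qw * math.log2(qw / mw)
    return jsd


if __name__ == "__main__":
    a = word_distribution("beach surf sun beach")
    b = word_distribution("snow ski sun snow")
    print(jensen_shannon_divergence(a, b))
```

A matrix of such pairwise divergences between all states could then be passed to any standard hierarchical clustering routine to produce a grouping of linguistically similar regions.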

In examining cities, Napa, California, was noted as the happiest, while Beaumont, Texas, was identified as the least happy. High happiness scores in specific cities were linked to positive words such as "lol" and "love," while negative sentiment was often associated with profanity and negations like "don't" and "never."

Correlation with Demographic Data

A significant part of the analysis correlates happiness scores with demographic attributes from the 2011 American Community Survey. Happiness correlated strongly and positively with socioeconomic indicators, notably wealth and education. Conversely, obesity (drawn from Gallup's survey) showed a strong negative correlation with happiness.
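The correlation step amounts to pairing each city's happiness score with one demographic value and computing a correlation coefficient over all cities. The sketch below assumes Pearson's r, since this summary does not say which coefficient the authors used; `pearson_r` is a hypothetical helper and any inputs would come from the reader's own data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    `xs` could hold per-city happiness scores and `ys` a per-city
    demographic attribute (for example, an obesity rate), in the same
    city order. Raises ValueError on mismatched, too-short, or constant
    inputs, for which the coefficient is undefined.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near +1 would correspond to the positive relationship reported for wealth and education, and a value near -1 to the negative relationship reported for obesity. A rank-based coefficient such as Spearman's would be a reasonable alternative when the relationship is monotonic but not linear.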

Further analysis of word frequency revealed that terms associated with wealthier lifestyles (e.g., "cafe", "yoga", "software") correlated positively with education levels, while more colloquial or emotionally charged words correlated negatively. These patterns may help explain differences in word usage between areas of low and high socioeconomic status.

Implications and Future Directions

This research highlights the potential utility of social media data in measuring population-level happiness in near real-time. The correlation with demographic and health indices suggests that such methodologies could complement traditional survey-based approaches, offering timely insights into societal well-being.

Future research could expand on these findings by exploring the predictive power of social media sentiment in relation to changes in socioeconomic factors. Additionally, incorporating multilingual word lists and analyzing sentence contexts would further refine these methodologies.

Such studies have implications for urban planning, public health, and policy-making, where an understanding of community sentiment can inform more effective interventions. As the field matures, integrating diverse data streams promises a more detailed picture of societal well-being across geographic and demographic groups.