The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions (1808.09600v1)

Published 29 Aug 2018 in cs.SI and cs.CY

Abstract: Nowcasting based on social media text promises to provide unobtrusive and near real-time predictions of community-level outcomes. These outcomes typically concern people, but the data is often aggregated without regard to the users in each community's Twitter population. This paper describes a simple yet effective method for building community-level models using Twitter language aggregated by user. Results on four different U.S. county-level tasks, spanning demographic, health, and psychological outcomes, show large and consistent improvements in prediction accuracy (e.g. from Pearson r=.73 to .82 for median income prediction, or r=.37 to .47 for life satisfaction prediction) over the standard approach of aggregating all tweets. We make our aggregated and anonymized community-level data, derived from 37 billion tweets (over 1 billion of which were mapped to counties), available for research.
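
The contrast the abstract draws, pooling all tweets in a county versus aggregating by user first, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the feature set or weighting, so the relative word frequencies, the equal weighting of users, and the names `tweet_level`, `user_level`, and `toy_tweets` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def _normalize(counts):
    """Turn raw word counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def tweet_level(tweets):
    """Baseline: pool every token in a county, then normalize once."""
    pooled = defaultdict(Counter)
    for county, _user, tokens in tweets:
        pooled[county].update(tokens)
    return {county: _normalize(c) for county, c in pooled.items()}


def user_level(tweets):
    """Assumed variant: normalize per user, then average users per county."""
    per_user = defaultdict(Counter)
    for county, user, tokens in tweets:
        per_user[(county, user)].update(tokens)

    sums = defaultdict(lambda: defaultdict(float))
    n_users = Counter()
    for (county, _user), counts in per_user.items():
        n_users[county] += 1
        for word, freq in _normalize(counts).items():
            sums[county][word] += freq

    # Each user contributes equally, regardless of how much they post.
    return {
        county: {word: s / n_users[county] for word, s in words.items()}
        for county, words in sums.items()
    }


# Hypothetical toy data: (county, user, tokens). User u1 posts far more than u2.
toy_tweets = [
    ("A", "u1", ["happy", "happy"]),
    ("A", "u1", ["happy", "happy"]),
    ("A", "u1", ["happy", "happy"]),
    ("A", "u1", ["happy", "happy"]),
    ("A", "u2", ["sad"]),
    ("A", "u2", ["rain"]),
]

pooled_features = tweet_level(toy_tweets)
user_features = user_level(toy_tweets)
```

In this toy county, the prolific user dominates the pooled features ("happy" makes up 8 of 10 tokens), whereas the user-level features give both users equal weight, so "happy" falls to one half. Either feature dictionary could then feed a county-level regression model.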

Authors (6)
  1. Salvatore Giorgi (18 papers)
  2. Daniel Preotiuc-Pietro (17 papers)
  3. Anneke Buffone (6 papers)
  4. Daniel Rieman (1 paper)
  5. Lyle H. Ungar (16 papers)
  6. H. Andrew Schwartz (32 papers)
Citations (35)
