
Integrating Crowdsourcing and Active Learning for Classification of Work-Life Events from Tweets (2003.12139v2)

Published 26 Mar 2020 in cs.CL, cs.LG, cs.SI, and stat.ML

Abstract: Social media, especially Twitter, is increasingly used for research with predictive analytics. In social media studies, NLP techniques are used in conjunction with expert-based, manual, and qualitative analyses. However, social media data are unstructured and must undergo complex manipulation before research use. Manual annotation is the most resource- and time-consuming step, because multiple expert raters have to reach consensus on every item, but it is essential for creating gold-standard datasets to train NLP-based machine learning classifiers. To reduce the burden of manual annotation while maintaining its reliability, we devised a crowdsourcing pipeline combined with active learning strategies. We demonstrated its effectiveness through a case study that identifies job loss events from individual tweets. We used the Amazon Mechanical Turk platform to recruit annotators from the Internet and designed a number of quality control measures to assure annotation accuracy. We evaluated four active learning strategies (least confident, entropy, vote entropy, and Kullback-Leibler divergence). The active learning strategies aim to reduce the number of tweets needed to reach a desired performance of automated classification. Results show that crowdsourcing is useful for creating high-quality annotations and that active learning helps reduce the number of required tweets, although there was no substantial difference among the strategies tested.
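
The four strategies named in the abstract are standard active learning query scores. A minimal sketch of their textbook definitions follows, in Python. It is not the authors' implementation: the function names, the committee representation, and the toy probabilities are assumptions made for illustration only.

```python
import math
from collections import Counter


def least_confident(probs):
    """Uncertainty as 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def entropy(probs):
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def vote_entropy(votes):
    """Entropy of the hard labels voted by a committee of classifiers."""
    total = len(votes)
    counts = Counter(votes).values()
    return -sum((n / total) * math.log(n / total) for n in counts)


def mean_kl_divergence(committee_probs):
    """Average KL divergence of each member's distribution from the committee mean."""
    n_members = len(committee_probs)
    n_classes = len(committee_probs[0])
    consensus = [
        sum(member[c] for member in committee_probs) / n_members
        for c in range(n_classes)
    ]

    def kl(p, q):
        # Terms with p_i == 0 contribute nothing; q_i > 0 whenever p_i > 0
        # because the consensus is an average that includes p itself.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    return sum(kl(member, consensus) for member in committee_probs) / n_members


# Hypothetical pool: tweet id -> predicted [P(not job loss), P(job loss)].
pool = {"tweet_a": [0.95, 0.05], "tweet_b": [0.55, 0.45]}

# Query the tweet the classifier is least sure about.
next_to_annotate = max(pool, key=lambda t: entropy(pool[t]))
```

In each case a higher score marks a tweet as more informative to send for annotation. The first two scores need only a single classifier's predicted probabilities, while vote entropy and KL divergence assume a committee of classifiers whose disagreement is measured.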

Authors (5)
  1. Yunpeng Zhao (29 papers)
  2. Mattia Prosperi (10 papers)
  3. Tianchen Lyu (1 paper)
  4. Yi Guo (115 papers)
  5. Jiang Bian (229 papers)
Citations (5)
