
Prevalence and prevention of large language model use in crowd work (2310.15683v1)

Published 24 Oct 2023 in cs.CL

Abstract: We show that the use of LLMs is prevalent among crowd workers, and that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use. On a text summarization task where workers were not directed in any way regarding their LLM use, the estimated prevalence of LLM use was around 30%, but was reduced by about half by asking workers to not use LLMs and by raising the cost of using them, e.g., by disabling copy-pasting. Secondary analyses give further insight into LLM use and its prevention: LLM use yields high-quality but homogeneous responses, which may harm research concerned with human (rather than model) behavior and degrade future models trained with crowdsourced data. At the same time, preventing LLM use may be at odds with obtaining high-quality responses; e.g., when requesting workers not to use LLMs, summaries contained fewer keywords carrying essential information. Our estimates will likely change as LLMs increase in popularity or capabilities, and as norms around their usage change. Yet, understanding the co-evolution of LLM-based tools and users is key to maintaining the validity of research done using crowdsourcing, and we provide a critical baseline before widespread adoption ensues.
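The abstract reports an "estimated prevalence of LLM use" of around 30% without describing the estimation procedure. One common way to estimate prevalence from an imperfect detector (e.g., a classifier distinguishing LLM-generated from human-written summaries) is the Rogan-Gladen correction, which adjusts the raw positive rate for the detector's known sensitivity and specificity. This is a minimal sketch of that general technique, not necessarily the paper's actual method; the function name and parameters are illustrative assumptions:

```python
def corrected_prevalence(positive_rate: float,
                         sensitivity: float,
                         specificity: float) -> float:
    """Rogan-Gladen estimator: correct a raw classifier positive rate
    for known false-positive and false-negative rates.

    positive_rate: fraction of responses flagged as LLM-generated
    sensitivity:   P(flagged | actually LLM-generated)
    specificity:   P(not flagged | actually human-written)
    """
    # Denominator is zero when the detector is no better than chance.
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("detector must be better than chance")
    est = (positive_rate + specificity - 1.0) / denom
    # Clamp to the valid probability range.
    return min(max(est, 0.0), 1.0)

# With a perfect detector, the estimate equals the raw rate:
corrected_prevalence(0.30, 1.0, 1.0)
# With an imperfect detector, the raw rate is adjusted upward or downward:
corrected_prevalence(0.30, 0.90, 0.95)
```

With a perfect detector the raw flag rate is the prevalence; with sensitivity 0.90 and specificity 0.95, a 30% flag rate corresponds to an estimated prevalence of about 29%.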

Authors (6)
  1. Veniamin Veselovsky (17 papers)
  2. Manoel Horta Ribeiro (44 papers)
  3. Philip Cozzolino (1 paper)
  4. Andrew Gordon (4 papers)
  5. David Rothschild (7 papers)
  6. Robert West (154 papers)
Citations (17)