Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Safeguarding Crowdsourcing Surveys from ChatGPT with Prompt Injection (2306.08833v1)

Published 15 Jun 2023 in cs.HC

Abstract: ChatGPT and other LLMs have proven useful in crowdsourcing tasks, where they can effectively annotate machine learning training data. However, this means that they also have the potential for misuse, specifically to automatically answer surveys. LLMs can potentially circumvent quality assurance measures, thereby threatening the integrity of methodologies that rely on crowdsourcing surveys. In this paper, we propose a mechanism to detect LLM-generated responses to surveys. The mechanism uses "prompt injection", such as directions that can mislead LLMs into giving predictable responses. We evaluate our technique against a range of question scenarios, types, and positions, and find that it can reliably detect LLM-generated responses with more than 93% effectiveness. We also provide an open-source software to help survey designers use our technique to detect LLM responses. Our work is a step in ensuring that survey methodologies remain rigorous vis-a-vis LLMs.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (35)
  1. Better by you, better than me, chatgpt3 as writing assistance in students essays. arXiv:2302.04536 [cs.AI]
  2. Natural language processing with Python: analyzing text with the natural language toolkit. ” O’Reilly Media, Inc.”.
  3. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901.
  4. Deep reinforcement learning from human preferences. arXiv:1706.03741 [stat.ML]
  5. Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods. arXiv:2210.07321 [cs.CL]
  6. Crowdsourcing in HCI research. Ways of Knowing in HCI (2014), 267–289.
  7. Luciano Floridi and Massimo Chiriatti. 2020. GPT-3: Its nature, scope, limits, and consequences. Minds and Machines 30, 4 (2020), 681–694.
  8. Crowdsourcing: a review and suggestions for future research. International Journal of management reviews 20, 2 (2018), 343–363.
  9. Joseph K Goodman and Gabriele Paolacci. 2017. Crowdsourcing consumer research. Journal of Consumer Research 44, 1 (2017), 196–210.
  10. Riley Goodside. 2022. Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions. https://twitter.com/goodside/status/1569128808308957185.
  11. A reduced QWERTY keyboard for mobile text entry. In CHI’04 extended abstracts on Human factors in computing systems. 1429–1432.
  12. More than you’ve asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models. arXiv:2302.12173 [cs.CR]
  13. Evaluating Large Language Models in Generating Synthetic HCI Research Data: a Case Study. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–19.
  14. AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators. arXiv:2303.16854 [cs.CL]
  15. Krystal Hu. 2023. ChatGPT sets record for fastest-growing user base - analyst note. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/.
  16. Automatic detection of machine generated text: A critical survey. arXiv preprint arXiv:2011.01314 (2020).
  17. ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports. arXiv:2212.14882 [cs.CL]
  18. Large Language Models are Zero-Shot Reasoners. arXiv:2205.11916 [cs.CL]
  19. Are attention check questions a threat to scale validity? Applied Psychology 67, 2 (2018), 264–283.
  20. On the Educational Impact of ChatGPT: Is Artificial Intelligence Ready to Obtain a University Degree? arXiv:2303.11146 [cs.CY]
  21. Putting ChatGPT’s Medical Advice to the (Turing) Test. arXiv:2301.10035 [cs.HC]
  22. OpenAI. 2023a. ChatGPT: Optimizing Language Models for Dialogue. https://openai.com/blog/chatgpt/.
  23. OpenAI. 2023b. GPT-4. https://openai.com/research/gpt-4.
  24. OpenAI. 2023c. OpenAI API REFERENCE. https://platform.openai.com/docs/api-reference/chat.
  25. Fábio Perez and Ian Ribeiro. 2022. Ignore Previous Prompt: Attack Techniques For Language Models. arXiv:2211.09527 [cs.CL]
  26. Elvis Saravia. 2022. Prompt Engineering Guide. https://github.com/dair-ai/Prompt-Engineering-Guide (12 2022).
  27. Ronda L Sinkowitz-Cochran. 2013. Survey design: To ask or not to ask? That is the question…. Clinical Infectious Diseases 56, 8 (2013), 1159–1164.
  28. Does Synthetic Data Generation of LLMs Help Clinical Text Mining? arXiv:2303.04360 [cs.CL]
  29. Legal Prompt Engineering for Multilingual Legal Judgement Prediction. arXiv:2212.02199 [cs.CL]
  30. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682 (2022).
  31. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903 [cs.CL]
  32. Peter Welch. 1967. The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Transactions on audio and electroacoustics 15, 2 (1967), 70–73.
  33. LMTurk: Few-Shot Learners as Crowdsourcing Workers in a Language-Model-as-a-Service Framework. arXiv:2112.07522 [cs.CL]
  34. A Survey of Large Language Models. arXiv:2303.18223 [cs.CL]
  35. Can GPT-4 Perform Neural Architecture Search? arXiv:2304.10970 [cs.LG]
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (10)
  1. Chaofan Wang (10 papers)
  2. Samuel Kernan Freire (6 papers)
  3. Mo Zhang (11 papers)
  4. Jing Wei (10 papers)
  5. Jorge Goncalves (58 papers)
  6. Vassilis Kostakos (27 papers)
  7. Zhanna Sarsenbayeva (9 papers)
  8. Christina Schneegass (4 papers)
  9. Alessandro Bozzon (15 papers)
  10. Evangelos Niforatos (10 papers)
Citations (8)

Summary

We haven't generated a summary for this paper yet.