MEGAnno+: A Human-LLM Collaborative Annotation System (2402.18050v1)

Published 28 Feb 2024 in cs.CL and cs.HC

Abstract: LLMs can label data faster and cheaper than humans for various NLP tasks. Despite their prowess, LLMs may fall short in understanding complex, sociocultural, or domain-specific context, potentially leading to incorrect annotations. Therefore, we advocate a collaborative approach where humans and LLMs work together to produce reliable and high-quality labels. We present MEGAnno+, a human-LLM collaborative annotation system that offers effective LLM agent and annotation management, convenient and robust LLM annotation, and exploratory verification of LLM labels by humans.

Authors (5)
  1. Hannah Kim (19 papers)
  2. Kushan Mitra (4 papers)
  3. Rafael Li Chen (4 papers)
  4. Sajjadur Rahman (16 papers)
  5. Dan Zhang (171 papers)
Citations (11)