Supporting Annotators with Affordances for Efficiently Labeling Conversational Data (2403.07762v1)
Abstract: Machine learning-based systems would not be as ubiquitous as they are today without substantial amounts of correctly labeled ground-truth data, yet crowdsourced labeling remains time consuming and expensive. To address the effort and tedium of labeling, we designed CAL, a novel interface to aid in labeling conversational data. CAL embodies several key design decisions: it prevents inapt labels from being selected, guides users toward an appropriate label when they need assistance, incorporates labeling documentation directly into the interface, and provides an efficient means of viewing previous labels. We built a production-quality implementation of CAL and report a user-study evaluation comparing it to a standard spreadsheet. Key findings include that users of CAL reported lower cognitive load without an increase in task time, rated CAL as easier to use, and preferred CAL over the spreadsheet.