
Designing NLP Systems That Adapt to Diverse Worldviews

Published 18 May 2024 in cs.CL (arXiv:2405.11197v1)

Abstract: Natural Language Inference (NLI) is foundational for evaluating language understanding in AI. However, progress has plateaued, with models failing on ambiguous examples and generalizing poorly. We argue that this stems from disregarding the subjective nature of meaning, which is intrinsically tied to an individual's weltanschauung (roughly, worldview). Existing NLP datasets often obscure this subjectivity by aggregating labels or filtering out disagreement. We propose a perspectivist approach: building datasets that capture annotator demographics, values, and justifications for their labels, thereby explicitly modeling diverse worldviews. Our initial experiments on a subset of the SBIC dataset demonstrate that even limited annotator metadata can improve model performance.
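The abstract's core proposal is a dataset design: keep each annotator's label together with their self-reported metadata and justification instead of aggregating to a single "ground truth". A minimal sketch of what such a record might look like, and one simple way to condition a text classifier on it, is shown below. This is an illustrative assumption, not the authors' implementation; the record fields, the `to_model_input` helper, and the `[key=value]` metadata encoding are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical perspectivist annotation record: each judgment keeps the
# annotator's self-reported metadata and free-text justification alongside
# the label, rather than collapsing all annotators into one aggregated label.
@dataclass
class PerspectivistAnnotation:
    premise: str
    hypothesis: str
    label: str                 # e.g. "entailment", "neutral", "contradiction"
    annotator_id: str
    demographics: dict = field(default_factory=dict)  # self-reported, optional
    justification: str = ""    # annotator's free-text reasoning for the label

def to_model_input(ann: PerspectivistAnnotation) -> str:
    """One simple conditioning scheme (an assumption, not the paper's method):
    prepend annotator metadata as [key=value] tokens so a classifier can
    learn annotator-specific label tendencies."""
    meta = " ".join(f"[{k}={v}]" for k, v in sorted(ann.demographics.items()))
    return f"{meta} premise: {ann.premise} hypothesis: {ann.hypothesis}"

# Example record (entirely fabricated for illustration).
ann = PerspectivistAnnotation(
    premise="The comedian told a joke about lawyers.",
    hypothesis="The joke is offensive.",
    label="neutral",
    annotator_id="a42",
    demographics={"age_group": "25-34", "region": "EU"},
    justification="Depends on context; the target is an occupation.",
)
print(to_model_input(ann))
```

Keeping the justification as a separate field also leaves room for the kind of disagreement analysis the paper motivates: two annotators can assign different labels to the same pair for stated, recoverable reasons.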

References (20)
  1. Whose opinions matter? perspective-aware models to identify opinions of hate speech victims in abusive language detection.
  2. Toward a perspectivist turn in ground truthing for predictive computing.
  3. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642, Lisbon, Portugal. Association for Computational Linguistics.
  4. Adversarial filters of dataset biases.
  5. Social or individual disagreement? perspectivism in the annotation of sexist jokes.
  6. Subjective Isms? on the danger of conflating hate and offence in abusive language detection.
  7. Did they answer? subjective acts and intents in conversational discourse. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1626–1644, Online. Association for Computational Linguistics.
  8. Capturing the varieties of natural language inference: A systematic survey of existing datasets and two novel benchmarks. Pages 21–48.
  9. DeBERTa: Decoding-enhanced BERT with disentangled attention. In International Conference on Learning Representations.
  10. How crowd worker factors influence subjective annotations: A study of tagging misogynistic hate speech in tweets.
  11. Nan-Jiang Jiang and Marie-Catherine de Marneffe. 2022. Investigating reasons for disagreement in natural language inference. Transactions of the Association for Computational Linguistics, 10:1357–1374.
  12. What can we learn from collective human opinions on natural language inference data? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics.
  13. The ecological fallacy in annotation: Modelling human label variation goes beyond sociodemographics.
  14. Human uncertainty makes classification more robust.
  15. W. V. O. Quine. 1980. Two dogmas of empiricism. In From a Logical Point of View, pages 20–46.
  16. The measuring hate speech corpus: Leveraging rasch measurement theory for data perspectivism. In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022, pages 83–94, Marseille, France. European Language Resources Association.
  17. Social bias frames: Reasoning about social and power implications of language.
  18. Annotators with attitudes: How annotator beliefs and identities bias toxic language detection. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5884–5906, Seattle, United States. Association for Computational Linguistics.
  19. A case for soft loss functions.
  20. Xiang Zhou and Mohit Bansal. 2020. Towards robustifying NLI models against lexical dataset biases. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8759–8771, Online. Association for Computational Linguistics.

Authors (2)
