A Corpus for Sentence-level Subjectivity Detection on English News Articles (2305.18034v3)
Abstract: We develop novel annotation guidelines for sentence-level subjectivity detection, which are not limited to language-specific cues. We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics. Our corpus paves the way for subjectivity detection in English and across other languages without relying on language-specific tools, such as lexicons or machine translation. We evaluate state-of-the-art multilingual transformer-based models on the task in mono-, multi-, and cross-language settings. For this purpose, we re-annotate an existing Italian corpus. We observe that models trained in the multilingual setting achieve the best performance on the task.
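The abstract gives the class counts of NewsSD-ENG and names three evaluation settings, but it does not define those settings. The sketch below is a minimal illustration, not the authors' code, and all function names in it are hypothetical. It does three things. It computes the class proportions implied by the reported counts. It computes the macro-F1 that a trivial majority-class predictor would score; macro-F1 is an assumption chosen for illustration, since the abstract names no metric. It also enumerates one plausible reading of the mono-, multi-, and cross-language settings for English ("en") and Italian ("it"); this reading is likewise an assumption.

```python
# Label counts for NewsSD-ENG as reported in the abstract.
ENG_COUNTS = {"OBJ": 638, "SUBJ": 411}


def class_share(counts):
    """Return each label's fraction of the corpus."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_macro_f1(counts):
    """Macro-F1 of a (hypothetical) baseline that always predicts the most frequent label.

    The majority label has recall 1 and precision p equal to its share,
    so its F1 is 2p / (p + 1). Every other label is never predicted and
    gets F1 = 0, so the macro average is that F1 divided by the number of labels.
    """
    total = sum(counts.values())
    p = max(counts.values()) / total
    f1_majority = 2 * p / (p + 1)
    return f1_majority / len(counts)


def evaluation_settings(languages):
    """Enumerate (setting, train_languages, test_language) triples.

    Assumed reading of the settings (not taken from the paper):
      mono  - train and test on the same language
      multi - train on all languages, test on each one
      cross - train on one language, test on a different one
    """
    settings = []
    for test_lang in languages:
        settings.append(("mono", (test_lang,), test_lang))
        settings.append(("multi", tuple(languages), test_lang))
        for train_lang in languages:
            if train_lang != test_lang:
                settings.append(("cross", (train_lang,), test_lang))
    return settings


if __name__ == "__main__":
    print({k: round(v, 3) for k, v in class_share(ENG_COUNTS).items()})
    # prints {'OBJ': 0.608, 'SUBJ': 0.392}
    print(round(majority_macro_f1(ENG_COUNTS), 3))
    # prints 0.378
    for setting in evaluation_settings(["en", "it"]):
        print(setting)
```

Under these assumptions, a model must score well above 0.378 macro-F1 on NewsSD-ENG to beat predicting "objective" for every sentence. With two languages, the enumeration yields six train/test configurations, two per setting.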