The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings (2309.09783v2)

Published 18 Sep 2023 in cs.CL

Abstract: The paper presents a new training dataset of sentences in 7 languages, manually annotated for sentiment, which is used in a series of experiments focused on training a robust sentiment identifier for parliamentary proceedings. The paper also introduces the first domain-specific multilingual transformer LLM for political science applications, which was additionally pre-trained on 1.72 billion words from the proceedings of 27 European parliaments. We present experiments demonstrating how the additional pre-training on parliamentary data can significantly improve the model's downstream performance, in our case on sentiment identification in parliamentary proceedings. We further show that our multilingual model performs very well on languages not seen during fine-tuning, and that additional fine-tuning data from other languages significantly improves the target parliament's results. The paper makes an important contribution to multiple disciplines within the social sciences, and bridges them with computer science and computational linguistics. Lastly, the resulting fine-tuned LLM provides a more robust approach to sentiment analysis of political texts across languages, which allows scholars to study political sentiment from a comparative perspective using standardized tools and techniques.
