Prompting and Fine-Tuning Open-Sourced Large Language Models for Stance Classification (2309.13734v2)
Abstract: Stance classification, the task of predicting the viewpoint of an author on a subject of interest, has long been a focal point of research in domains ranging from social science to machine learning. Current stance detection methods rely predominantly on manual annotation of sentences, followed by training a supervised machine learning model. However, manual annotation is laborious, which limits how well these methods generalize across different contexts. In this work, we investigate the use of LLMs as a stance detection methodology that can reduce or even eliminate the need for manual annotations. We investigate 10 open-source models and 7 prompting schemes, finding that LLMs are competitive with in-domain supervised models but are not necessarily consistent in their performance. We also fine-tuned the LLMs, but discovered that the fine-tuning process does not necessarily lead to better performance. In general, we find that LLMs do not routinely outperform smaller supervised machine learning models, and we therefore call for stance detection to become a benchmark that LLMs also optimize for. The code used in this study is available at https://github.com/ijcruic/LLM-Stance-Labeling
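To make the idea of prompting an LLM for stance labels concrete, the following is a minimal sketch of a zero-shot prompt builder and an answer parser. It is not taken from the paper or its repository: the label set (`favor`, `against`, `neutral`), the prompt wording, and the function names `build_prompt` and `parse_stance` are all assumptions chosen for illustration, and the call to an actual model is left out.

```python
import re
from typing import Optional

# Assumed label set for illustration; the paper's datasets may use other labels.
LABELS = ("favor", "against", "neutral")


def build_prompt(text: str, target: str) -> str:
    """Build a hypothetical zero-shot stance prompt for one statement/target pair."""
    return (
        "Classify the stance of the following statement toward the target.\n"
        f"Target: {target}\n"
        f"Statement: {text}\n"
        f"Answer with exactly one of: {', '.join(LABELS)}.\n"
        "Stance:"
    )


def parse_stance(completion: str) -> Optional[str]:
    """Return the first label word found in the model output, or None if absent."""
    for token in re.findall(r"[a-z]+", completion.lower()):
        if token in LABELS:
            return token
    return None


if __name__ == "__main__":
    prompt = build_prompt("Wind farms are ruining the coastline.", "renewable energy")
    print(prompt)
    # A model completion would normally come from an LLM; this string is a stand-in.
    print(parse_stance(" Against. The author criticizes wind farms."))
```

Returning `None` for completions that contain no label is one way to surface the inconsistent outputs that the abstract mentions, since unparseable answers can then be counted separately rather than silently mapped to a class.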
Authors: Iain J. Cruickshank, Lynnette Hui Xian Ng