Argumentative Stance Prediction: An Exploratory Study on Multimodality and Few-Shot Learning (2310.07093v1)
Abstract: To advance argumentative stance prediction as a multimodal problem, the First Shared Task in Multimodal Argument Mining hosted stance prediction on the crucial social topics of gun control and abortion. Our exploratory study evaluates the necessity of images for stance prediction in tweets and compares out-of-the-box text-based large language models (LLMs) in few-shot settings against fine-tuned unimodal and multimodal models. Our work suggests that an ensemble of fine-tuned text-based LLMs (0.817 F1-score) outperforms both the multimodal models (0.677 F1-score) and text-based few-shot prediction using a recent state-of-the-art LLM (0.550 F1-score). Beyond these performance differences, our findings suggest that multimodal models tend to perform better when image content is summarized as natural language rather than consumed in its native pixel structure, and that using in-context examples improves the few-shot performance of LLMs.
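The few-shot setting the abstract refers to amounts to in-context prompting: a task instruction, a handful of labeled demonstrations, then the unlabeled query tweet. A minimal sketch of such prompt assembly is below; the topic phrasing, label set, and example tweets are illustrative assumptions, not taken from the paper's dataset or its actual prompt template.

```python
# Sketch of few-shot (in-context) stance-prediction prompting.
# NOTE: topic wording, labels, and demo tweets are hypothetical,
# not the paper's actual template or data.

def build_fewshot_prompt(topic, examples, query_tweet):
    """Assemble a prompt: task instruction, labeled demonstrations,
    then the unlabeled query tweet awaiting a stance label."""
    lines = [
        f"Classify the stance of each tweet on {topic} "
        "as 'support' or 'oppose'.",
        "",
    ]
    for tweet, label in examples:  # in-context demonstrations
        lines.append(f"Tweet: {tweet}")
        lines.append(f"Stance: {label}")
        lines.append("")
    lines.append(f"Tweet: {query_tweet}")
    lines.append("Stance:")  # the LLM completes this line
    return "\n".join(lines)

demos = [
    ("Universal background checks save lives.", "support"),
    ("Law-abiding citizens should keep their rights.", "oppose"),
]
prompt = build_fewshot_prompt("gun control",
                              demos,
                              "We need stricter laws now.")
print(prompt)
```

The resulting string would be sent to an LLM as-is; zero-shot prompting is the same construction with an empty demonstration list, which is the comparison the abstract's finding on in-context examples rests on.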