SoMeLVLM: A Large Vision Language Model for Social Media Processing (2402.13022v1)
Abstract: The growth of social media, characterized by its multimodal nature, has led to the emergence of diverse phenomena and challenges, which call for an effective, unified approach to automated tasks. Powerful Large Vision Language Models make it possible to handle a variety of tasks simultaneously, but even with carefully designed prompting methods, general-domain models often fall short in aligning with the unique speaking style and context of social media tasks. In this paper, we introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM), a cognitive framework equipped with five key capabilities: knowledge & comprehension, application, analysis, evaluation, and creation. SoMeLVLM is designed to understand and generate realistic social media behavior. We have developed a 654k multimodal social media instruction-tuning dataset to support our cognitive framework and fine-tune our model. Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks. Further analysis shows significant advantages over baselines in terms of cognitive abilities.