Image Matters: A New Dataset and Empirical Study for Multimodal Hyperbole Detection (2307.00209v3)

Published 1 Jul 2023 in cs.CV, cs.AI, and cs.CL

Abstract: Hyperbole, or exaggeration, is a common linguistic phenomenon, and detecting it is an important part of understanding human expression. Several studies have addressed hyperbole detection, but most focus on the text modality only. With the development of social media, however, people can create hyperbolic expressions in various modalities, including text, images, and videos. In this paper, we focus on multimodal hyperbole detection. We create a multimodal detection dataset from Weibo (a Chinese social media platform) and carry out several studies on it. We treat the text and image of a Weibo post as two modalities and explore the role each plays in hyperbole detection. We also evaluate different pre-trained multimodal encoders on this downstream task to show their performance. In addition, since the dataset is constructed from five different topics, we evaluate the cross-domain performance of different models. These studies can serve as a benchmark and point out directions for further study on multimodal hyperbole detection.

