Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue (2402.03658v1)
Abstract: Sarcasm Explanation in Dialogue (SED) is a new yet challenging task that aims to generate a natural language explanation for a given sarcastic dialogue involving multiple modalities (i.e., utterance, video, and audio). Although existing studies have achieved great success based on the generative pre-trained language model BART, they overlook exploiting the sentiments residing in the utterance, video, and audio, which are vital clues for sarcasm explanation. In fact, it is non-trivial to incorporate sentiments for boosting SED performance, due to three main challenges: 1) the diverse effects of utterance tokens on sentiments; 2) the gap between video-audio sentiment signals and the embedding space of BART; and 3) the various relations among utterances, utterance sentiments, and video-audio sentiments. To tackle these challenges, we propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE. In particular, we first propose a lexicon-guided utterance sentiment inference module, in which a heuristic utterance sentiment refinement strategy is devised. We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive a joint sentiment label for each video-audio clip. Thereafter, we devise a context-sentiment graph to comprehensively model the semantic relations among the utterances, utterance sentiments, and video-audio sentiments, thereby facilitating sarcasm explanation generation. Extensive experiments on the publicly released dataset WITS verify the superiority of our model over cutting-edge methods.
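The context-sentiment graph is reasoned over with graph convolutions in the style of Kipf and Welling (cited below). To make that step concrete, here is a minimal sketch of one GCN layer applied to a toy context-sentiment graph, assuming PyTorch; the node layout, edge choices, and 768-dimensional features (matching BART's hidden size) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution layer (Kipf & Welling):
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a_hat = adj + torch.eye(adj.size(0))                  # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))   # D^{-1/2}
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt              # symmetric normalization
        return torch.relu(self.linear(a_norm @ h))            # propagate, then transform

# Hypothetical graph: nodes 0-2 are utterances, 3-5 their lexicon-derived
# utterance-sentiment labels, and 6-7 video-audio sentiment labels (as JCA-SI
# would produce); features are random stand-ins for real node embeddings.
num_nodes, dim = 8, 768                  # 768 matches BART's hidden size
h = torch.randn(num_nodes, dim)
adj = torch.zeros(num_nodes, num_nodes)
for i, j in [(0, 1), (1, 2),             # adjacent utterances in the dialogue
             (0, 3), (1, 4), (2, 5),     # utterance <-> its utterance sentiment
             (0, 6), (2, 7)]:            # utterance <-> a video-audio sentiment
    adj[i, j] = adj[j, i] = 1.0          # undirected edges

layer = GCNLayer(dim, dim)
print(layer(h, adj).shape)               # torch.Size([8, 768])
```

After a few such layers, the sentiment-aware node representations could condition BART's decoding of the explanation; the exact node features and edge semantics used by EDGE are defined in the paper, not in this sketch.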
- S. Kumar, A. Kulkarni, M. S. Akhtar, and T. Chakraborty, “When did you become so smart, oh wise one?! sarcasm explanation in multi-modal multi-party dialogues,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics. ACL, 2022, pp. 5956–5968.
- G. Abercrombie and D. Hovy, “Putting sarcasm detection into context: The effects of class imbalance and manual labelling on supervised machine classification of Twitter conversations,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics. ACL, 2016, pp. 107–113.
- N. Babanejad, H. Davoudi, A. An, and M. Papagelis, “Affective and contextual embedding for sarcasm detection,” in Proceedings of the International Conference on Computational Linguistics. ICCL, 2020, pp. 225–243.
- J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. ACL, 2019, pp. 4171–4186.
- T. Chakrabarty, D. Ghosh, S. Muresan, and N. Peng, “R^3: Reverse, retrieve, and rank for sarcasm generation with commonsense knowledge,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics. ACL, 2020, pp. 7976–7986.
- P. Desai, T. Chakraborty, and M. S. Akhtar, “Nice perfume. how long did you marinate in it? multimodal sarcasm explanation,” in AAAI Conference on Artificial Intelligence. AAAI Press, 2022, pp. 10563–10571.
- L. Jing, X. Song, K. Ouyang, M. Jia, and L. Nie, “Multi-source semantic graph-based multimodal sarcasm explanation generation,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics. ACL, 2023, pp. 11349–11361.
- M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics. ACL, 2020, pp. 7871–7880.
- A. Ray, S. Mishra, A. Nunna, and P. Bhattacharyya, “A multimodal corpus for emotion recognition in sarcasm,” in Proceedings of the Thirteenth Language Resources and Evaluation Conference. European Language Resources Association, 2022, pp. 6992–7003.
- D. Vilares, H. Peng, R. Satapathy, and E. Cambria, “BabelSenticNet: A commonsense reasoning framework for multilingual sentiment analysis,” in Symposium Series on Computational Intelligence. IEEE, 2018, pp. 1292–1298.
- R. G. Praveen, W. C. de Melo, N. Ullah, H. Aslam, O. Zeeshan, T. Denorme, M. Pedersoli, A. L. Koerich, S. Bacon, P. Cardinal, and E. Granger, “A joint cross-attention model for audio-visual fusion in dimensional emotion recognition,” in Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 2022, pp. 2485–2494.
- T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations. OpenReview.net, 2017.
- M. Bouazizi and T. Ohtsuki, “A pattern-based approach for sarcasm detection on Twitter,” IEEE Access, vol. 4, pp. 5477–5488, 2016.
- B. Felbo, A. Mislove, A. Søgaard, I. Rahwan, and S. Lehmann, “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing. ACL, 2017, pp. 1615–1625.
- Y. Tay, A. T. Luu, S. C. Hui, and J. Su, “Reasoning with sarcasm by reading in-between,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics. ACL, 2018, pp. 1010–1020.
- R. Schifanella, P. de Juan, J. R. Tetreault, and L. Cao, “Detecting sarcasm in multimodal social platforms,” in Proceedings of the ACM International Conference on Multimedia. ACM, 2016, pp. 1136–1145.
- L. Ma, Z. Lu, L. Shang, and H. Li, “Multimodal convolutional neural networks for matching image and sentence,” in International Conference on Computer Vision. IEEE, 2015, pp. 2623–2631.
- A. Pentland, “Socially aware media,” in Proceedings of the ACM International Conference on Multimedia. ACM, 2005, pp. 690–695.
- Y. Qiao, L. Jing, X. Song, X. Chen, L. Zhu, and L. Nie, “Mutual-enhanced incongruity learning network for multi-modal sarcasm detection,” in AAAI Conference on Artificial Intelligence. AAAI Press, 2023, pp. 9507–9515.
- M. Jia, C. Xie, and L. Jing, “Debiasing multimodal sarcasm detection with contrastive learning,” in AAAI Conference on Artificial Intelligence. AAAI Press, 2024, pp. 1–10.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Annual Conference on Neural Information Processing Systems. Neural Information Processing Systems, 2017, pp. 5998–6008.
- S. Castro, D. Hazarika, V. Pérez-Rosas, R. Zimmermann, R. Mihalcea, and S. Poria, “Towards multimodal sarcasm detection (an _obviously_ perfect paper),” in Proceedings of the Annual Meeting of the Association for Computational Linguistics. ACL, 2019, pp. 4619–4629.
- M. K. Hasan, S. Lee, W. Rahman, A. Zadeh, R. Mihalcea, L. Morency, and E. Hoque, “Humor knowledge enriched transformer for understanding multimodal humor,” in AAAI Conference on Artificial Intelligence. AAAI Press, 2021, pp. 12972–12980.
- L. Peled and R. Reichart, “Sarcasm SIGN: interpreting sarcasm with sentiment based monolingual machine translation,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics. ACL, 2017, pp. 1690–1700.
- A. Dubey, A. Joshi, and P. Bhattacharyya, “Deep models for converting sarcastic utterances into their non sarcastic interpretation,” in Proceedings of the ACM India Joint International Conference on Data Science and Management of Data. ACM, 2019, pp. 289–292.
- S. Kumar, I. Mondal, M. S. Akhtar, and T. Chakraborty, “Explaining (sarcastic) utterances to enhance affect understanding in multimodal dialogues,” in AAAI Conference on Artificial Intelligence. AAAI Press, 2023, pp. 12986–12994.
- S. Baccianella, A. Esuli, and F. Sebastiani, “SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining,” in Proceedings of the International Conference on Language Resources and Evaluation. European Language Resources Association, 2010.
- S. Zhang, S. Zhang, T. Huang, and W. Gao, “Multimodal deep convolutional neural network for audio-visual emotion recognition,” in Proceedings of the ACM on International Conference on Multimedia Retrieval. ACM, 2016, pp. 281–284.
- W. Nie, M. Ren, J. Nie, and S. Zhao, “C-GCN: correlation based graph convolutional network for audio-video emotion recognition,” IEEE Transactions on Multimedia, vol. 23, pp. 3793–3804, 2021.
- R. Lin and H. Hu, “Dynamically shifting multimodal representations via hybrid-modal attention for multimodal sentiment analysis,” IEEE Transactions on Multimedia, pp. 1–16, 2023.
- D. Wang, S. Liu, Q. Wang, Y. Tian, L. He, and X. Gao, “Cross-modal enhancement network for multimodal sentiment analysis,” IEEE Transactions on Multimedia, vol. 25, pp. 4909–4921, 2023.
- W. Nie, R. Chang, M. Ren, Y. Su, and A. Liu, “I-GCN: incremental graph convolution network for conversation emotion detection,” IEEE Transactions on Multimedia, vol. 24, pp. 4471–4481, 2022.
- J. Carreira and A. Zisserman, “Quo vadis, action recognition? A new model and the kinetics dataset,” in Conference on Computer Vision and Pattern Recognition. IEEE, 2017, pp. 4724–4733.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Conference on Computer Vision and Pattern Recognition. IEEE, 2016, pp. 770–778.
- G. Klein, Y. Kim, Y. Deng, J. Senellart, and A. M. Rush, “OpenNMT: Open-source toolkit for neural machine translation,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics, System Demonstrations. ACL, 2017, pp. 67–72.
- A. See, P. J. Liu, and C. D. Manning, “Get to the point: Summarization with pointer-generator networks,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics. ACL, 2017, pp. 1073–1083.
- Y. Liu, J. Gu, N. Goyal, X. Li, S. Edunov, M. Ghazvininejad, M. Lewis, and L. Zettlemoyer, “Multilingual denoising pre-training for neural machine translation,” Transactions of the Association for Computational Linguistics, vol. 8, pp. 726–742, 2020.
- I. Loshchilov and F. Hutter, “Fixing weight decay regularization in Adam,” CoRR, vol. abs/1711.05101, 2017.
- K. Papineni, S. Roukos, T. Ward, and W. Zhu, “Bleu: a method for automatic evaluation of machine translation,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics. ACL, 2002, pp. 311–318.
- C.-Y. Lin, “ROUGE: A package for automatic evaluation of summaries,” in Text Summarization Branches Out: Proceedings of the ACL Workshop. ACL, 2004, pp. 74–81.
- T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, “BERTScore: Evaluating text generation with BERT,” in International Conference on Learning Representations. OpenReview.net, 2020.
- S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.