Multimodal Interaction Modeling via Self-Supervised Multi-Task Learning for Review Helpfulness Prediction (2402.18107v2)

Published 28 Feb 2024 in cs.MM

Abstract: Identifying helpful reviews within a vast pool of user-generated textual and visual data has become a prominent area of study. Effective modal representations are expected to possess two key attributes: consistency and differentiation. Current methods for Multimodal Review Helpfulness Prediction (MRHP) struggle to capture distinctive information because they rely on uniform multimodal annotation, and adding varied multimodal annotations by hand is both time-consuming and labor-intensive. To tackle these challenges, we propose a multi-task learning scheme that automatically generates pseudo labels, allowing us to train simultaneously on the global multimodal interaction task and on the separate cross-modal interaction subtasks, and thus to learn and leverage both consistency and differentiation. Experimental results validate the effectiveness of the pseudo labels, and our approach surpasses previous textual and multimodal baseline models on two widely used benchmark datasets.
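The core idea, pairing the main helpfulness-prediction objective with auxiliary cross-modal subtasks supervised by automatically generated pseudo labels, can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' implementation: the encoder shapes, the agreement-based pseudo-label rule, and the equal loss weights are all placeholders chosen for brevity.

```python
# Minimal sketch (assumptions, not the paper's code): multi-task training that
# combines a global multimodal helpfulness objective with pseudo-labeled
# cross-modal subtasks, so no extra human annotation is needed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskMRHP(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # Assumed encoders over pooled features; the paper's actual
        # architecture is not reproduced here.
        self.text_enc = nn.Linear(300, dim)    # e.g. pooled text features
        self.img_enc = nn.Linear(2048, dim)    # e.g. pooled image features
        # Head for the global multimodal interaction task (helpfulness score).
        self.global_head = nn.Linear(2 * dim, 1)
        # Heads for the separate cross-modal interaction subtasks.
        self.text_head = nn.Linear(dim, 1)
        self.img_head = nn.Linear(dim, 1)

    def forward(self, text_feat, img_feat):
        t = torch.tanh(self.text_enc(text_feat))
        v = torch.tanh(self.img_enc(img_feat))
        g = self.global_head(torch.cat([t, v], dim=-1)).squeeze(-1)
        return g, self.text_head(t).squeeze(-1), self.img_head(v).squeeze(-1), t, v

def pseudo_labels(t, v, helpful):
    # Illustrative pseudo-label rule (an assumption): reuse the helpfulness
    # label, scaled by each modality's agreement with the fused representation,
    # so the subtasks get distinct targets without manual annotation.
    fused = ((t + v) / 2).detach()
    y_t = helpful * F.cosine_similarity(t.detach(), fused, dim=-1)
    y_v = helpful * F.cosine_similarity(v.detach(), fused, dim=-1)
    return y_t, y_v

model = MultiTaskMRHP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: 4 reviews with random features and helpfulness targets in [0, 1].
text_feat = torch.randn(4, 300)
img_feat = torch.randn(4, 2048)
helpful = torch.rand(4)

for _ in range(3):
    g, ts, vs, t, v = model(text_feat, img_feat)
    y_t, y_v = pseudo_labels(t, v, helpful)
    # Joint loss: global task plus the two pseudo-labeled subtasks.
    # Equal weights here; balancing these losses is the usual multi-task
    # design question (e.g. uncertainty weighting or GradNorm).
    loss = F.mse_loss(g, helpful) + F.mse_loss(ts, y_t) + F.mse_loss(vs, y_v)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"loss: {loss.item():.4f}")
```

The detached pseudo labels keep the auxiliary targets from back-propagating into themselves, which is the standard precaution when a model supervises its own subtasks; how the subtask and global losses are actually weighted in the paper is not shown here.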

Authors (3)
  1. HongLin Gong
  2. Mengzhao Jia
  3. Liqiang Jing