Holistic Visual-Textual Sentiment Analysis with Prior Models (2211.12981v2)

Published 23 Nov 2022 in cs.CV and cs.MM

Abstract: Visual-textual sentiment analysis aims to predict sentiment from a paired image and text, which poses a challenge in learning effective features for diverse input images. To address this, we propose a holistic method that achieves robust visual-textual sentiment analysis by exploiting a rich set of powerful pre-trained visual and textual prior models. The proposed method consists of four parts: (1) a visual-textual branch to learn features directly from data for sentiment analysis, (2) a visual expert branch with a set of pre-trained "expert" encoders to extract selected semantic visual features, (3) a CLIP branch to implicitly model visual-textual correspondence, and (4) a multimodal feature fusion network based on BERT to fuse multimodal features and make sentiment predictions. Extensive experiments on three datasets show that our method outperforms existing methods on visual-textual sentiment analysis.
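
The four-part design described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration: the function names, the token layout, and especially the fusion step, where simple mean pooling plus a linear layer stands in for the BERT-based fusion network. It is not the authors' implementation, and the branches are passed in as plain callables rather than real pre-trained models.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical branch signatures: each maps the raw inputs to feature tokens.
PairBranch = Callable[[object, str], List[Vector]]
ExpertEncoder = Callable[[object], Vector]


def assemble_tokens(
    image: object,
    text: str,
    visual_textual_branch: PairBranch,
    expert_encoders: Sequence[ExpertEncoder],
    clip_branch: PairBranch,
) -> List[Vector]:
    """Collect feature tokens from the three branches into one sequence."""
    tokens: List[Vector] = []
    # (1) features learned directly from the image-text pair
    tokens.extend(visual_textual_branch(image, text))
    # (2) one token per pre-trained visual "expert" encoder
    for encoder in expert_encoders:
        tokens.append(encoder(image))
    # (3) CLIP-style image and text embeddings
    tokens.extend(clip_branch(image, text))
    return tokens


def fuse_and_predict(
    tokens: Sequence[Vector],
    weights: Sequence[Vector],
    bias: Vector,
) -> List[float]:
    """(4) Stand-in fusion: mean-pool the tokens, apply a linear layer and softmax.

    The paper fuses tokens with a BERT-based network; mean pooling is used
    here only to keep the sketch small and runnable.
    """
    dim = len(tokens[0])
    pooled = [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    # Numerically stable softmax over the sentiment classes.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]
```

In this sketch all branches are assumed to emit vectors of the same dimension so that they can share one token sequence; a real system would need a projection layer per branch to reach a common width.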

Authors

Junyu Chen
Jie An
Hanjia Lyu
Jiebo Luo
Christopher Kanan
