Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment (2210.04722v1)

Published 10 Oct 2022 in cs.CV

Abstract: Multimedia summarization with multimodal output (MSMO) is a recently explored application in language grounding. It plays an essential role in real-world applications, e.g., automatically generating cover images and titles for news articles or providing introductions to online videos. However, existing methods extract features from the whole video and article and use fusion methods to select representative content, and so usually ignore the critical structure and varying semantics. In this work, we propose a Semantics-Consistent Cross-domain Summarization (SCCS) model based on optimal transport alignment with visual and textual segmentation. Specifically, our method first decomposes the video and the article into segments to capture their respective structural semantics. SCCS then follows a cross-domain alignment objective with optimal transport distance, which leverages multimodal interaction to match and select the visual and textual summary. We evaluated our method on three recent multimodal datasets and demonstrated its effectiveness in producing high-quality multimodal summaries.
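
The alignment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each video segment and each text segment has already been encoded as a feature vector, uses a cosine cost, assumes uniform segment weights, and swaps in entropy-regularized optimal transport solved with Sinkhorn iterations. The function names (`cosine_cost`, `sinkhorn_plan`, `align_segments`) and the rule of picking the pair with the largest transport mass are hypothetical, chosen only for illustration.

```python
import numpy as np


def cosine_cost(X, Y):
    """Pairwise cost (1 - cosine similarity) between rows of X and rows of Y."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return 1.0 - Xn @ Yn.T


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropy-regularized transport plan between two uniform distributions.

    Assumption: every segment carries equal weight (uniform marginals).
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # weights of the video segments
    b = np.full(m, 1.0 / m)   # weights of the text segments
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)       # rescale rows toward marginal a
        v = b / (K.T @ u)     # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def align_segments(video_feats, text_feats, eps=0.1):
    """Return the transport plan, the transport cost, and one selected pair.

    The selection rule (largest transport mass) is an illustrative
    assumption, not the selection procedure from the paper.
    """
    cost = cosine_cost(video_feats, text_feats)
    plan = sinkhorn_plan(cost, eps=eps)
    ot_distance = float(np.sum(plan * cost))
    i, j = np.unravel_index(np.argmax(plan), plan.shape)
    return plan, ot_distance, (int(i), int(j))


if __name__ == "__main__":
    # Toy features: three video segments and the same three directions,
    # permuted, standing in for text segments.
    video = np.eye(3)
    text = np.eye(3)[[2, 0, 1]]
    plan, dist, pair = align_segments(video, text)
    print("OT distance:", dist)
    print("selected (video segment, text segment):", pair)
```

In this toy setup almost all transport mass falls on the semantically matching pairs, so the transport cost is close to zero; with real features the plan would be softer, and the regularization strength `eps` controls how sharply segments are matched.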

Authors (9)
  1. Jielin Qiu (21 papers)
  2. Jiacheng Zhu (54 papers)
  3. Mengdi Xu (27 papers)
  4. Franck Dernoncourt (161 papers)
  5. Trung Bui (79 papers)
  6. Zhaowen Wang (55 papers)
  7. Bo Li (1107 papers)
  8. Ding Zhao (172 papers)
  9. Hailin Jin (53 papers)
Citations (11)