
MHMS: Multimodal Hierarchical Multimedia Summarization (2204.03734v1)

Published 7 Apr 2022 in cs.CV, cs.CL, and cs.MM

Abstract: Multimedia summarization with multimodal output can play an essential role in real-world applications, e.g., automatically generating cover images and titles for news articles or providing introductions to online videos. In this work, we propose a multimodal hierarchical multimedia summarization (MHMS) framework that models the interaction between the visual and language domains to generate both video and textual summaries. MHMS contains a segmentation module and a summarization module for each of the video and textual domains. It formulates a cross-domain alignment objective based on optimal transport distance, which leverages cross-domain interaction to generate the representative keyframes and textual summary. We evaluated MHMS on three recent multimodal datasets and demonstrated the effectiveness of our method in producing high-quality multimultimodal summaries.
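The abstract names an optimal transport (OT) distance as the basis of the cross-domain alignment objective. The snippet below is a minimal sketch, not the authors' implementation, of how an entropy-regularized (Sinkhorn) OT distance can score the alignment between video-segment embeddings and sentence embeddings. The cosine cost, uniform marginals, `eps`, iteration count, and the toy embeddings are all illustrative assumptions; the paper's actual features, cost function, and solver may differ.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def cosine_cost(xs: Matrix, ys: Matrix) -> Matrix:
    """Cost C[i][j] = 1 - cosine similarity between xs[i] and ys[j] (assumed cost)."""
    def norm(v: List[float]) -> float:
        return math.sqrt(sum(a * a for a in v)) or 1.0  # guard against zero vectors

    return [
        [1.0 - sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y)) for y in ys]
        for x in xs
    ]


def sinkhorn(cost: Matrix, eps: float = 0.1, iters: int = 200) -> Tuple[float, Matrix]:
    """Entropy-regularized OT between uniform distributions (Sinkhorn iterations).

    Returns the transport cost <P, C> of the plan P and the plan itself.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # assumed uniform mass over video segments
    b = [1.0 / m] * m  # assumed uniform mass over sentences
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    dist = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return dist, plan


if __name__ == "__main__":
    video_segments = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical segment embeddings
    sentences = [[0.9, 0.1], [0.1, 0.9]]       # hypothetical sentence embeddings
    dist, plan = sinkhorn(cosine_cost(video_segments, sentences))
    # Best-matched sentence for each video segment under the transport plan.
    alignment = [max(range(len(row)), key=row.__getitem__) for row in plan]
    print(f"OT distance: {dist:.4f}, alignment: {alignment}")
```

A small OT distance indicates that the two modalities' embeddings are well aligned, and the transport plan gives a soft correspondence between segments and sentences. Using this distance as a training loss, as the abstract describes, would additionally require a differentiable implementation (e.g., in an autodiff framework), which this sketch omits.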

Authors (9)
  1. Jielin Qiu
  2. Jiacheng Zhu
  3. Mengdi Xu
  4. Franck Dernoncourt
  5. Trung Bui
  6. Zhaowen Wang
  7. Bo Li
  8. Ding Zhao
  9. Hailin Jin
Citations (12)