
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning (2311.17435v1)

Published 29 Nov 2023 in cs.CV and cs.AI

Abstract: We present MM-Narrator, a novel system leveraging GPT-4 with multimodal in-context learning for the generation of audio descriptions (AD). Unlike previous methods that primarily focused on downstream fine-tuning with short video clips, MM-Narrator excels in generating precise audio descriptions for videos of extensive length, even beyond hours, in an autoregressive manner. This capability is made possible by the proposed memory-augmented generation process, which effectively utilizes both short-term textual context and long-term visual memory through an efficient register-and-recall mechanism. These contextual memories compile pertinent past information, including storylines and character identities, ensuring accurate tracking and depiction of story-coherent and character-centric audio descriptions. Maintaining the training-free design of MM-Narrator, we further propose a complexity-based demonstration selection strategy to substantially enhance its multi-step reasoning capability via few-shot multimodal in-context learning (MM-ICL). Experimental results on the MAD-eval dataset demonstrate that MM-Narrator consistently outperforms both existing fine-tuning-based approaches and LLM-based approaches in most scenarios, as measured by standard evaluation metrics. Additionally, we introduce the first segment-based evaluator for recurrent text generation. Empowered by GPT-4, this evaluator comprehensively reasons about and scores AD generation performance along various extensible dimensions.
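
The abstract does not spell out how complexity-based demonstration selection works. As a rough illustration only, the sketch below assumes that a demonstration's "complexity" is the number of reasoning steps it contains and that the most complex demonstrations are preferred as few-shot examples. The `Demonstration` class, the `complexity` measure, and `select_demonstrations` are hypothetical names introduced here, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demonstration:
    """Hypothetical few-shot example: a query, its reasoning chain, and an answer."""
    query: str
    reasoning_steps: List[str]
    answer: str


def complexity(demo: Demonstration) -> int:
    # Assumption: complexity is the number of reasoning steps in the demonstration.
    return len(demo.reasoning_steps)


def select_demonstrations(pool: List[Demonstration], k: int) -> List[Demonstration]:
    # Rank the pool from most to least complex and keep the top k.
    # sorted() is stable, so equally complex demonstrations keep their pool order.
    ranked = sorted(pool, key=complexity, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    pool = [
        Demonstration("clip A", ["identify character"], "AD for clip A"),
        Demonstration("clip B", ["recall plot", "identify character", "describe action"], "AD for clip B"),
        Demonstration("clip C", ["identify character", "describe action"], "AD for clip C"),
    ]
    for demo in select_demonstrations(pool, k=2):
        print(demo.query, complexity(demo))
```

Under this assumption, the selected demonstrations would be placed in the prompt ahead of the new query; how the paper actually measures complexity may differ.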

Authors (8)
  1. Chaoyi Zhang (51 papers)
  2. Kevin Lin (98 papers)
  3. Zhengyuan Yang (86 papers)
  4. Jianfeng Wang (149 papers)
  5. Linjie Li (89 papers)
  6. Chung-Ching Lin (36 papers)
  7. Zicheng Liu (153 papers)
  8. Lijuan Wang (133 papers)
Citations (15)