Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Zero-shot Audio Topic Reranking using Large Language Models (2309.07606v2)

Published 14 Sep 2023 in cs.CL and cs.IR

Abstract: Multimodal Video Search by Examples (MVSE) investigates using video clips as the query term for information retrieval, rather than the more traditional text query. This enables far richer search modalities such as images, speaker, content, topic, and emotion. A key element for this process is highly rapid and flexible search to support large archives, which in MVSE is facilitated by representing video attributes with embeddings. This work aims to compensate for any performance loss from this rapid archive search by examining reranking approaches. In particular, zero-shot reranking methods using LLMs are investigated as these are applicable to any video archive audio content. Performance is evaluated for topic-based retrieval on a publicly available video archive, the BBC Rewind corpus. Results demonstrate that reranking significantly improves retrieval ranking without requiring any task-specific in-domain training data. Furthermore, three sources of information (ASR transcriptions, automatic summaries and synopses) as input for LLM reranking were compared. To gain a deeper understanding and further insights into the performance differences and limitations of these text sources, we employ a fact-checking approach to analyse the information consistency among them.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Mengjie Qian (20 papers)
  2. Rao Ma (22 papers)
  3. Adian Liusie (20 papers)
  4. Erfan Loweimi (9 papers)
  5. Kate M. Knill (13 papers)
  6. Mark J. F. Gales (37 papers)
X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets