Zero-shot Audio Topic Reranking using Large Language Models (2309.07606v2)

Published 14 Sep 2023 in cs.CL and cs.IR

Abstract: Multimodal Video Search by Examples (MVSE) investigates using video clips, rather than the more traditional text query, as the query term for information retrieval. This enables far richer search modalities such as images, speaker, content, topic, and emotion. A key element of this process is fast, flexible search that scales to large archives, which in MVSE is facilitated by representing video attributes with embeddings. This work aims to compensate for any performance loss from this rapid archive search by examining reranking approaches. In particular, zero-shot reranking methods using LLMs are investigated, as these are applicable to the audio content of any video archive. Performance is evaluated for topic-based retrieval on a publicly available video archive, the BBC Rewind corpus. Results demonstrate that reranking significantly improves retrieval ranking without requiring any task-specific in-domain training data. Furthermore, three sources of information (ASR transcriptions, automatic summaries, and synopses) are compared as input for LLM reranking. To better understand the performance differences and limitations of these text sources, a fact-checking approach is used to analyse the information consistency among them.
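
The two-stage idea in the abstract (rapid embedding search, then reranking of the shortlist) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the tuple layout of the archive, and the `overlap_scorer` stand-in are all assumptions made for illustration. In the paper's setting the scorer would be a zero-shot LLM judging topical relevance from text such as an ASR transcription, an automatic summary, or a synopsis.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical archive entry: (document id, embedding, associated text).
Doc = Tuple[str, Sequence[float], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_emb: Sequence[float], archive: List[Doc], k: int) -> List[Doc]:
    """Stage 1: fast embedding search, returning the top-k candidates."""
    ranked = sorted(archive, key=lambda d: cosine(query_emb, d[1]), reverse=True)
    return ranked[:k]


def rerank(query_text: str, candidates: List[Doc],
           scorer: Callable[[str, str], float]) -> List[Doc]:
    """Stage 2: reorder the shortlist using a (slower) text-based scorer."""
    return sorted(candidates, key=lambda d: scorer(query_text, d[2]), reverse=True)


def overlap_scorer(query_text: str, doc_text: str) -> float:
    """Assumed stand-in for an LLM relevance score: Jaccard word overlap."""
    q = set(query_text.lower().split())
    d = set(doc_text.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0
```

Passing the scorer as a function keeps the reranking stage independent of any particular model, so the same shortlist can be rescored with different text sources.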
