Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
194 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
46 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Lla-VAP: LSTM Ensemble of Llama and VAP for Turn-Taking Prediction (2412.18061v1)

Published 24 Dec 2024 in cs.SD, cs.CL, cs.HC, and eess.AS

Abstract: Turn-taking prediction is the task of anticipating when the speaker in a conversation will yield their turn to another speaker to begin speaking. This project expands on existing strategies for turn-taking prediction by employing a multi-modal ensemble approach that integrates LLMs and voice activity projection (VAP) models. By combining the linguistic capabilities of LLMs with the temporal precision of VAP models, we aim to improve the accuracy and efficiency of identifying TRPs in both scripted and unscripted conversational scenarios. Our methods are evaluated on the In-Conversation Corpus (ICC) and Coached Conversational Preference Elicitation (CCPE) datasets, highlighting the strengths and limitations of current models while proposing a potentially more robust framework for enhanced prediction.

Summary

We haven't generated a summary for this paper yet.