Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models (2411.00023v2)
Abstract: Follow-up conversations with virtual assistants (VAs) let a user continue interacting with a VA without repeatedly invoking it with a keyword after the first query. Accurate Device-directed Speech Detection (DDSD) on these follow-up queries is therefore critical for enabling a naturalistic user experience. To this end, we explore the use of LLMs and model the first query when making inference about the follow-ups (based on the ASR-decoded text), either via prompting of a pretrained LLM or by adapting a binary classifier on top of the LLM. In doing so, we also exploit the ASR uncertainty when designing the LLM prompts. On a real-world dataset of follow-up conversations, we show that this approach yields large gains (20-40% reduction in false alarms at a fixed 10% false-reject rate) due to the joint modeling of the previous speech context and ASR uncertainty, compared to modeling the follow-ups alone.
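The abstract describes two variants: prompting a pretrained LLM, and adapting a binary classifier on top of it. As a rough illustration of the prompting variant, the minimal sketch below builds a yes/no prompt that jointly encodes the first (device-directed) query and the follow-up's ASR output together with its uncertainty, here assumed to take the form of an n-best list with confidence scores. The prompt wording, the `Hypothesis` structure, and the helper name `build_ddsd_prompt` are illustrative assumptions, not the paper's actual prompt.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hypothesis:
    text: str          # one ASR-decoded candidate transcript
    confidence: float  # assumed score from the recognizer, e.g. a lattice posterior

def build_ddsd_prompt(first_query: str,
                      followup_nbest: List[Hypothesis]) -> str:
    """Build a yes/no DDSD prompt that exposes both the previous speech
    context (the first query) and the ASR uncertainty of the follow-up
    (its n-best hypotheses) to a pretrained LLM."""
    hyps = "\n".join(
        f'{i + 1}. "{h.text}" (confidence: {h.confidence:.2f})'
        for i, h in enumerate(followup_nbest)
    )
    return (
        "A user first asked a virtual assistant:\n"
        f'"{first_query}"\n\n'
        "The speech recognizer produced these candidate transcripts for the "
        "user's next utterance, with confidence scores:\n"
        f"{hyps}\n\n"
        "Is this next utterance directed at the virtual assistant? "
        "Answer 'yes' or 'no'."
    )

if __name__ == "__main__":
    prompt = build_ddsd_prompt(
        first_query="What's the weather in Paris today?",
        followup_nbest=[
            Hypothesis("and how about tomorrow", 0.74),
            Hypothesis("and how about the barrow", 0.19),
            Hypothesis("hand how about tomorrow", 0.07),
        ],
    )
    # The prompt would be fed to a pretrained LLM; the 'yes'/'no' answer
    # (or the token probabilities over those answers) serves as the
    # device-directedness decision or score.
    print(prompt)
```

In the classifier variant, the same inputs would instead be encoded by the LLM and a binary head trained on its representations; either way, the key point is that the first query and the ASR uncertainty enter the model jointly rather than the follow-up text alone.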