Streaming ResLSTM with Causal Mean Aggregation for Device-Directed Utterance Detection (2007.09245v1)

Published 17 Jul 2020 in eess.AS and cs.SD

Abstract: In this paper, we propose a streaming model to distinguish voice queries intended for a smart-home device from background speech. The proposed model consists of multiple CNN layers with residual connections, followed by a stacked LSTM architecture. Streaming is achieved by using unidirectional LSTM layers and a causal mean aggregation layer, which forms the final utterance-level prediction from all frames up to the current one. To avoid redundant computation during online streaming inference, we use a caching mechanism for every convolution operation. Experimental results on a device-directed vs. non-device-directed task show that the proposed model yields a 41% reduction in equal error rate (EER) compared to our previous best model on this task. Furthermore, we show that the proposed model can make accurate predictions earlier in time than attention-based models.
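The two streaming mechanisms named in the abstract, convolution caching and causal mean aggregation, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single-channel 1-D causal convolution as a stand-in for the paper's residual CNN layers, omits the LSTM stack entirely, and averages scalar per-frame values, whereas the real aggregation layer likely averages frame-level representations before classification. All names (`causal_conv1d`, `CachedCausalConv1d`, `CausalMeanAggregator`, `stream`) are hypothetical.

```python
from typing import List, Sequence


def causal_conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Offline reference: causal convolution with k-1 zeros of left padding."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(signal)
    # Output t sees inputs x[t-k+1] .. x[t]; kernel[0] weights the oldest frame.
    return [sum(w * padded[t + i] for i, w in enumerate(kernel))
            for t in range(len(signal))]


class CachedCausalConv1d:
    """Streaming version: caches the last k-1 input frames between calls."""

    def __init__(self, kernel: Sequence[float]) -> None:
        self.kernel = list(kernel)
        # Zero-initialised cache plays the role of the offline left padding.
        self.cache: List[float] = [0.0] * (len(self.kernel) - 1)

    def step(self, x: float) -> float:
        window = self.cache + [x]
        y = sum(w * v for w, v in zip(self.kernel, window))
        # Drop the oldest frame so only the most recent k-1 inputs are kept.
        self.cache = window[1:]
        return y


class CausalMeanAggregator:
    """Running mean over all frames seen so far (never looks ahead)."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, value: float) -> float:
        self.total += value
        self.count += 1
        return self.total / self.count


def stream(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Emit one utterance-level score per incoming frame."""
    conv = CachedCausalConv1d(kernel)
    agg = CausalMeanAggregator()
    # Hypothetical pipeline: the LSTM layers that would sit between the
    # convolution and the aggregation are omitted here.
    return [agg.update(conv.step(x)) for x in signal]


if __name__ == "__main__":
    print(stream([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))  # [0.5, 1.0, 1.5, 2.0]
```

Because the cache holds exactly the last k-1 inputs, each new frame costs one kernel evaluation instead of reconvolving the whole prefix, and the streamed outputs match the offline causal convolution. The running mean needs only a sum and a count, so an utterance-level score is available after every frame.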

Authors (8)
  1. Xiaosu Tong (3 papers)
  2. Che-Wei Huang (8 papers)
  3. Sri Harish Mallidi (7 papers)
  4. Shaun Joseph (1 paper)
  5. Sonal Pareek (1 paper)
  6. Chander Chandak (6 papers)
  7. Ariya Rastrow (55 papers)
  8. Roland Maas (24 papers)
Citations (5)
