Stacked 1D convolutional networks for end-to-end small footprint voice trigger detection (2008.03405v1)

Published 8 Aug 2020 in eess.AS and cs.SD

Abstract: We propose a stacked 1D convolutional neural network (S1DCNN) for end-to-end small-footprint voice trigger detection in a streaming scenario. Voice trigger detection is an important speech application, with which users can activate their devices by simply saying a keyword or phrase. For privacy and latency reasons, a voice trigger detection system should run on an always-on processor on device, so small memory and compute cost are crucial. Recently, singular value decomposition filters (SVDFs) have been used for end-to-end voice trigger detection. SVDFs approximate a fully-connected layer with a low-rank approximation, which reduces the number of model parameters. In this work, we propose the S1DCNN as an alternative approach for end-to-end small-footprint voice trigger detection. An S1DCNN layer consists of a 1D convolution layer followed by a depth-wise 1D convolution layer. We show that the SVDF can be expressed as a special case of the S1DCNN layer. Experimental results show that the S1DCNN achieves a 19.0% relative false reject ratio (FRR) reduction with a similar model size and a similar time delay compared to the SVDF. By using longer time delays, the S1DCNN further improves the FRR by up to 12.2% relative.
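
The abstract describes an S1DCNN layer as a 1D convolution followed by a depth-wise 1D convolution. The following is a minimal PyTorch sketch of that layer structure, assuming illustrative channel counts, kernel sizes, and activation; it is not the authors' exact configuration from the paper.

```python
import torch
import torch.nn as nn


class S1DCNNLayer(nn.Module):
    """Sketch of one S1DCNN layer: a 1D convolution followed by a
    depth-wise 1D convolution. Hyperparameters here are hypothetical."""

    def __init__(self, in_channels, out_channels,
                 conv_kernel=1, depthwise_kernel=8):
        super().__init__()
        # 1D convolution mixing input channels across a short context
        self.conv = nn.Conv1d(in_channels, out_channels,
                              kernel_size=conv_kernel)
        # depth-wise 1D convolution: one temporal filter per channel
        # (groups == channels), filtering each channel independently
        self.depthwise = nn.Conv1d(out_channels, out_channels,
                                   kernel_size=depthwise_kernel,
                                   groups=out_channels)
        self.act = nn.ReLU()

    def forward(self, x):
        # x: (batch, channels, time)
        return self.act(self.depthwise(self.conv(x)))


if __name__ == "__main__":
    # Illustrative stack of two layers on filterbank-like features
    model = nn.Sequential(S1DCNNLayer(40, 64), S1DCNNLayer(64, 64))
    features = torch.randn(1, 40, 100)  # (batch, feature channels, frames)
    print(model(features).shape)
```

Under this reading, when the first convolution has kernel size 1 (a low-rank channel projection) and the depth-wise convolution spans past frames, the layer reduces to an SVDF-like computation, which is consistent with the abstract's claim that the SVDF is a special case of the S1DCNN layer.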

Authors (4)
  1. Takuya Higuchi (26 papers)
  2. Mohammad Ghasemzadeh (3 papers)
  3. Kisun You (1 paper)
  4. Chandra Dhir (10 papers)
Citations (17)