
Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning (2002.00125v1)

Published 1 Feb 2020 in cs.SD, cs.LG, and eess.AS

Abstract: In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model used in the speech recognition system. For the student, both multi-channel feature extraction layers and the higher classification layers were jointly trained using the logits from the teacher model. In our experiments, compared to a baseline model trained on about 600 hours of transcribed data, a relative word-error rate (WER) reduction of about 27.3% was achieved when using an additional 1800 hours of untranscribed data. We also investigated the benefit of pre-training the multi-channel front end to output the beamformed log-mel filter bank energies (LFBE) using L2 loss. We find that pre-training improves the word error rate by 10.7% when compared to a multi-channel model directly initialized with a beamformer and mel-filter bank coefficients for the front end. Finally, combining pre-training and teacher-student training produces a WER reduction of 31% compared to our baseline.
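The two training objectives described in the abstract can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the authors' implementation: `distillation_loss` is the standard teacher-student cross-entropy on temperature-scaled soft targets (the student matches the teacher's logits), and `frontend_pretrain_loss` is the L2 loss that pushes a learnable front end toward beamformed LFBE targets. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over the last axis, numerically stabilized.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Cross-entropy between the teacher's soft targets and the student's
    # predictions: the student is trained to reproduce the teacher's logits.
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature) + 1e-12)
    return -np.mean((p_teacher * log_p_student).sum(axis=-1))

def frontend_pretrain_loss(predicted_lfbe, target_lfbe):
    # L2 loss driving the learnable multi-channel front end toward
    # beamformed log-mel filter-bank energies (LFBE).
    return np.mean((predicted_lfbe - target_lfbe) ** 2)

# Toy example: a student whose logits are close to the teacher's
# incurs a small but nonzero distillation loss.
teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[1.5, 0.7, -0.5]])
loss = distillation_loss(student, teacher)
```

In the paper's pipeline, the front end would first be pre-trained with the L2 objective, after which the whole student (front end plus classification layers) is trained jointly against the teacher's logits.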

Authors (5)
  1. Sanna Wager
  2. Aparna Khare
  3. Minhua Wu
  4. Kenichi Kumatani
  5. Shiva Sundaram
Citations (1)
