Directional Source Separation for Robust Speech Recognition on Smart Glasses (2309.10993v1)

Published 20 Sep 2023 in cs.SD, cs.HC, and eess.AS

Abstract: Modern smart glasses leverage advanced audio sensing and machine learning technologies to offer real-time transcribing and captioning services, considerably enriching human experiences in daily communications. However, such systems frequently encounter challenges related to environmental noise, which degrades speech recognition and speaker change detection. To improve voice quality, this work investigates directional source separation using a multi-microphone array. We first explore multiple beamformers to assist source separation modeling by strengthening the directional properties of speech signals. In addition to relying on predetermined beamformers, we investigate neural beamforming in multi-channel source separation, demonstrating that automatically learning directional characteristics effectively improves separation quality. We further compare ASR performance on the separated outputs with ASR performance on the noisy inputs. Our results show that directional source separation benefits ASR for the wearer but not for the conversation partner. Lastly, we jointly train the directional source separation and ASR models, achieving the best overall ASR performance.

Authors (10)
  1. Tiantian Feng (61 papers)
  2. Ju Lin (9 papers)
  3. Yiteng Huang (12 papers)
  4. Weipeng He (6 papers)
  5. Kaustubh Kalgaonkar (6 papers)
  6. Niko Moritz (23 papers)
  7. Li Wan (40 papers)
  8. Xin Lei (22 papers)
  9. Ming Sun (146 papers)
  10. Frank Seide (16 papers)
Citations (4)