
ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration (2011.03706v1)

Published 7 Nov 2020 in eess.AS and cs.SD

Abstract: We present ESPnet-SE, a toolkit designed for the quick development of speech enhancement and speech separation systems in a single framework, along with an optional downstream speech recognition module. ESPnet-SE is a new project that integrates rich automatic speech recognition (ASR) related models, resources, and systems to support and validate the proposed front-end implementation (i.e., speech enhancement and separation). It is capable of processing both single-channel and multi-channel data, with various functionalities including dereverberation, denoising, and source separation. We provide all-in-one recipes including data pre-processing, feature extraction, training, and evaluation pipelines for a wide range of benchmark datasets. This paper describes the design of the toolkit and several important functionalities, especially the speech recognition integration, which differentiates ESPnet-SE from other open-source toolkits, and presents experimental results on major benchmark datasets.

Authors (11)
  1. Chenda Li (21 papers)
  2. Jing Shi (123 papers)
  3. Wangyou Zhang (35 papers)
  4. Aswin Shanmugam Subramanian (20 papers)
  5. Xuankai Chang (61 papers)
  6. Naoyuki Kamo (13 papers)
  7. Moto Hira (6 papers)
  8. Tomoki Hayashi (42 papers)
  9. Christoph Boeddeker (36 papers)
  10. Zhuo Chen (319 papers)
  11. Shinji Watanabe (416 papers)
Citations (77)
