Multi-Convformer: Extending Conformer with Multiple Convolution Kernels (2407.03718v2)

Published 4 Jul 2024 in cs.CL, cs.AI, cs.LG, cs.SD, and eess.AS

Abstract: Convolutions have become essential in state-of-the-art end-to-end Automatic Speech Recognition (ASR) systems due to their efficient modelling of local context. Notably, their use in Conformers has led to superior performance compared to vanilla Transformer-based ASR systems. While components other than the convolution module in the Conformer have been reexamined, altering the convolution module itself has been far less explored. Towards this, we introduce Multi-Convformer, which uses multiple convolution kernels within the convolution module of the Conformer in conjunction with gating. This helps in improved modeling of local dependencies at varying granularities. Our model rivals existing Conformer variants such as CgMLP and E-Branchformer in performance, while being more parameter efficient. We empirically compare our approach with Conformer and its variants across four different datasets and three different modelling paradigms and show up to 8% relative word error rate (WER) improvements.
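The core idea in the abstract, several depthwise convolutions of different kernel sizes whose outputs are combined by a learned gate, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the softmax gate computed from a linear projection of the input (`gate_w`, `gate_b`) and the specific kernel sizes are assumptions made for the example.

```python
import numpy as np

def depthwise_conv1d(x, kernel):
    """Depthwise 1D convolution with 'same' padding.

    x: (T, D) sequence of T frames with D channels.
    kernel: (K, D) one length-K filter per channel.
    """
    T, D = x.shape
    K = kernel.shape[0]
    pad = K // 2
    xp = np.pad(x, ((pad, K - 1 - pad), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for t in range(T):
        out[t] = np.sum(xp[t:t + K] * kernel, axis=0)
    return out

def multi_kernel_conv_gated(x, kernels, gate_w, gate_b):
    """Run several depthwise convs at different granularities and
    fuse them with a per-frame softmax gate (illustrative gating scheme).

    x: (T, D); kernels: list of N arrays of shape (K_i, D);
    gate_w: (D, N); gate_b: (N,).
    """
    # Each branch models local context at its own kernel size.
    branches = np.stack([depthwise_conv1d(x, k) for k in kernels])  # (N, T, D)
    # Per-frame gate logits from a linear projection of the input.
    logits = x @ gate_w + gate_b                                    # (T, N)
    gates = np.exp(logits - logits.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)                           # softmax over branches
    # Gated combination of the branch outputs.
    return np.einsum('ntd,tn->td', branches, gates)                 # (T, D)

# Example: three hypothetical kernel sizes capturing short/medium/long context.
rng = np.random.default_rng(0)
T, D = 50, 8
x = rng.standard_normal((T, D))
kernels = [rng.standard_normal((k, D)) for k in (3, 7, 15)]
y = multi_kernel_conv_gated(x, kernels, rng.standard_normal((D, 3)), np.zeros(3))
```

The output keeps the input shape `(T, D)`, so the module is a drop-in replacement for the single fixed-size depthwise convolution inside a Conformer block; the gate decides, per frame, how much each granularity contributes.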

Authors (4)
  1. Darshan Prabhu (5 papers)
  2. Yifan Peng (147 papers)
  3. Preethi Jyothi (51 papers)
  4. Shinji Watanabe (416 papers)
