
The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR (2305.19584v1)

Published 31 May 2023 in cs.CL and eess.AS

Abstract: Building a multilingual Automated Speech Recognition (ASR) system in a linguistically diverse country like India is challenging because the languages use different scripts and speech data is limited. This problem can be addressed by exploiting the fact that many of these languages are phonetically similar: their text can be converted into a Common Label Set (CLS) by mapping similar sounds to common labels. In this paper, new approaches are explored and compared to improve the performance of a CLS-based multilingual ASR model. Language-specific information is infused into the ASR model either by providing a Language ID or by using a CLS-to-native-script converter on top of the CLS multilingual model. These methods give a significant improvement in Word Error Rate (WER) compared to the CLS baseline. They are further evaluated on out-of-distribution data to check their robustness.
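
The idea of mapping similar sounds in different scripts to common labels can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the three-character tables, the label names (`ka`, `ma`, `la`), the `<hi>`/`<ta>` tag format, and the helper names `to_cls`, `with_lang_tag`, and `to_native` are hypothetical and are not taken from the paper, whose actual label set, tagging scheme, and converter may work quite differently.

```python
# Toy illustration only: these tables and label names are assumptions,
# not the Common Label Set used in the paper.
TOY_CLS = {
    "hi": {"क": "ka", "म": "ma", "ल": "la"},  # Devanagari (Hindi)
    "ta": {"க": "ka", "ம": "ma", "ல": "la"},  # Tamil
}


def to_cls(text, lang):
    """Map each native-script character to its shared (toy) label."""
    table = TOY_CLS[lang]
    return [table[ch] for ch in text]


def with_lang_tag(labels, lang):
    """Prepend a language tag token (hypothetical format) to a label sequence."""
    return [f"<{lang}>"] + labels


def to_native(labels, lang):
    """Convert shared labels back to the native script of the given language."""
    inverse = {label: ch for ch, label in TOY_CLS[lang].items()}
    return "".join(inverse[label] for label in labels)


hindi_labels = to_cls("कमल", "hi")
tamil_labels = to_cls("கமல", "ta")
print(hindi_labels == tamil_labels)       # both scripts share one label sequence
print(with_lang_tag(hindi_labels, "hi"))  # label sequence with a language tag
print(to_native(hindi_labels, "ta"))      # same labels rendered in Tamil script
```

In this sketch the two scripts collapse to one label sequence, which is what would let a single model share training data across languages; the tag or the converter then restores the language-specific information that the shared labels discard.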

Authors
  1. Kaousheik Jayakumar
  2. Vrunda N. Sukhadia
  3. A Arunkumar
  4. S. Umesh