
TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR (2407.04444v2)

Published 5 Jul 2024 in cs.CL, cs.SD, and eess.AS

Abstract: In traditional conversational intelligence from speech, a cascaded pipeline is used, involving tasks such as voice activity detection, diarization, transcription, and subsequent processing with different NLP models for tasks like semantic endpointing and named entity recognition (NER). Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. This is achieved by integrating task-specific tokens into the reference text during ASR model training, streamlining inference and eliminating the need for separate NLP models. In addition to ASR, we conduct experiments on three tasks: speaker change detection, endpointing, and NER. Our experiments on a public and a private dataset show that the proposed method improves ASR by up to 7.7% relative WER while outperforming the cascaded pipeline approach in individual task performance. Our code is publicly available: https://github.com/idiap/tokenverse-unifying-speech-nlp
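The central idea in the abstract, folding task labels into the ASR reference text so that a single Transducer emits them alongside the words, can be illustrated with a minimal sketch of the data-preparation and output-parsing steps. The token strings (`<sc>`, `<end>`, `<ne>`, `</ne>`), the `Segment` type, and the helpers `build_reference` and `parse_hypothesis` are hypothetical illustrations, not TokenVerse's actual token inventory or code (the linked repository holds that); the Transducer model itself is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical task tokens (assumption): TokenVerse's real token set may differ.
SPEAKER_CHANGE = "<sc>"  # a new speaker starts here
ENDPOINT = "<end>"       # end of a segment/turn
NER_OPEN = "<ne>"        # start of a named entity
NER_CLOSE = "</ne>"      # end of a named entity


@dataclass
class Segment:
    """One labelled segment: its words, speaker id and entity spans."""
    words: list[str]
    speaker: str
    # Entity spans as [start, end) word indices within this segment.
    entity_spans: list[tuple[int, int]] = field(default_factory=list)


def build_reference(segments: list[Segment]) -> str:
    """Serialize multi-task labels into one reference string for ASR training."""
    tokens: list[str] = []
    prev_speaker = None
    for seg in segments:
        # Mark a speaker change before the first word of a new speaker's segment.
        if prev_speaker is not None and seg.speaker != prev_speaker:
            tokens.append(SPEAKER_CHANGE)
        starts = {s for s, _ in seg.entity_spans}
        ends = {e for _, e in seg.entity_spans}
        for i, word in enumerate(seg.words):
            if i in starts:
                tokens.append(NER_OPEN)
            tokens.append(word)
            if i + 1 in ends:
                tokens.append(NER_CLOSE)
        tokens.append(ENDPOINT)  # every segment ends with an endpoint token
        prev_speaker = seg.speaker
    return " ".join(tokens)


def parse_hypothesis(text: str) -> dict:
    """Split a decoded string into the plain transcript and per-task outputs."""
    words: list[str] = []
    entities: list[str] = []
    change_points: list[int] = []  # word index where a new speaker starts
    endpoints: list[int] = []      # word count at each endpoint
    current: list[str] | None = None
    for tok in text.split():
        if tok == SPEAKER_CHANGE:
            change_points.append(len(words))
        elif tok == ENDPOINT:
            endpoints.append(len(words))
        elif tok == NER_OPEN:
            current = []
        elif tok == NER_CLOSE:
            if current:
                entities.append(" ".join(current))
            current = None
        else:
            words.append(tok)
            if current is not None:
                current.append(tok)
    return {
        "transcript": " ".join(words),
        "entities": entities,
        "speaker_changes": change_points,
        "endpoints": endpoints,
    }
```

For two segments, `Segment(["hi", "this", "is", "anna", "from", "idiap"], "A", [(3, 4), (5, 6)])` followed by `Segment(["hello", "anna"], "B", [(1, 2)])`, `build_reference` yields `hi this is <ne> anna </ne> from <ne> idiap </ne> <end> <sc> hello <ne> anna </ne> <end>`, and `parse_hypothesis` recovers the plain transcript plus the entity, speaker-change and endpoint outputs. The design point the sketch is meant to convey is that the task tokens simply become extra output symbols for the ASR model, so no separate NLP model is needed at inference.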

Authors (9)
  1. Shashi Kumar (13 papers)
  2. Srikanth Madikeri (19 papers)
  3. Juan Zuluaga-Gomez (27 papers)
  4. Sergio Burdisso (13 papers)
  5. Petr Motlicek (40 papers)
  6. Karthik Pandia (4 papers)
  7. Aravind Ganapathiraju (13 papers)
  8. Iuliia Thorbecke (5 papers)
  9. Esaú Villatoro-Tello (19 papers)
