Using Large Language Model for End-to-End Chinese ASR and NER (2401.11382v2)

Published 21 Jan 2024 in cs.CL and cs.AI

Abstract: Mapping speech tokens to the same feature space as text tokens has become the paradigm for integrating the speech modality into decoder-only LLMs. An alternative approach is to use an encoder-decoder architecture that incorporates speech features through cross-attention; this approach, however, has received less attention in the literature. In this work, we connect the Whisper encoder with ChatGLM3 and provide in-depth comparisons of these two approaches on Chinese automatic speech recognition (ASR) and named entity recognition (NER) tasks. We evaluate them not only with conventional metrics such as the F1 score but also with a novel fine-grained taxonomy of ASR-NER errors. Our experiments reveal that the encoder-decoder architecture outperforms the decoder-only architecture with a short context, while the decoder-only architecture benefits from a long context as it fully exploits all layers of the LLM. By using the LLM, we significantly reduce entity omission errors and improve entity ASR accuracy compared to the Conformer baseline. Additionally, we obtain a state-of-the-art (SOTA) F1 score of 0.805 on the AISHELL-NER test set by using chain-of-thought (CoT) NER, which first infers long-form ASR transcriptions and then predicts NER labels.
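
The two integration strategies the abstract contrasts can be pictured in a few lines of PyTorch. The sketch below is illustrative only: the module names, the hidden sizes (1280 for a Whisper-large-style encoder, 4096 for the LLM), the single cross-attention layer, and the CoT target string at the end are assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Illustrative dimensions: 1280 for a Whisper-large-style encoder,
# 4096 for the LLM hidden size. These are assumptions for the sketch.
D_SPEECH, D_LLM = 1280, 4096

class DecoderOnlyFusion(nn.Module):
    """Decoder-only paradigm: project speech features into the LLM's
    token-embedding space and prepend them to the text embeddings."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(D_SPEECH, D_LLM)

    def forward(self, speech_feats, text_embeds):
        # speech_feats: (B, T_s, D_SPEECH); text_embeds: (B, T_t, D_LLM)
        return torch.cat([self.proj(speech_feats), text_embeds], dim=1)

class CrossAttentionFusion(nn.Module):
    """Encoder-decoder paradigm: decoder text states attend to the
    speech-encoder outputs through cross-attention (one layer shown)."""
    def __init__(self, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(D_SPEECH, D_LLM)
        self.xattn = nn.MultiheadAttention(D_LLM, n_heads, batch_first=True)

    def forward(self, speech_feats, text_states):
        mem = self.proj(speech_feats)
        out, _ = self.xattn(query=text_states, key=mem, value=mem)
        return text_states + out  # residual connection, as in a decoder block

speech = torch.randn(2, 100, D_SPEECH)  # stand-in for Whisper encoder output
text = torch.randn(2, 16, D_LLM)        # stand-in for LLM text embeddings
print(DecoderOnlyFusion()(speech, text).shape)     # torch.Size([2, 116, 4096])
print(CrossAttentionFusion()(speech, text).shape)  # torch.Size([2, 16, 4096])

# The CoT NER target can be pictured as a two-stage output: the transcription
# first, then the entity labels. This exact format is hypothetical.
cot_target = "Transcription: 今天北京天气很好\nEntities: (北京, LOC)"
```

In the decoder-only variant the projected speech tokens pass through every layer of the LLM alongside the text, whereas in the cross-attention variant speech enters only where cross-attention is inserted, which is consistent with the abstract's observation that the decoder-only approach benefits more from long context.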

Authors (9)
  1. Yuang Li (18 papers)
  2. Jiawei Yu (33 papers)
  3. Yanqing Zhao (8 papers)
  4. Min Zhang (630 papers)
  5. Mengxin Ren (16 papers)
  6. Xiaofeng Zhao (22 papers)
  7. Hao Yang (328 papers)
  8. Shimin Tao (31 papers)
  9. Jinsong Su (96 papers)
Citations (3)