
Span Classification with Structured Information for Disfluency Detection in Spoken Utterances (2203.16028v2)

Published 30 Mar 2022 in cs.CL, cs.MM, cs.SD, and eess.AS

Abstract: Existing approaches to disfluency detection focus on solving a token-level classification task for identifying and removing disfluencies in text. Moreover, most works leverage only the contextual information captured by the linear sequences in text, thus ignoring the structured information in text that is efficiently captured by dependency trees. In this paper, building on the span classification paradigm of entity recognition, we propose a novel architecture for detecting disfluencies in transcripts of spoken utterances, incorporating both contextual information through transformers and long-distance structured information captured by dependency trees through graph convolutional networks (GCNs). Experimental results show that our proposed model achieves state-of-the-art results on the widely used English Switchboard corpus for disfluency detection and outperforms prior work by a significant margin. We make all our code publicly available on GitHub (https://github.com/Sreyan88/Disfluency-Detection-with-Span-Classification).
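
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of (a) enumerating candidate spans instead of labeling individual tokens and (b) one graph convolution step over a dependency tree. All function names, the mean-pooled span representation, the undirected edges with self-loops, and the row normalization are assumptions chosen for illustration; the paper's actual encoder, span features, and GCN formulation may differ.

```python
# Illustrative sketch only; names and design choices are assumptions,
# not taken from the paper or its repository.

def enumerate_spans(tokens, max_len):
    """Return all (start, end) spans, end exclusive, of at most max_len tokens."""
    n = len(tokens)
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, min(i + max_len, n) + 1)]


def normalized_adjacency(heads):
    """Build a row-normalized adjacency matrix from dependency heads.

    heads[i] is the index of token i's head, or -1 for the root.
    Edges are treated as undirected and self-loops are added (an assumption).
    """
    n = len(heads)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = 1.0
            adj[head][child] = 1.0
    return [[value / sum(row) for value in row] for row in adj]


def gcn_layer(adj, features, weight):
    """One graph convolution, ReLU(A_norm @ H @ W), written with plain lists."""
    n, d_in, d_out = len(features), len(weight), len(weight[0])
    # Aggregate each token's neighborhood along dependency edges.
    aggregated = [[sum(adj[i][k] * features[k][c] for k in range(n))
                   for c in range(d_in)]
                  for i in range(n)]
    # Apply the linear projection followed by ReLU.
    return [[max(0.0, sum(aggregated[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)]
            for i in range(n)]


def span_representation(hidden, span):
    """Mean-pool token vectors inside a span (a simple stand-in for a learned span encoder)."""
    start, end = span
    width = end - start
    return [sum(hidden[t][c] for t in range(start, end)) / width
            for c in range(len(hidden[0]))]
```

In a full model, the token features would come from a transformer encoder, and each pooled span vector would be passed to a classifier that labels the span as disfluent or fluent; both of those components are omitted here.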

Authors (5)
  1. Sreyan Ghosh (46 papers)
  2. Sonal Kumar (30 papers)
  3. Yaman Kumar Singla (12 papers)
  4. Rajiv Ratn Shah (108 papers)
  5. S. Umesh (24 papers)
Citations (4)
