Resolving Transcription Ambiguity in Spanish: A Hybrid Acoustic-Lexical System for Punctuation Restoration (2402.03519v1)
Abstract: Punctuation restoration is a crucial post-processing step for Automatic Speech Recognition (ASR) output. It improves transcript readability and facilitates downstream NLP tasks. However, conventional lexical-based approaches are inadequate for punctuation restoration in Spanish, where unpunctuated declaratives and questions are often ambiguous. In this study, we propose a novel hybrid acoustic-lexical punctuation restoration system for Spanish transcription that consolidates acoustic and lexical signals through a modular process. Our experimental results show that the proposed system effectively improves the F1 score for question marks and for overall punctuation restoration on both public and internal Spanish conversational datasets. Additionally, a benchmark comparison against Large Language Models (LLMs) indicates that our approach is superior in accuracy, reliability, and latency. Furthermore, we demonstrate that the Word Error Rate (WER) of the ASR module also benefits from the proposed system.
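The abstract states that a purely lexical model cannot always tell an unpunctuated declarative from a yes/no question. For example, "tienes hambre" can be either "Tienes hambre." or "¿Tienes hambre?". It does not, however, say how the acoustic and lexical signals are combined. The Python sketch below shows one hypothetical way a modular fusion step could resolve that case. A text-only model proposes the sentence-final mark, and a separate, assumed prosody-based question probability may upgrade a lexical period to a question mark. The functions `fuse_final_mark` and `render_spanish`, the 0.5 threshold, and the upgrade-only rule are illustrative assumptions, not the authors' implementation.

```python
from typing import List


# Hypothetical fusion rule for illustration only; not the method from the paper.
def fuse_final_mark(lexical_mark: str, acoustic_q_prob: float,
                    threshold: float = 0.5) -> str:
    """Combine a lexical sentence-final mark with an acoustic question score.

    lexical_mark: ".", "?" or "" as predicted by a text-only model.
    acoustic_q_prob: assumed P(question) from a separate prosody classifier.
    """
    # A lexical "." is the ambiguous case in Spanish, so the acoustic
    # score may upgrade it to "?" when it reaches the threshold.
    if lexical_mark == "." and acoustic_q_prob >= threshold:
        return "?"
    # A lexical "?" (e.g. triggered by "qué" or "dónde") and "" are kept as-is.
    return lexical_mark


def render_spanish(words: List[str], mark: str) -> str:
    """Join ASR words into one sentence using Spanish question punctuation."""
    text = " ".join(words)
    if not text:
        return text
    text = text[0].upper() + text[1:]  # capitalize the first letter
    if mark == "?":
        return "¿" + text + "?"  # Spanish questions open with "¿"
    return text + mark


if __name__ == "__main__":
    words = ["tienes", "hambre"]
    print(render_spanish(words, fuse_final_mark(".", 0.83)))  # ¿Tienes hambre?
    print(render_spanish(words, fuse_final_mark(".", 0.12)))  # Tienes hambre.
    print(render_spanish(["dónde", "está"], fuse_final_mark("?", 0.10)))  # ¿Dónde está?
```

Leaving lexical question marks untouched is a deliberate choice in this sketch. Interrogative words such as "dónde" are strong textual evidence, and Spanish wh-questions often end with falling intonation, so letting a low acoustic score downgrade them could introduce errors.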
Authors: Xiliang Zhu, Chia-Tien Chang, Shayna Gardiner, David Rossouw, Jonas Robertson