Turning Whisper into Real-Time Transcription System (2307.14743v2)
Published 27 Jul 2023 in cs.CL
Abstract: Whisper is one of the recent state-of-the-art multilingual speech recognition and translation models; however, it is not designed for real-time transcription. In this paper, we build on top of Whisper and create Whisper-Streaming, an implementation of real-time speech transcription and translation for Whisper-like models. Whisper-Streaming uses a local agreement policy with self-adaptive latency to enable streaming transcription. We show that Whisper-Streaming achieves high quality and 3.3 seconds latency on an unsegmented long-form speech transcription test set, and we demonstrate its robustness and practical usability as a component in a live transcription service at a multilingual conference.
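The local agreement policy named in the abstract can be illustrated with a minimal sketch. It assumes the two-hypothesis variant, in which a word is confirmed only once two consecutive hypotheses agree on it as part of their common prefix. The class name `LocalAgreement2`, the helper `common_prefix`, and the word-list interface are hypothetical and are not taken from the Whisper-Streaming code; audio buffering, timestamps, and the self-adaptive latency mechanism are left out.

```python
from typing import List


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Return the longest common prefix of two word lists."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


class LocalAgreement2:
    """Hypothetical sketch: confirm words that two consecutive hypotheses share."""

    def __init__(self) -> None:
        self.committed: List[str] = []  # words already emitted as final
        self.previous: List[str] = []   # hypothesis from the previous update

    def update(self, hypothesis: List[str]) -> List[str]:
        """Take the latest full hypothesis and return newly confirmed words."""
        stable = common_prefix(self.previous, hypothesis)
        self.previous = hypothesis
        # Emit only the part of the agreed prefix that was not committed yet;
        # words that were already committed are never retracted.
        new_words = stable[len(self.committed):]
        self.committed.extend(new_words)
        return new_words


policy = LocalAgreement2()
for text in ["the cat", "the cat sat", "the cat sat on"]:
    print(policy.update(text.split()))
```

In this sketch each update passes the model's hypothesis for all audio received so far. Nothing is emitted on the first update, because there is no earlier hypothesis to agree with; this waiting for agreement is the source of the added latency.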
Authors: Dominik Macháček, Raj Dabre, Ondřej Bojar