Promptformer: Prompted Conformer Transducer for ASR (2401.07360v1)
Abstract: Context cues carry information that can improve multi-turn interactions in automatic speech recognition (ASR) systems. In this paper, we introduce a novel mechanism, inspired by hyper-prompting, to fuse textual context with acoustic representations in the attention mechanism. Results on a test set with multi-turn interactions show that our method achieves a 5.9% relative word error rate reduction (rWERR) over a strong baseline. We show that our method does not degrade in the absence of context and yields improvements even when the model is trained without context. We further show that leveraging a pre-trained sentence-piece model for context embedding generation can outperform an external BERT model.
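To make the fusion concrete, below is a minimal PyTorch sketch of hyper-prompt-style attention fusion: a pooled textual context embedding is projected into per-head key/value "prompt" vectors and prepended to the acoustic keys and values inside self-attention, so every acoustic frame can attend to the context. This is an illustrative reconstruction under stated assumptions, not the paper's exact implementation; the class name `PromptedSelfAttention`, the prefix count `n_prompts`, and the mean-pooled context vector are all hypothetical choices.

```python
# Sketch (assumed, not the authors' code) of hyper-prompt-style fusion:
# a textual context embedding is projected into per-head key/value
# "prompt" prefixes and prepended to the acoustic keys/values inside
# multi-head self-attention.
import torch
import torch.nn as nn


class PromptedSelfAttention(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, n_prompts: int = 4):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d_k = n_heads, d_model // n_heads
        self.n_prompts = n_prompts
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Hypothetical "hyper-prompt" generators: map one pooled context
        # vector to n_prompts key/value prefix vectors.
        self.prompt_k = nn.Linear(d_model, n_prompts * d_model)
        self.prompt_v = nn.Linear(d_model, n_prompts * d_model)

    def forward(self, x: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        # x:   (B, T, d_model) acoustic frames from the Conformer encoder
        # ctx: (B, d_model)    pooled context embedding, e.g. mean-pooled
        #                      sentence-piece embeddings of the previous turn
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.h, self.d_k).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.h, self.d_k).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.h, self.d_k).transpose(1, 2)
        # Context-derived prefixes: (B, h, n_prompts, d_k)
        pk = self.prompt_k(ctx).view(B, self.n_prompts, self.h, self.d_k).transpose(1, 2)
        pv = self.prompt_v(ctx).view(B, self.n_prompts, self.h, self.d_k).transpose(1, 2)
        k = torch.cat([pk, k], dim=2)  # keys now cover prompts + frames
        v = torch.cat([pv, v], dim=2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_k ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, self.h * self.d_k)
        return self.out_proj(out)


if __name__ == "__main__":
    layer = PromptedSelfAttention()
    frames = torch.randn(2, 100, 256)    # 100 acoustic frames
    context = torch.randn(2, 256)        # pooled textual context
    print(layer(frames, context).shape)  # torch.Size([2, 100, 256])
```

One appeal of prefix-style fusion in this setting: the encoder's output length stays at T, so the transducer's alignment machinery is untouched, and with an empty or zeroed context the layer degrades gracefully toward plain self-attention.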