Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation (2403.04178v1)

Published 7 Mar 2024 in cs.CL, cs.SD, and eess.AS

Abstract: Linguistic diversity in India's education sector poses a significant challenge to inclusivity. Although online educational content has democratized access to knowledge, the dominance of English as the internet's lingua franca limits accessibility, underscoring the need for translation into Indian languages. Existing Speech-to-Speech Machine Translation (SSMT) systems, however, lack intonation and produce monotonous translations, which can cause audiences to lose interest and disengage from the content. To address this, our paper introduces a dataset of Indian English annotated for stress, together with a Text-to-Speech (TTS) architecture capable of incorporating stress into synthesized speech. The dataset is used to train a stress detection model, which the SSMT system then uses to detect stress in the source speech and transfer it to the target-language speech. The TTS architecture is based on FastPitch and can modify its predicted variances for the given stressed words. We present an Indian English-to-Hindi SSMT system that transfers stress, with the aim of improving the overall quality and engagement of educational content.
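The abstract does not spell out how stress detected in the English source reaches the Hindi output. The sketch below shows one plausible reading under loudly stated assumptions: word-level stress labels are projected onto Hindi words through a word alignment (the reference list includes SimAlign, but the abstract does not say how alignment is done), and the TTS then raises pitch and lengthens duration on the input tokens of stressed words, since FastPitch predicts both pitch and duration per input token. The function names `transfer_stress` and `emphasize`, the `pitch_shift` and `duration_scale` defaults, the token-to-word mapping, and the example alignment are all hypothetical, not the paper's implementation.

```python
from typing import List, Sequence, Tuple


def transfer_stress(
    src_stress: Sequence[bool],
    alignment: Sequence[Tuple[int, int]],
    num_tgt_words: int,
) -> List[bool]:
    """Project word-level stress labels from source words onto target words.

    `alignment` holds (source_word_index, target_word_index) pairs, assumed to
    come from a word aligner. A target word is marked stressed if any source
    word aligned to it is stressed.
    """
    tgt_stress = [False] * num_tgt_words
    for s, t in alignment:
        # Skip out-of-range pairs instead of raising, since aligners can be noisy.
        if 0 <= s < len(src_stress) and 0 <= t < num_tgt_words and src_stress[s]:
            tgt_stress[t] = True
    return tgt_stress


def emphasize(
    pitch: Sequence[float],
    durations: Sequence[int],
    token_to_word: Sequence[int],
    word_stress: Sequence[bool],
    pitch_shift: float = 0.5,
    duration_scale: float = 1.25,
) -> Tuple[List[float], List[int]]:
    """Raise pitch and lengthen duration for input tokens of stressed words.

    `pitch` is a per-token pitch value in the model's own (hypothetical)
    normalized units and `durations` are per-token frame counts, as a
    FastPitch-style model predicts them. The shift and scale defaults are
    illustrative, not values from the paper.
    """
    new_pitch: List[float] = []
    new_durations: List[int] = []
    for p, d, w in zip(pitch, durations, token_to_word):
        if word_stress[w]:
            new_pitch.append(p + pitch_shift)
            # Keep at least one frame so no token disappears.
            new_durations.append(max(1, round(d * duration_scale)))
        else:
            new_pitch.append(p)
            new_durations.append(d)
    return new_pitch, new_durations


# Source: "this is very important", stress on "important" (index 3).
# Target (Hindi): "यह बहुत महत्वपूर्ण है", with a hand-written alignment.
src_stress = [False, False, False, True]
alignment = [(0, 0), (1, 3), (2, 1), (3, 2)]
tgt_stress = transfer_stress(src_stress, alignment, num_tgt_words=4)
print(tgt_stress)  # [False, False, True, False]

# Six hypothetical input tokens mapped to the four Hindi words.
pitch, durations = emphasize(
    pitch=[0.0] * 6,
    durations=[4] * 6,
    token_to_word=[0, 1, 1, 2, 2, 3],
    word_stress=tgt_stress,
)
print(pitch)      # [0.0, 0.0, 0.0, 0.5, 0.5, 0.0]
print(durations)  # [4, 4, 4, 5, 5, 4]
```

With many-to-one alignments, this sketch marks a target word as stressed if any aligned source word is stressed; that is a design choice made here for simplicity, and a real system might instead weigh alignment confidence or require a majority of aligned words to be stressed.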

References (22)
  1. P.D. Aguero, J. Adell and A. Bonafonte “Prosody Generation for Speech-to-Speech Translation” In 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings 1, 2006, pp. I–I DOI: 10.1109/ICASSP.2006.1660081
  2. “Resources for Indian languages”, 2016
  3. “WhisperX: Time-Accurate Speech Transcription of Long-Form Audio” In INTERSPEECH 2023, 2023
  4. “SMOTE: Synthetic Minority Over-sampling Technique” In Journal of Artificial Intelligence Research 16 AI Access Foundation, 2002, pp. 321–357 DOI: 10.1613/jair.953
  5. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, 2019 arXiv:1810.04805 [cs.CL]
  6. Quoc Truong Do, Sakriani Sakti and Satoshi Nakamura “Sequence-to-Sequence Models for Emphasis Speech Translation” In IEEE/ACM Transactions on Audio, Speech, and Language Processing 26.10, 2018, pp. 1873–1883 DOI: 10.1109/TASLP.2018.2846402
  7. “Preserving Word-Level Emphasis in Speech-to-Speech Translation” In IEEE/ACM Transactions on Audio, Speech, and Language Processing 25.3, 2017, pp. 544–556 DOI: 10.1109/TASLP.2016.2643280
  8. “Transferring Emphasis in Speech Translation Using Hard-Attentional Neural Network Models” In Proc. Interspeech 2016
  9. J.L. Fleiss “Measuring nominal scale agreement among many raters” In Psychological Bulletin
  10. Hiroya Fujisaki “Information, prosody, and modeling-with emphasis on tonal features of speech” In Scientific Programming - SP, 2004
  11. “Towards a Database For Detection of Multiple Speech Disfluencies in Indian English” In 2021 National Conference on Communications (NCC), 2021, pp. 1–6 DOI: 10.1109/NCC52529.2021.9530043
  12. “A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation”, 2023 arXiv:2301.10606 [cs.CL]
  13. “SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings” In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings Association for Computational Linguistics
  14. “A method for translation of paralinguistic information” In Proceedings of the 9th International Workshop on Spoken Language Translation: Papers, 2012, pp. 158–163
  15. Adrian Łańcucki “FastPitch: Parallel Text-to-Speech with Pitch Prediction” In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6588–6592 DOI: 10.1109/ICASSP39728.2021.9413889
  16. “VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages”, 2023 arXiv:2305.12518 [cs.CL]
  17. “Robust Speech Recognition via Large-Scale Weak Supervision”, 2022 arXiv:2212.04356 [eess.AS]
  18. “Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions”, 2018 arXiv:1712.05884 [cs.CL]
  19. “OPUS-MT: Building open translation services for the World” In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT), 2020
  20. “Label Studio: Data labeling software” Open source software, 2022 URL: https://github.com/heartexlabs/label-studio
  21. “Attention Is All You Need”, 2023 arXiv:1706.03762 [cs.CL]
  22. “Learning with Local and Global Consistency” In Advances in Neural Information Processing Systems 16 MIT Press, 2003
Authors (4)
  1. Sai Akarsh
  2. Vamshi Raghusimha
  3. Anindita Mondal
  4. Anil Vuppala