Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion

Published 24 Nov 2023 in cs.SD, cs.CL, and eess.AS (arXiv:2311.14836v3)

Abstract: This paper proposes two methodologies for constructing customized Common Voice datasets for low-resource languages such as Hindi. The first uses Bark, a transformer-based text-to-audio model developed by Suno, and incorporates Meta's EnCodec and a pre-trained HuBERT model to enhance Bark's performance. The second uses Retrieval-Based Voice Conversion (RVC), with the Ozen toolkit handling data preparation. Both methodologies aim to expand the training data available for automatic speech recognition (ASR) and to address the difficulty of building customized Common Voice datasets for under-resourced languages. They also offer a pathway to high-quality, personalized voice generation for a range of applications.
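
The abstract does not describe how the generated audio is packaged as a Common Voice-style dataset. The following is a minimal sketch of that final step, assuming each Bark or RVC output has already been saved as an audio file together with the text it was generated from. The `SyntheticClip` record, the `write_manifests` helper, the three-column TSV layout (a small subset of Common Voice's TSV columns), and the clip-level random train/test split are all hypothetical choices for illustration, not the authors' pipeline.

```python
import csv
import random
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SyntheticClip:
    """One generated utterance (hypothetical record type)."""
    path: str      # relative path of the audio clip written by the TTS/RVC step
    sentence: str  # transcript the clip was generated from
    speaker: str   # voice label, stored in the client_id column


def write_manifests(clips, out_dir, test_fraction=0.1, seed=0):
    """Shuffle clips, split into test/train, and write one TSV per split.

    Assumption: only the client_id, path and sentence columns are written;
    the official Common Voice TSVs carry additional metadata columns.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    shuffled = list(clips)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    splits = {"test": shuffled[:n_test], "train": shuffled[n_test:]}
    for name, rows in splits.items():
        # UTF-8 is required for Devanagari transcripts; newline="" is what csv expects
        with open(out / f"{name}.tsv", "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f, delimiter="\t")
            writer.writerow(["client_id", "path", "sentence"])
            for clip in rows:
                writer.writerow([clip.speaker, clip.path, clip.sentence])
    return {name: len(rows) for name, rows in splits.items()}


if __name__ == "__main__":
    import tempfile

    demo = [SyntheticClip(f"clips/hi_{i:04d}.wav", "नमस्ते दुनिया", "hi_voice_0")
            for i in range(10)]
    with tempfile.TemporaryDirectory() as tmp:
        print(write_manifests(demo, tmp))  # {'test': 1, 'train': 9}
```

Splitting by clip is the simplest option. When only a few synthetic voices are used, splitting by the `client_id` label instead keeps test voices unseen during training.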
