Crowdsourced and Automatic Speech Prominence Estimation (2310.08464v2)

Published 12 Oct 2023 in eess.AS and cs.SD

Abstract: The prominence of a spoken word is the degree to which an average native listener perceives the word as salient or emphasized relative to its context. Speech prominence estimation is the process of assigning a numeric value to the prominence of each word in an utterance. These prominence labels are useful for linguistic analysis, as well as training automated systems to perform emphasis-controlled text-to-speech or emotion recognition. Manually annotating prominence is time-consuming and expensive, which motivates the development of automated methods for speech prominence estimation. However, developing such an automated system using machine-learning methods requires human-annotated training data. Using our system for acquiring such human annotations, we collect and open-source crowdsourced annotations of a portion of the LibriTTS dataset. We use these annotations as ground truth to train a neural speech prominence estimator that generalizes to unseen speakers, datasets, and speaking styles. We investigate design decisions for neural prominence estimation as well as how neural prominence estimation improves as a function of two key factors of annotation cost: dataset size and the number of annotations per utterance.
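The abstract describes assigning a numeric prominence value to each word in an utterance using a neural estimator trained on crowdsourced labels. The paper's actual architecture is not given here, so the following is only a minimal illustrative sketch under stated assumptions: frame-level acoustic features are pooled over each word's forced-aligned span and passed through a hypothetical learned linear projection with a sigmoid, yielding one score per word in [0, 1]. The function name, feature layout, and parameters are all invented for illustration.

```python
import numpy as np

def word_prominence_scores(frame_features, word_spans, weights, bias):
    """Score each word's prominence from frame-level acoustic features.

    frame_features: (num_frames, feat_dim) array (e.g., pitch/energy cues)
    word_spans: list of (start_frame, end_frame) per word, from forced alignment
    weights: (feat_dim,) learned projection; bias: scalar
    Returns one prominence score per word, squashed to [0, 1].
    """
    scores = []
    for start, end in word_spans:
        pooled = frame_features[start:end].mean(axis=0)   # average frames in the word span
        logit = float(pooled @ weights + bias)            # linear scoring head
        scores.append(1.0 / (1.0 + np.exp(-logit)))       # sigmoid -> [0, 1]
    return scores

# Toy example: 10 frames, 3 features, two word spans
feats = np.arange(30, dtype=float).reshape(10, 3) / 30.0
spans = [(0, 5), (5, 10)]
w = np.array([0.5, -0.2, 0.1])
print(word_prominence_scores(feats, spans, w, 0.0))
```

In practice the paper trains the estimator against averaged human annotations, and studies how performance scales with dataset size and the number of annotations per utterance; this sketch only shows the word-level scoring interface such a system exposes.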

