Human Voice Pitch Estimation: A Convolutional Network with Auto-Labeled and Synthetic Data

Published 14 Aug 2023 in cs.SD, cs.LG, and eess.AS | (2308.07170v2)

Abstract: In music and sound processing, pitch extraction plays a pivotal role. Our research presents a specialized convolutional neural network designed for pitch extraction, particularly from the human singing voice in a cappella performances. Notably, our approach combines synthetic data with auto-labeled a cappella sung audio, creating a robust training environment. Evaluation across datasets comprising synthetic sounds, opera recordings, and time-stretched vowels demonstrates the network's efficacy. This work paves the way for enhanced pitch extraction in both music and voice settings.

Authors (1)
