Human Voice Pitch Estimation: A Convolutional Network with Auto-Labeled and Synthetic Data (2308.07170v2)
Published 14 Aug 2023 in cs.SD, cs.LG, and eess.AS
Abstract: In the domain of music and sound processing, pitch extraction plays a pivotal role. Our research presents a specialized convolutional neural network designed for pitch extraction, particularly from the human singing voice in acapella performances. Notably, our approach combines synthetic data with auto-labeled acapella sung audio, creating a robust training environment. Evaluation across datasets comprising synthetic sounds, opera recordings, and time-stretched vowels demonstrates the network's efficacy. This work paves the way for enhanced pitch extraction in both music and voice settings.
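To make the idea of synthetic training data concrete, the following is a minimal sketch of how a tone with a known fundamental frequency could be generated so that its pitch label comes for free. The abstract does not describe the authors' actual synthesis procedure, so everything here is an assumption made for illustration: the function names `synth_harmonic_tone` and `hz_to_midi`, the 16 kHz sample rate, the five harmonics, and the 1/k amplitude roll-off.

```python
import math


def synth_harmonic_tone(f0_hz, duration_s=0.05, sample_rate=16000, n_harmonics=5):
    """Return (samples, f0_hz): a harmonic tone and its known pitch label.

    Hypothetical helper for illustration only; not the paper's method.
    """
    n_samples = round(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        # Sum the first n_harmonics partials with a 1/k amplitude roll-off
        # (an arbitrary choice that loosely mimics a voiced spectrum).
        value = sum(
            math.sin(2.0 * math.pi * k * f0_hz * t) / k
            for k in range(1, n_harmonics + 1)
        )
        samples.append(value)
    # Normalize to a peak amplitude of 1.0.
    peak = max(abs(s) for s in samples)
    samples = [s / peak for s in samples]
    return samples, f0_hz


def hz_to_midi(f_hz):
    """Convert frequency in Hz to a (fractional) MIDI note number, A4 = 440 Hz = 69."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


# Build a tiny labeled set: each entry pairs audio samples with its true f0.
examples = [synth_harmonic_tone(f0) for f0 in (110.0, 220.0, 440.0)]
```

Because the fundamental is chosen before synthesis, each example carries an exact label, which is the property that makes synthetic audio attractive alongside auto-labeled recordings. A real pipeline would likely vary timbre, vibrato, and noise far more than this sketch does.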