
Human Voice Pitch Estimation: A Convolutional Network with Auto-Labeled and Synthetic Data (2308.07170v2)

Published 14 Aug 2023 in cs.SD, cs.LG, and eess.AS

Abstract: In the domain of music and sound processing, pitch extraction plays a pivotal role. Our research presents a specialized convolutional neural network designed for pitch extraction, particularly from the human singing voice in a cappella performances. Notably, our approach combines synthetic data with auto-labeled a cappella sung audio, creating a robust training environment. Evaluation across datasets comprising synthetic sounds, opera recordings, and time-stretched vowels demonstrates the network's efficacy. This work paves the way for enhanced pitch extraction in both music and voice settings.

