JEPOO: Highly Accurate Joint Estimation of Pitch, Onset and Offset for Music Information Retrieval (2306.01304v2)

Published 2 Jun 2023 in cs.SD, cs.IR, cs.MM, and eess.AS

Abstract: Melody extraction is a core task in music information retrieval, and pitch, onset and offset estimation are its key sub-tasks. Existing methods have limited accuracy and work for only one type of data, either single-pitch or multi-pitch. In this paper, we propose a highly accurate method for joint estimation of pitch, onset and offset, named JEPOO. We address the challenges of joint learning optimization and of handling both single-pitch and multi-pitch data through novel model design and a new optimization technique named Pareto modulated loss with loss weight regularization. This is the first method that can accurately handle both single-pitch and multi-pitch music data, and even a mix of them. A comprehensive experimental study on a wide range of real datasets shows that JEPOO outperforms state-of-the-art methods by up to 10.6%, 8.3% and 10.3% for the prediction of Pitch, Onset and Offset, respectively, and that JEPOO is robust across various types of data and instruments. The ablation study shows the effectiveness of each component of JEPOO.
