Tempo vs. Pitch: understanding self-supervised tempo estimation (2304.06868v1)

Published 14 Apr 2023 in cs.SD, cs.LG, and eess.AS

Abstract: Self-supervision methods learn representations by solving pretext tasks that do not require human-generated labels, alleviating the need for time-consuming annotations. These methods have been applied in computer vision, natural language processing, environmental sound analysis, and, more recently, in music information retrieval, e.g., for pitch estimation. Particularly in the context of music, however, there are few insights into how fragile such models are to shifts in the data distribution, and how that fragility could be mitigated. In this paper, we explore these questions by dissecting a self-supervised pitch-estimation model adapted for tempo estimation, via rigorous experimentation with synthetic data. Specifically, we study the relationship between the input representation and the data distribution for self-supervised tempo estimation.
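The pretext task behind this adaptation can be made concrete with a short sketch. The following is a rough illustration of a SPICE-style equivariant objective transplanted from pitch to tempo, under the assumption that the input representation is a tempogram and that time-stretching plays the role that pitch-shifting plays in SPICE. The names tempogram_vector and tiny_head are hypothetical placeholders, and this is not the authors' implementation:

    # Minimal sketch, assuming a SPICE-style equivariant pretext task for
    # tempo: time-stretch an excerpt by a known factor r and require the
    # model's (log-)tempo output to shift by log2(r). No tempo labels needed.
    import numpy as np
    import librosa

    def tempogram_vector(y, sr, hop_length=512):
        # Autocorrelation tempogram of the onset envelope, averaged over
        # time, as one possible input representation for the model.
        oenv = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
        tg = librosa.feature.tempogram(onset_envelope=oenv, sr=sr,
                                       hop_length=hop_length)
        return tg.mean(axis=1)

    def tiny_head(x, w):
        # Hypothetical one-layer regressor; in practice w would be the
        # trained weights of a CNN, here random just to make the loss concrete.
        return float(np.tanh(x @ w))

    y, sr = librosa.load(librosa.ex('nutcracker'), duration=30)
    r = 1.2                                    # known time-stretch factor
    y_fast = librosa.effects.time_stretch(y, rate=r)

    x_orig = tempogram_vector(y, sr)
    x_fast = tempogram_vector(y_fast, sr)
    w = np.random.default_rng(0).normal(scale=0.01, size=x_orig.shape)

    # Equivariance loss: the pair of outputs must differ by the known shift.
    pred_shift = tiny_head(x_fast, w) - tiny_head(x_orig, w)
    loss = (pred_shift - np.log2(r)) ** 2
    print(f"stretch={r}: predicted shift={pred_shift:.3f}, loss={loss:.4f}")

In an actual training setup one would minimize this loss over many excerpts and stretch factors; the point of the sketch is only that the supervisory signal comes from the known stretch, not from annotated tempi.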
