
BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer (2312.17156v3)

Published 28 Dec 2023 in cs.SD and eess.AS

Abstract: Many deep learning models have achieved dominant performance on the offline beat tracking task. However, online beat tracking, in which only the past and present input features are available, still remains challenging. In this paper, we propose BEAt tracking Streaming Transformer (BEAST), an online joint beat and downbeat tracking system based on the streaming Transformer. To deal with online scenarios, BEAST applies contextual block processing in the Transformer encoder. Moreover, we adopt relative positional encoding in the attention layer of the streaming Transformer encoder to capture relative timing position which is critically important information in music. Carrying out beat and downbeat experiments on benchmark datasets for a low latency scenario with maximum latency under 50 ms, BEAST achieves an F1-measure of 80.04% in beat and 46.78% in downbeat, which is a substantial improvement of about 5 percentage points over the state-of-the-art online beat tracking model.


Summary

  • The paper presents BEAST, which leverages streaming Transformers and contextual block processing for real-time beat and downbeat tracking.
  • It employs relative positional encoding in the streaming encoder, reaching an F1-measure of 80.04% for beat tracking at 46 ms latency and outperforming CRNN-based online models.
  • The approach advances real-time music information retrieval, offering practical benefits for digital audio workstations and virtual accompaniment systems.

An Evaluation of BEAST: Online Beat and Downbeat Tracking with Streaming Transformers

The paper "BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer," authored by Chih-Cheng Chang and Li Su from the Institute of Information Science, Academia Sinica, addresses critical challenges in online music beat tracking. Through the development of the BEAST framework, the paper contributes a novel approach leveraging a streaming Transformer model, specifically tailored for online joint beat and downbeat tracking with an emphasis on low latency requirements.

Key Contributions and Methodology

BEAST adopts a Transformer-based model that utilizes contextual block processing to facilitate online capabilities. Traditionally, Transformer architectures require the complete input sequence to calculate attention scores, presenting significant challenges for real-time applications. BEAST circumvents this limitation by segmenting the input sequence into non-overlapping blocks, each coupled with additional context frames for left and right sub-blocks. This strategy maintains the critical temporal context necessary for accurate beat and downbeat predictions while supporting incremental processing, enhancing suitability for online operations.
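The block-segmentation step described above can be sketched as follows. This is a minimal illustration of contextual block processing, not the paper's implementation; the block and context sizes are hypothetical parameters chosen for clarity.

```python
import numpy as np

def make_context_blocks(features, block_size=16, left_ctx=8, right_ctx=8):
    """Split a (time, dim) feature sequence into non-overlapping blocks,
    each padded with left/right context frames.

    A sketch of contextual block processing: the encoder attends within
    each padded block, so new frames can be processed incrementally
    instead of waiting for the full sequence. Parameter values are
    illustrative, not taken from the paper.
    """
    T, _ = features.shape
    blocks = []
    for start in range(0, T, block_size):
        lo = max(0, start - left_ctx)           # left context frames
        hi = min(T, start + block_size + right_ctx)  # right context frames
        blocks.append(features[lo:hi])
    return blocks
```

In an online setting, the right-context frames are what bound the latency: the model can only emit predictions for a block once those few future frames have arrived, which is why the paper reports latencies on the order of tens of milliseconds.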

The model further incorporates relative positional encoding in place of the more conventional absolute positional encoding. This choice lets the model capture the pairwise timing relationships between musical events, which are vital for understanding rhythmic structure. Experimental evaluations showed that relative positional encoding outperforms absolute positional encoding, reaching an F1-measure of 83.65%.
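The idea can be illustrated with a single-head attention sketch in the spirit of Shaw et al.'s relative position representations: the attention score between positions i and j gains a term that depends only on the offset j - i. The shapes and indexing here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rel_attention(q, k, v, rel_emb):
    """Single-head self-attention with relative position terms (sketch).

    q, k, v: (T, d) query/key/value matrices.
    rel_emb: (2*T - 1, d) embeddings for offsets -(T-1) .. T-1; the
    embedding for offset (j - i) lives at index j - i + T - 1.
    """
    T, d = q.shape
    scores = q @ k.T  # content-content term
    for i in range(T):
        for j in range(T):
            # content-position term: q_i dot r_{j-i}
            scores[i, j] += q[i] @ rel_emb[j - i + T - 1]
    scores /= np.sqrt(d)
    # row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the position term depends only on the offset, the same embeddings apply regardless of where a block sits in the stream, which is what makes relative encoding a natural fit for block-wise online processing.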

Numerical Results and Performance

Experimental results show that BEAST achieves an F1-measure of 80.04% for beat tracking at a latency of 46 ms, setting a new benchmark for online beat tracking. The system also delivers a substantial gain in downbeat tracking, with an F1-measure of 46.78%. These results represent an improvement of about 5 percentage points over leading alternatives, such as CRNN-based state-of-the-art models. In particular, BEAST addresses a long-standing challenge in online beat tracking by balancing accuracy against latency, achieving low real-time factors (RTFs) without sacrificing tracking performance.
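For context, the F1-measure used in beat tracking counts an estimated beat as correct when it falls within a small tolerance window (conventionally ±70 ms) of an unmatched reference beat. The greedy matcher below is a simplified sketch of that standard evaluation, not the paper's scoring code.

```python
def beat_f_measure(est, ref, tol=0.07):
    """F1-measure for beat tracking (simplified sketch).

    est, ref: sorted lists of beat times in seconds.
    tol: tolerance window in seconds (0.07 s is the common convention).
    Each reference beat may be matched at most once.
    """
    matched = set()
    tp = 0
    for e in est:
        for i, r in enumerate(ref):
            if i not in matched and abs(e - r) <= tol:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(est) if est else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, estimating three of four reference beats exactly gives precision 1.0 and recall 0.75, hence F1 ≈ 0.857, which shows why missed downbeats depress the downbeat F1 far more than the beat F1.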

Implications and Speculation on Future Directions

The development of BEAST delineates significant advancements for real-time music information retrieval, particularly in applications where immediate responsiveness is crucial, such as digital audio workstations and virtual accompaniment systems. BEAST's dual focus on latency and accuracy positions it as a potential foundation for extensions into broader MIR domains, such as online transcription and automated music generation.

Looking forward, the successful adaptation of streaming Transformer architectures to MIR tasks signals a promising pathway for future research endeavors in real-time audio processing. An exploration into integrating BEAST with generative models could pave the way for sophisticated real-time accompaniment and compositional systems. Moreover, these findings endorse further investigations into scaling BEAST's architecture for handling larger, more complex musical datasets, thus potentially yielding insights into more generalizable rhythmic models.

In conclusion, BEAST exemplifies a significant contribution to the field of music beat tracking, addressing long-standing challenges associated with real-time processing. The model’s innovative use of streaming Transformers promises to inform future advancements and applications within music information retrieval, setting a precedent for subsequent research in this domain.