Towards Robust and Truly Large-Scale Audio-Sheet Music Retrieval (2309.12158v1)
Abstract: A range of applications in multi-modal music information retrieval centres on the problem of connecting large collections of sheet music (images) to corresponding audio recordings, that is, identifying pairs of audio and score excerpts that refer to the same musical content. A typical recent approach to this task employs cross-modal deep learning architectures to learn joint embedding spaces that link the two distinct modalities, audio and sheet music images. While there has been steady improvement on this front over the past years, a number of open problems still prevent large-scale deployment of this methodology. In this article we attempt to provide an insightful examination of the current developments in audio-sheet music retrieval via deep learning methods. We first identify a set of main challenges on the road towards robust and large-scale cross-modal music retrieval in real scenarios. We then highlight the steps we have taken so far to address some of these challenges, documenting step-by-step improvement along several dimensions. We conclude by analysing the remaining challenges and presenting ideas for solving them, in order to pave the way to a unified and robust methodology for cross-modal music retrieval.
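The retrieval scheme the abstract describes can be sketched numerically: two modality-specific encoders map audio and score excerpts into a shared embedding space, and retrieval reduces to a nearest-neighbour search by cosine similarity. The sketch below is purely illustrative; the linear "encoders", feature dimensions, and random data are all hypothetical stand-ins for the trained convolutional networks and real snippets used in the actual systems.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # Toy "encoder": linear projection followed by L2 normalisation,
    # standing in for a trained modality-specific neural network.
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Hypothetical feature dimensions for the two modalities and the
# shared embedding space.
d_audio, d_score, d_embed = 64, 128, 32
W_audio = rng.normal(size=(d_audio, d_embed))
W_score = rng.normal(size=(d_score, d_embed))

# A database of 100 (random) sheet-music snippet embeddings and
# one audio query embedding, both in the joint space.
score_db = encode(rng.normal(size=(100, d_score)), W_score)
audio_query = encode(rng.normal(size=(1, d_audio)), W_audio)

# Cross-modal retrieval: rank score snippets by cosine similarity
# to the audio query (embeddings are unit-norm, so the dot product
# equals cosine similarity).
sims = (audio_query @ score_db.T).ravel()
ranking = np.argsort(-sims)
print(ranking[:5])  # indices of the five best-matching score snippets
```

In the real systems, the encoders are trained jointly (e.g. with pairwise ranking or contrastive losses) so that matching audio/score pairs land close together in the shared space; the retrieval step itself stays exactly this simple.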
- Luis Carvalho (19 papers)
- Gerhard Widmer (144 papers)