End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music (2405.12105v3)

Published 20 May 2024 in cs.CV

Abstract: Optical Music Recognition (OMR) has made significant progress since its inception, with various approaches now capable of accurately transcribing music scores into digital formats. Despite these advancements, most so-called end-to-end OMR approaches still rely on multi-stage processing pipelines for transcribing full-page score images, which introduces several limitations that hinder the full potential of the field. In this paper, we present the first truly end-to-end approach for page-level OMR. Our system, which combines convolutional layers with autoregressive Transformers, processes an entire music score page and outputs a complete transcription in a music encoding format. This is made possible by both the architecture and the training procedure, which utilizes curriculum learning through incremental synthetic data generation. We evaluate the proposed system using pianoform corpora. This evaluation is conducted first in a controlled scenario with synthetic data, and subsequently against two real-world corpora of varying conditions. Our approach is compared with leading commercial OMR software. The results demonstrate that our system not only successfully transcribes full-page music scores but also outperforms the commercial tool in both zero-shot settings and after fine-tuning with the target domain, representing a significant contribution to the field of OMR.

Authors (4)
  1. Antonio Ríos-Vila (6 papers)
  2. Jorge Calvo-Zaragoza (26 papers)
  3. David Rizo (3 papers)
  4. Thierry Paquet (23 papers)
Citations (2)
