- The paper evaluates three GAN architectures (DCGAN, ProGAN, CycleWGAN) for their effectiveness in synthesising handwritten music sheets to address data scarcity for Optical Music Recognition (OMR).
- Among the evaluated models, CycleWGAN demonstrated superior performance in generating realistic handwritten music images, achieving better FID and KID scores than DCGAN and ProGAN.
- This research provides valuable insights for improving OMR systems by enabling synthetic data generation and highlights areas for future work, such as integrating diffusion models or refining symbol transfer.
Synthesising Handwritten Music with GANs: A Comprehensive Evaluation of CycleWGAN, ProGAN, and DCGAN
The paper "Synthesising Handwritten Music with GANs: A Comprehensive Evaluation of CycleWGAN, ProGAN, and DCGAN" addresses a significant gap in the domain of Optical Music Recognition (OMR). The authors tackle the challenge of generating synthetic handwritten music sheets, essential for the training and improvement of OMR systems. Given the scarcity of annotated datasets for handwriting recognition in music, this research evaluates the effectiveness of three distinct Generative Adversarial Network (GAN) architectures: DCGAN, ProGAN, and CycleWGAN.
Background and Motivation
Handwritten music sheets, prevalent in historical archives, pose significant challenges for digitisation due to diverse handwriting styles and poor image quality. While recent deep learning advances have improved Optical Music Recognition for printed scores, handwritten scores remain problematic because annotated datasets are scarce. GANs offer a viable route to generating synthetic training data, since they learn a data distribution without requiring labelled examples.
Methodology
The study pairs the CVC-MUSCIMA dataset of handwritten scores with the DoReMi dataset of printed scores, enabling style transfer from printed to handwritten music notation. Extensive data augmentation ensured a robust training set. The three GAN models were compared as follows:
- DCGAN was used as a baseline, employing convolutional layers to maintain spatial coherence, although it faced resolution and mode collapse issues.
- ProGAN employed a progressive training approach to generate higher-resolution images incrementally while incorporating Wasserstein GAN with Gradient Penalty (WGAN-GP) for enhanced training stability. Despite improvements, ProGAN struggled with symbol diversity and visual details in intricate handwriting styles.
- CycleWGAN, a CycleGAN variant that replaces the standard adversarial loss with a Wasserstein loss, improved training stability and style transfer. It produced the most realistic handwritten music images, although some inaccuracies persisted in symbol generation.
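The objectives that distinguish these variants can be sketched in a few lines. The following is a minimal, illustrative sketch (not the authors' implementation) of the two loss terms a CycleWGAN-style model combines: a Wasserstein critic loss and an L1 cycle-consistency penalty on the round trip printed → handwritten → printed. The function names and the weighting `lam` are assumptions for illustration.

```python
import numpy as np

def wasserstein_critic_loss(critic_real, critic_fake):
    # The critic maximises the score gap between real and generated samples;
    # written as a loss to minimise, the gap is negated.
    return -(np.mean(critic_real) - np.mean(critic_fake))

def cycle_consistency_loss(x, x_reconstructed, lam=10.0):
    # L1 penalty on the round-trip reconstruction of an image through
    # both generators (e.g. printed -> handwritten -> printed).
    return lam * np.mean(np.abs(x - x_reconstructed))

def cyclewgan_generator_loss(critic_fake, x, x_rec, y, y_rec, lam=10.0):
    # Generators try to raise the critic's scores on translated images
    # while keeping both translation cycles consistent.
    adversarial = -np.mean(critic_fake)
    cycle = (cycle_consistency_loss(x, x_rec, lam)
             + cycle_consistency_loss(y, y_rec, lam))
    return adversarial + cycle
```

In practice the Wasserstein formulation also requires a Lipschitz constraint on the critic (e.g. a gradient penalty, as in the WGAN-GP setup ProGAN used here), which is omitted from this sketch for brevity.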
Key Results and Analysis
Quantitative and qualitative analyses confirmed CycleWGAN's superiority: it achieved an FID of 41.87 and a KID of 0.05, outperforming DCGAN and ProGAN in synthesising realistic, diverse handwritten music images. However, CycleWGAN still produced missing symbols and gaps, limiting its immediate usefulness for training OMR systems.
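For readers unfamiliar with the reported metric, FID models two sets of image features (typically Inception activations) as Gaussians and measures the Fréchet distance between them; lower is better. A minimal sketch of the standard computation, assuming the feature vectors have already been extracted:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    # Fréchet Inception Distance between two feature sets, each modelled
    # as a Gaussian: ||mu1 - mu2||^2 + Tr(C1 + C2 - 2*sqrt(C1 @ C2)).
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerics
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(c1 + c2 - 2.0 * covmean))
```

KID follows a similar idea but uses an unbiased kernel MMD estimate instead of the Gaussian assumption, which makes it less sensitive to sample size.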
Implications and Future Directions
This study's insights contribute significantly to advancing OMR system capabilities by providing a method to generate necessary training data. These improvements are pivotal for broader OMR applications, potentially impacting musicology and digital archiving.
Future research may enhance these models by incorporating diffusion models for more detailed image generation and by further improving CycleWGAN's training stability. Addressing the symbol-transfer gaps with techniques such as inpainting, or by refining the training datasets, could overcome current limitations, with the ultimate aim of a fully automated system capable of synthesising entire handwritten music sheets.
This work underscores the potential of GANs in handwritten music synthesis and offers a blueprint for future explorations in this domain, promoting both technological advancement in AI-driven music recognition and practical applications in digital musicology.