- The paper evaluates three GAN architectures (DCGAN, ProGAN, CycleWGAN) for their effectiveness in synthesising handwritten music sheets to address data scarcity for Optical Music Recognition (OMR).
- Among the evaluated models, CycleWGAN demonstrated superior performance in generating realistic handwritten music images, achieving better FID and KID scores than DCGAN and ProGAN.
- This research provides valuable insights for improving OMR systems by enabling synthetic data generation and highlights areas for future work, such as integrating diffusion models or refining symbol transfer.
Synthesising Handwritten Music with GANs: A Comprehensive Evaluation of CycleWGAN, ProGAN, and DCGAN
The paper "Synthesising Handwritten Music with GANs: A Comprehensive Evaluation of CycleWGAN, ProGAN, and DCGAN" addresses a significant gap in the domain of Optical Music Recognition (OMR). The authors tackle the challenge of generating synthetic handwritten music sheets, essential for the training and improvement of OMR systems. Given the scarcity of annotated datasets for handwriting recognition in music, this research evaluates the effectiveness of three distinct Generative Adversarial Network (GAN) architectures: DCGAN, ProGAN, and CycleWGAN.
Background and Motivation
Handwritten music sheets, prevalent in historical archives, pose significant challenges for digitisation due to diverse handwriting styles and poor image quality. While recent deep learning advances have improved Optical Music Recognition for printed scores, handwritten scores remain problematic because annotated datasets are scarce. GANs offer a viable route to generating synthetic training data, since they learn a data distribution without requiring labelled examples.
Methodology
The study pairs the CVC-MUSCIMA dataset of handwritten scores with the DoReMi dataset of printed scores, enabling style transfer from printed to handwritten music notation. Extensive data augmentation ensured a robust training set. The three GAN models were compared as follows:
- DCGAN was used as a baseline, employing convolutional layers to maintain spatial coherence, although it faced resolution and mode collapse issues.
- ProGAN employed a progressive training approach to generate higher-resolution images incrementally while incorporating Wasserstein GAN with Gradient Penalty (WGAN-GP) for enhanced training stability. Despite improvements, ProGAN struggled with symbol diversity and visual details in intricate handwriting styles.
- CycleWGAN, a CycleGAN variant that replaces the standard adversarial loss with a Wasserstein loss, improved training stability and style transfer. It produced the most realistic handwritten music images, although some inaccuracies persisted in symbol generation.
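The objectives that distinguish these variants can be sketched in a few lines. The following is a minimal, illustrative sketch (not the authors' implementation) of the two loss terms a CycleWGAN-style model combines: a Wasserstein critic loss and an L1 cycle-consistency penalty on the round trip printed → handwritten → printed. The function names and the weighting `lam` are assumptions for illustration.

```python
import numpy as np

def wasserstein_critic_loss(critic_real, critic_fake):
    # The critic maximises the score gap between real and generated samples;
    # written as a loss to minimise, the gap is negated.
    return -(np.mean(critic_real) - np.mean(critic_fake))

def cycle_consistency_loss(x, x_reconstructed, lam=10.0):
    # L1 penalty on the round-trip reconstruction of an image through
    # both generators (e.g. printed -> handwritten -> printed).
    return lam * np.mean(np.abs(x - x_reconstructed))

def cyclewgan_generator_loss(critic_fake, x, x_rec, y, y_rec, lam=10.0):
    # Generators try to raise the critic's scores on translated images
    # while keeping both translation cycles consistent.
    adversarial = -np.mean(critic_fake)
    cycle = (cycle_consistency_loss(x, x_rec, lam)
             + cycle_consistency_loss(y, y_rec, lam))
    return adversarial + cycle
```

In practice the Wasserstein formulation also requires a Lipschitz constraint on the critic (e.g. a gradient penalty, as in the WGAN-GP setup ProGAN used here), which is omitted from this sketch for brevity.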
Key Results and Analysis
Quantitative and qualitative analyses confirmed CycleWGAN's superiority: it achieved an FID of 41.87 and a KID of 0.05, outperforming DCGAN and ProGAN in synthesising realistic, diverse handwritten music images. However, CycleWGAN still produced missing symbols and gaps, limiting its immediate usefulness for training OMR systems.
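For readers unfamiliar with the reported metric, FID models two sets of image features (typically Inception activations) as Gaussians and measures the Fréchet distance between them; lower is better. A minimal sketch of the standard computation, assuming the feature vectors have already been extracted:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    # Fréchet Inception Distance between two feature sets, each modelled
    # as a Gaussian: ||mu1 - mu2||^2 + Tr(C1 + C2 - 2*sqrt(C1 @ C2)).
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerics
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(c1 + c2 - 2.0 * covmean))
```

KID follows a similar idea but uses an unbiased kernel MMD estimate instead of the Gaussian assumption, which makes it less sensitive to sample size.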
Implications and Future Directions
This study's insights contribute significantly to advancing OMR system capabilities by providing a method to generate necessary training data. These improvements are pivotal for broader OMR applications, potentially impacting musicology and digital archiving.
Future research may enhance these models by incorporating diffusion models for more detailed image generation and by further improving CycleWGAN's training stability. Addressing the symbol-transfer gaps with techniques such as inpainting, or by refining the training datasets, could overcome current limitations, with the ultimate aim of a fully automated system capable of synthesising entire handwritten music sheets.
This work underscores the potential of GANs in handwritten music synthesis and offers a blueprint for future explorations in this domain, promoting both technological advancement in AI-driven music recognition and practical applications in digital musicology.