
Frechet Music Distance: A Metric For Generative Symbolic Music Evaluation (2412.07948v2)

Published 10 Dec 2024 in cs.SD, cs.AI, cs.MM, and eess.AS

Abstract: In this paper we introduce the Frechet Music Distance (FMD), a novel evaluation metric for generative symbolic music models, inspired by the Frechet Inception Distance (FID) in computer vision and Frechet Audio Distance (FAD) in generative audio. FMD calculates the distance between distributions of reference and generated symbolic music embeddings, capturing abstract musical features. We validate FMD across several datasets and models. Results indicate that FMD effectively differentiates model quality, providing a domain-specific metric for evaluating symbolic music generation, and establishing a reproducible standard for future research in symbolic music modeling.

Summary

  • The paper introduces the Frechet Music Distance, a novel metric that compares multivariate Gaussian statistics of symbolic music embeddings.
  • It employs advanced music representation models like CLaMP to assess generative music quality across diverse datasets and genres.
  • Experimental validation demonstrates FMD's effectiveness in distinguishing model outputs and its sensitivity to key musical features like pitch shifts.

Frechet Music Distance: A Metric For Evaluating Generative Symbolic Music Models

The paper introduces the Frechet Music Distance (FMD), a novel evaluation metric specifically designed for generative symbolic music models. Inspired by the Frechet Inception Distance (FID) used in computer vision and the Frechet Audio Distance (FAD) applied in audio processing, FMD calculates the distance between the distributions of reference and generated symbolic music embeddings, capturing abstract musical features.

Theoretical Contribution and Methodology

The authors start from a persistent problem in symbolic music generation: it is difficult to evaluate the quality of generated music. Existing evaluation methods often rely on subjective assessments or on simple statistical metrics, neither of which captures the multifaceted nature of music. In response, the Frechet Music Distance aims to provide a more objective, domain-specific metric.

FMD fits one multivariate Gaussian to the embeddings of the reference music and another to the embeddings of the generated music, with the embeddings produced by music representation learning models such as CLaMP and CLaMP 2. The Frechet distance between the two Gaussians is then calculated using the formula:

$$\mathrm{FD} = \lVert \mu_r - \mu_t \rVert^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_t - 2\sqrt{\Sigma_r \Sigma_t}\right)$$

where $\mu_r, \Sigma_r$ and $\mu_t, \Sigma_t$ are the mean and covariance of the Gaussian distributions for the reference and test music embeddings, respectively.
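The formula can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `frechet_distance` and the arguments `ref_emb` and `test_emb` are hypothetical, and the embeddings are assumed to have been extracted already (for example by a model such as CLaMP) into arrays of shape `(n_samples, dim)`. Instead of forming the matrix square root explicitly, the sketch uses the fact that the trace of $\sqrt{\Sigma_r \Sigma_t}$ equals the sum of the square roots of the eigenvalues of $\Sigma_r \Sigma_t$, which are real and non-negative for covariance matrices.

```python
import numpy as np


def frechet_distance(ref_emb: np.ndarray, test_emb: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two embedding sets.

    Both inputs are assumed to have shape (n_samples, dim).
    """
    # Fit a Gaussian to each set: sample mean and sample covariance.
    mu_r, mu_t = ref_emb.mean(axis=0), test_emb.mean(axis=0)
    sigma_r = np.cov(ref_emb, rowvar=False)
    sigma_t = np.cov(test_emb, rowvar=False)

    # Tr(sqrt(Sigma_r Sigma_t)) as the sum of square roots of the
    # eigenvalues of the product. Rounding can produce tiny negative or
    # imaginary parts, so keep the real part and clip at zero.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_t)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_t
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_t) - 2.0 * tr_covmean)


if __name__ == "__main__":
    # Toy stand-in for real embeddings (random data, for illustration only).
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(500, 4))
    print(frechet_distance(reference, reference))        # close to 0
    print(frechet_distance(reference, reference + 2.0))  # close to 16 = 4 * 2.0**2
```

With real data the two arrays would hold embeddings of the reference corpus and of the generated pieces. In high dimensions with few samples the covariance estimates become unreliable, so the sample size matters when comparing scores.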

Experimental Validation

The paper validates FMD using several datasets and symbolic music models, including GPT-2, FolkRNN, and MMT, and examines the metric's effectiveness under different conditions. Evaluation is conducted across diverse music genres and models trained on various datasets such as Folk V2, MAESTRO, and MidiCaps.

The results demonstrate that FMD effectively differentiates between models of varying quality and captures the stylistic nuances of generated outputs. This is evident in the low FMD values obtained when models are evaluated against their own training sets and the higher values obtained otherwise, which points to FMD's potential for measuring generative music quality.

Sensitivity Analysis and Practical Implications

The paper performs a sensitivity analysis to assess the impact of musical properties on FMD scores, exploring the effects of mode shifts and augmentations like pitch modification. Notably, pitch changes exhibit a significant influence on the metric, highlighting FMD's sensitivity to essential musical characteristics.
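A simplified special case, offered as an illustration rather than as a result from the paper, shows how a systematic transformation can move the score. Assume, hypothetically, that a transformation shifts every embedding by the same vector $\delta$ and leaves the covariance $\Sigma$ unchanged. Since $\sqrt{\Sigma\Sigma} = \Sigma$ for a covariance matrix, the trace term vanishes and the distance reduces to the squared length of the shift:

$$\mathrm{Tr}\left(\Sigma + \Sigma - 2\sqrt{\Sigma\Sigma}\right) = \mathrm{Tr}(2\Sigma - 2\Sigma) = 0 \quad\Rightarrow\quad \mathrm{FD} = \lVert \mu_r - (\mu_r + \delta) \rVert^2 = \lVert \delta \rVert^2$$

Real embedding models are unlikely to respond to an augmentation this rigidly, so the covariance term would generally contribute as well.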

The practical implications of this work are substantial. FMD offers a standardized, scalable, and reproducible measurement for evaluating generative symbolic music, advancing both the practice and academic research in the field. By providing a more nuanced assessment of musical outputs that encompasses musicality, coherence, and stylistic diversity, FMD supports the development and refinement of generative music models.

Limitations and Future Research

Despite its promising capabilities, the paper outlines several limitations. The dependence on embedding models such as CLaMP introduces potential biases based on their pre-training. Furthermore, the metric's correlation with human perception, especially for more subjective attributes of musical quality, requires further validation. Future work should explore deeper integration with temporal and structural musical features and investigate correlations with FAD for a holistic cross-modal evaluation approach.

In conclusion, the introduction of FMD marks a significant step toward improving symbolic music quality assessment. While highlighting areas for improvement and research, this work provides the foundation for future progress in generative symbolic music evaluation, setting the stage for more accurate, reliable, and consistent assessments within this evolving field.
