
Cause of performance discrepancy for training with mixed original/random data

Determine whether the observed behavior, in which TFC-TDF-UNet v3 trained with an original-mix sampling probability p=0.5 performs better on training-set original mixes but worse on validation-set original mixes than a model trained solely with random mixes, is caused by repeated exposure to the same original mixtures during training, leading to overfitting.


Background

The paper analyzes the effect of random-mixing data augmentation on music source separation performance, using the TFC-TDF-UNet v3 architecture trained on the MUSDB-HQ dataset. As part of the training dynamics analysis, the authors compare models trained with different probabilities p of sampling original mixes versus random mixes.

In the training dynamics results (Fig. 2), the model trained with p=0.5 shows higher performance on training original mixes but lower performance on validation original mixes than the model trained with only random mixes. The authors explicitly conjecture that this discrepancy is due to the p=0.5 model seeing the same mixes too many times during training, suggesting potential overfitting to repeated original mixtures.
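The sampling scheme under comparison can be sketched as follows. This is a hypothetical illustration, not the authors' code: the `stems` structure, function name, and source list are assumptions; it only shows how an original-mix probability p interpolates between training on real songs (p=1) and on random mixes (p=0).

```python
import random

def sample_training_mix(stems, sources, p=0.5, rng=random):
    """Sketch of p-probability mixed sampling.

    `stems` maps track name -> {source name: waveform}.
    With probability p, return the stems of a single original track,
    so the mixture is a coherent real song ("original mix").
    Otherwise, draw each source from an independently chosen track
    ("random mix"), as in random-mixing augmentation.
    """
    tracks = list(stems)
    if rng.random() < p:
        track = rng.choice(tracks)  # original mix: all stems from one song
        return {s: stems[track][s] for s in sources}
    # random mix: each stem drawn from its own randomly chosen song
    return {s: stems[rng.choice(tracks)][s] for s in sources}
```

Under this scheme, a model trained with p=0.5 revisits the same finite set of original mixtures throughout training, which is consistent with the overfitting conjecture above, whereas p=0 yields an effectively much larger space of distinct training mixtures.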

References

We conjecture this is because the $p\!=\!0.5$ model has seen the same mixes too many times during training, which we explore in more detail in Section \ref{sec:effective}.

Why does music source separation benefit from cacophony? (2402.18407 - Jeon et al., 28 Feb 2024) in Section “Training Dynamics Comparison”