SSM-Net: feature learning for Music Structure Analysis using a Self-Similarity-Matrix based loss

Published 15 Nov 2022 in cs.SD, cs.LG, and eess.AS | (2211.08141v1)

Abstract: In this paper, we propose a new paradigm for learning audio features for Music Structure Analysis (MSA). We train a deep encoder so that the Self-Similarity-Matrix (SSM) computed from its output features approximates a ground-truth SSM. This is done by minimizing a loss between the two SSMs. Since this loss is differentiable w.r.t. its input features, we can train the encoder in a straightforward way. We demonstrate the use of this training paradigm on the RWC-Pop dataset, evaluating with the Area Under the ROC Curve (AUC).
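
To make the idea of a loss between two SSMs concrete, here is a minimal sketch. The abstract does not specify the similarity function or the loss, so this sketch assumes cosine similarity rescaled to [0, 1] and a mean binary cross-entropy, and it builds the ground-truth SSM from hypothetical per-frame segment labels. All function names are illustrative. NumPy is used for readability; actual training would need an autodiff framework so that gradients reach the encoder.

```python
import numpy as np


def self_similarity_matrix(features):
    """Return a (T, T) SSM from (T, D) features.

    Assumption: cosine similarity, rescaled from [-1, 1] to [0, 1].
    """
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # avoid division by zero
    cosine = unit @ unit.T
    return 0.5 * (cosine + 1.0)


def ground_truth_ssm(labels):
    """Binary (T, T) SSM: 1 where two frames share a segment label, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def ssm_loss(features, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted and ground-truth SSMs.

    Assumption: BCE is one plausible choice of loss, not necessarily
    the one used in the paper.
    """
    pred = np.clip(self_similarity_matrix(features), eps, 1.0 - eps)
    target = ground_truth_ssm(labels)
    bce = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(bce.mean())


# Hypothetical toy example: five frames, two segment labels.
labels = [0, 0, 1, 1, 0]
# Features that separate the two labels perfectly.
aligned = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]])
# Features that are identical for every frame, so they carry no structure.
collapsed = np.tile([1.0, 0.0], (5, 1))

loss_aligned = ssm_loss(aligned, labels)
loss_collapsed = ssm_loss(collapsed, labels)
```

In this toy case the aligned features give a lower loss than the collapsed ones, which is the behavior an encoder trained with such a loss would be pushed toward.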