Consonance-Informed Distance Metric

Updated 4 September 2025

A consonance-informed distance metric is a similarity measure that integrates perceptual musical consonance and harmonic structure into standard distance calculations.
It applies increasing, concave metric-preserving functions to correlation measures to guarantee the triangle inequality, and its spectral variants are invariant under filtering.
The approach underpins applications in music information retrieval, spectral estimation, and chord annotation by aligning computational models with human auditory perception.

A consonance-informed distance metric quantifies similarity or dissimilarity between objects—such as audio signals, harmonic annotations, or feature vectors—by explicitly embedding perceptual musical consonance or harmonic structure into the definition of distance. It extends conventional similarity-based and distance-based measures to account for harmonic relationships, auditory perceptual mappings, and the special role of consonance in musical and acoustic analysis.

1. Theoretical Foundation: Metric-Preserving Functions and Similarity Transformations

Consonance-informed distance metrics often begin with transformations of correlation or similarity coefficients, such as cosine similarity or Pearson correlation, into valid metric distances. A crucial tool is the use of metric-preserving functions $f$ satisfying:

$f(0) = 0$
$f$ is increasing
$f$ is concave on $x>0$

Together with $f(0) = 0$, concavity ensures subadditivity ($f(a+b) \leq f(a)+f(b)$ for $a, b \geq 0$), and monotonicity carries the triangle inequality of the original distance over to the transformed one. For example, if the angular distance $\theta = \arccos(A(x, y))$ derived from a similarity measure $A$ is a metric, then so is $f(\theta)$. This framework enables two salient transformation classes:

Transformation Class	Example Formula	Treatment of Anti-correlation
Maximal Separation	$d_1(x, y) = \arccos(A(x, y))$ <br> $d_2(x, y) = \sqrt{1 - A(x, y)}$	Anti-correlated objects are maximally distant.
Consonance-Informed/Collated	$d_3(x, y) = \pi/2 - |\pi/2 - \arccos(A(x, y))|$ <br> $d_4(x, y) = \sqrt{1 - A(x, y)^2}$	Correlated and anti-correlated objects are equated; metric is “folded” at $\pi/2$.

The second class assigns equal distance to pairs that are strongly correlated or strongly anti-correlated, so the distance reflects only the magnitude of the association. Applying metric-preserving functions, including the sine, to centered data is especially effective: for centered data the cosine similarity $A$ coincides with the Pearson correlation, and the sine of the angle gives $d(x, y) = \sin(\theta) = \sqrt{1 - A(x, y)^2}$.
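The four transformations in the table can be sketched in a few lines of NumPy. This is a minimal illustration, assuming $A$ is cosine similarity on nonzero real vectors; the function names are hypothetical, and $d_3$ is implemented as the angle folded at $\pi/2$, as the table describes.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine similarity A(x, y), clipped to [-1, 1] to absorb rounding error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.clip(a, -1.0, 1.0))

def d_angular(x, y):
    """d_1: angle between x and y; anti-correlated vectors are pi apart."""
    return float(np.arccos(cosine_similarity(x, y)))

def d_sqrt(x, y):
    """d_2: sqrt(1 - A); also maximal for anti-correlated vectors."""
    return float(np.sqrt(1.0 - cosine_similarity(x, y)))

def d_folded_angle(x, y):
    """d_3: angle folded at pi/2, so x and -x are at distance 0."""
    theta = np.arccos(cosine_similarity(x, y))
    return float(np.pi / 2 - abs(np.pi / 2 - theta))

def d_sine(x, y):
    """d_4: sin(theta) = sqrt(1 - A^2); also equates x and -x."""
    a = cosine_similarity(x, y)
    return float(np.sqrt(max(0.0, 1.0 - a * a)))
```

For `x = [1, 0]` and `y = [-1, 0]`, the first class places the pair at maximal distance, while both folded distances return zero. Strictly speaking the second class therefore yields a metric on directions up to sign rather than on the vectors themselves.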

2. Conal Metrics and Explicit Geodesics in Spectral Spaces

Consonance-informed metrics are extended to the space of spectral densities by generalizing the Thompson and Hilbert metrics (Baggio et al., 2017). For spectral densities $\Phi_1, \Phi_2$, define:

$M(\Phi_1, \Phi_2) = \inf \{ \lambda\,:\,\Phi_1(e^{j\theta}) \leq \lambda \Phi_2(e^{j\theta})\ \forall\theta \}$
$m(\Phi_1, \Phi_2) = 1 / M(\Phi_2, \Phi_1)$

The Hilbert distance $d_H$ and the Thompson distance $d_T$ are then defined as:

$d_H(\Phi_1, \Phi_2) = \log \frac{M(\Phi_1, \Phi_2)}{m(\Phi_1, \Phi_2)},\quad d_T(\Phi_1, \Phi_2) = \log\max\{M(\Phi_1, \Phi_2), M(\Phi_2, \Phi_1)\}$

For rational spectral densities, $M$ can be computed from the minimum-phase spectral factors $W_1, W_2$ via $M(\Phi_1, \Phi_2) = \| W_2^{-1} W_1 \|_{H^\infty}^2$. This yields an efficient and explicit geodesic structure within a Finsler manifold framework. The geodesic path interpolating between $\Phi_1$ and $\Phi_2$ is given explicitly and remains within the space of rational densities, an essential property for implementable models.
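For scalar densities sampled on a frequency grid, $M$ reduces to the largest pointwise ratio, which gives a quick numerical approximation of both distances. The sketch below assumes strictly positive scalar densities and a grid fine enough for the maximum to approximate the supremum; it does not use the $H^\infty$-norm formula, and the first-order autoregressive test spectra and function names are illustrative choices rather than anything taken from the cited work.

```python
import numpy as np

def sup_ratio(phi1, phi2):
    """Grid approximation of M(Phi1, Phi2) = inf{lam : Phi1 <= lam * Phi2}."""
    return float(np.max(np.asarray(phi1) / np.asarray(phi2)))

def hilbert_distance(phi1, phi2):
    """d_H = log(M / m) with m(Phi1, Phi2) = 1 / M(Phi2, Phi1)."""
    return float(np.log(sup_ratio(phi1, phi2) * sup_ratio(phi2, phi1)))

def thompson_distance(phi1, phi2):
    """d_T = log max{M(Phi1, Phi2), M(Phi2, Phi1)}."""
    return float(np.log(max(sup_ratio(phi1, phi2), sup_ratio(phi2, phi1))))

def ar1_spectrum(a, theta):
    """Spectral density 1 / |1 - a e^{-j theta}|^2 of a first-order AR model."""
    return 1.0 / np.abs(1.0 - a * np.exp(-1j * theta)) ** 2

theta = np.linspace(-np.pi, np.pi, 2049)
phi_a = ar1_spectrum(0.5, theta)
phi_b = ar1_spectrum(-0.3, theta)
```

Scaling one density by a constant leaves `hilbert_distance` unchanged but shifts `thompson_distance`, which reflects that the Hilbert metric compares spectral shape only, whereas the Thompson metric also registers overall power.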

3. Filtering Invariance and Harmonic Structure

Consonance-informed metrics in spectral spaces possess a filtering invariance property. For any invertible filter $T$:

$d(\Phi_1, \Phi_2) = d(T\Phi_1T^*, T\Phi_2T^*)$

This ensures that distance is unaffected by congruence transformations, focusing the metric on the intrinsic shape or “timbre” of the spectral content. Filtering invariance is crucial for robust spectral estimation and comparison: applying the same invertible preprocessing filter to both signals leaves their distance unchanged.
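In the scalar case the invariance can be verified in one line. As a sketch, assume $T$ is a scalar filter with no zeros on the unit circle and that $\Phi_2$ is strictly positive, so that $T\Phi T^* = |T(e^{j\theta})|^2\,\Phi(e^{j\theta})$ and $M$ equals the supremum of the pointwise ratio:

$M(T\Phi_1T^*, T\Phi_2T^*) = \sup_\theta \frac{|T(e^{j\theta})|^2\,\Phi_1(e^{j\theta})}{|T(e^{j\theta})|^2\,\Phi_2(e^{j\theta})} = \sup_\theta \frac{\Phi_1(e^{j\theta})}{\Phi_2(e^{j\theta})} = M(\Phi_1, \Phi_2)$

Because $d_H$ and $d_T$ depend on the densities only through $M$, both inherit the invariance.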

4. Perceptual Mapping and Riemannian Metrics

Consonance-informed distance in perceptual domains requires mapping high-dimensional stimuli into Riemannian manifolds representing auditory perception (Oh et al., 2020). The mapping $f:M\rightarrow N$ (from stimulus space $M$ to perceptual space $N$) induces a metric tensor $g' = J^\top J$ on $M$, where $J$ is the Jacobian of $f$; this is the pullback of the Euclidean metric on $N$. The perceptual line element is:

$d\sigma_N^2 = dx^\top (J^\top J) dx$

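The Riemannian distance between stimuli $r$ and $s$ is the infimum of path length over all smooth paths $x(t)$ with $x(t_1) = r$ and $x(t_2) = s$:

$d_f(r, s) = \inf_{x(\cdot)} \int_{t_1}^{t_2} \sqrt{ \left(\frac{dx}{dt}\right)^\top g'(x(t))\, \frac{dx}{dt} }\; dt$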

Reported results indicate that such metrics correlate more strongly with subjective ratings ($r \approx 0.8$) than plain vector-space (Euclidean) approaches, outperforming the L2 distance and matching or surpassing established methods such as PEAQ, PEMO-Q, and ViSQOLAudio. Biologically inspired and hybrid mappings (e.g., those emulating the basilar membrane) play a central role in perceptual consonance modeling.
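The path-length integral that defines $d_f$ can be evaluated numerically for any fixed path. The sketch below measures the straight segment from $r$ to $s$ under $g' = J^\top J$, using a finite-difference Jacobian; since the true distance is an infimum over all paths, the result is only an upper bound on $d_f(r, s)$. The logarithmic mapping is a hypothetical stand-in for a perceptual model, not the mapping used in the cited work, and all function names are illustrative.

```python
import numpy as np

def jacobian_fd(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f at x, shape (dim_out, dim_in)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * eps))
    return np.stack(cols, axis=1)

def straight_path_length(f, r, s, n_steps=200):
    """Length of the segment x(t) = r + t (s - r), t in [0, 1], under g' = J^T J.

    This is an upper bound on the Riemannian distance d_f(r, s).
    """
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=float)
    velocity = s - r                              # dx/dt along the segment
    ts = (np.arange(n_steps) + 0.5) / n_steps     # midpoint quadrature nodes
    total = 0.0
    for t in ts:
        J = jacobian_fd(f, r + t * velocity)
        g = J.T @ J                               # pullback metric tensor at x(t)
        total += np.sqrt(velocity @ g @ velocity) / n_steps
    return float(total)

def toy_perceptual_map(x):
    """Hypothetical compressive mapping (log-like response), for x >= 0."""
    return np.log1p(np.asarray(x, dtype=float))
```

For a linear map $f(x) = Ax$ the straight segment is the geodesic, and the computed length reduces to the Euclidean norm of $A(s - r)$. With the compressive toy map, equal stimulus steps count for less at high levels than at low ones.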

5. Consonance-Informed Metrics in Music Information Research

Recent developments (Poltronieri et al., 1 Sep 2025) demonstrate the utility of consonance-informed metrics in music annotation and chord estimation. Standard mechanical distance assigns equal penalty to all semitone errors, but the consonance-informed (Mechanical-Consonance) metric weights semitone deviations according to perceptual studies:

$vt = [0, 7, 5, 1, 1, 2, 3, 1, 2, 2, 4, 6]$

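Reading the vector as penalties indexed by interval size in semitones (0 to 11), highly consonant intervals such as thirds and the perfect fifth receive the lowest nonzero weight (1), seconds and sevenths are penalized most heavily (weights 4 to 7), and the tritone falls in between (weight 3). This weighting produces metrics that correlate more closely with musically meaningful annotation agreements and distinctions.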
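The difference between the two penalties can be illustrated with a small lookup. This is a sketch under stated assumptions: the vector is indexed by the interval from the reference to the estimated pitch class, taken modulo 12, and a sequence is scored by averaging per-item penalties. Neither the interval direction nor the aggregation rule is specified in the text above, and the function names are hypothetical.

```python
# Consonance vector quoted in the text; the index is assumed to be the
# interval in semitones (0-11) from the reference to the estimated pitch class.
CONSONANCE_WEIGHTS = [0, 7, 5, 1, 1, 2, 3, 1, 2, 2, 4, 6]

def mechanical_penalty(ref_pc, est_pc):
    """Flat penalty: every wrong pitch class costs the same."""
    return 0 if (est_pc - ref_pc) % 12 == 0 else 1

def consonance_penalty(ref_pc, est_pc):
    """Penalty looked up from the consonance vector by interval size."""
    return CONSONANCE_WEIGHTS[(est_pc - ref_pc) % 12]

def mean_penalty(ref_seq, est_seq, penalty):
    """Average per-item penalty over two aligned sequences (an assumed aggregation)."""
    if len(ref_seq) != len(est_seq):
        raise ValueError("sequences must have the same length")
    return sum(penalty(r, e) for r, e in zip(ref_seq, est_seq)) / len(ref_seq)

reference = [0, 0, 0]          # three frames with root C
estimate_fifth = [0, 0, 7]     # one frame off by a perfect fifth
estimate_semitone = [0, 0, 1]  # one frame off by a minor second
```

Under the flat penalty both estimates score identically, whereas the consonance lookup charges the semitone error seven times as much as the fifth error.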

Within conformer-based models for Audio Chord Estimation (ACE), consonance-based label smoothing further leverages the consonance vector. Mapping for pitch class $t$ yields a smoothed distribution $q$ using similarity scores derived from the consonance weights, rewarding predictions with harmonic proximity. Decomposition of chord labels into root, bass, and note activations, combined with hierarchical reconstruction, addresses class imbalance and facilitates musically plausible learning.
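One way to turn the consonance weights into a smoothed target is sketched below. The similarity score (largest weight plus one, minus the interval's weight) and the smoothing mass `epsilon` are assumptions made for illustration; the text above does not give the exact mapping, so this should not be read as the published formulation.

```python
import numpy as np

# Consonance vector quoted in the text, assumed indexed by interval in semitones.
CONSONANCE_WEIGHTS = np.array([0, 7, 5, 1, 1, 2, 3, 1, 2, 2, 4, 6], dtype=float)

def consonance_smoothed_target(target_pc, epsilon=0.1):
    """Smoothed distribution q over 12 pitch classes for the true class target_pc.

    The true class keeps probability 1 - epsilon; the remaining epsilon is
    spread over the other classes in proportion to a hypothetical similarity
    score that grows as the consonance weight shrinks.
    """
    intervals = (np.arange(12) - target_pc) % 12
    weights = CONSONANCE_WEIGHTS[intervals]
    similarity = weights.max() + 1.0 - weights   # assumed score, always positive
    similarity[target_pc] = 0.0                  # the target gets no smoothing mass
    q = epsilon * similarity / similarity.sum()
    q[target_pc] = 1.0 - epsilon
    return q

q_c = consonance_smoothed_target(0)  # smoothed target for pitch class C
```

With this choice a prediction a perfect fifth away from the target receives more probability mass than one a semitone away, so the cross-entropy loss penalizes harmonically close mistakes less.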

6. Practical Implications and Applications

Consonance-informed metrics support clustering, nearest-neighbor search, indexing structures, and computational pruning in high-dimensional data analysis (Dongen et al., 2012). They are essential in music information retrieval, audio engineering (codec optimization, restoration), spectral estimation, and perceptual modeling. By weighting errors and similarities in accordance with musical and perceptual principles, these metrics enable refined evaluation, robust learning, and musically aware system design.

The role of explicit harmonic, geometric, and perceptual transformation—in conjunction with metric-preserving functions and filtering invariance—underpins a spectrum of modern applications: speech morphing, perceptual audio quality assessment, rational spectral interpolation, and advanced music information retrieval. Enhanced sensitivity to consonance allows systems to distinguish “less wrong” predictions (e.g., near-miss chord labels) and support annotation agreement evaluation on a perceptual, rather than strictly symbolic, basis.

7. Comparative Perspective and Ongoing Directions

Consonance-informed metrics depart from naïve measures such as $d(x, y) = 1 - A(x, y)$, which need not satisfy the triangle inequality. Transformations via increasing and concave functions ensure metric properties, and distance folding mechanisms (as in the absolute correlation distance) allow unified treatment of correlated and anti-correlated vectors. The extension to filtering-invariant, spectrally aware, and perceptually mapped domains demonstrates the adaptability of the approach.
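A three-vector example makes the failure of the naive measure concrete. The sketch assumes $A$ is cosine similarity and uses unit-direction vectors at 0, 45, and 90 degrees; the function names are illustrative.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine similarity A(x, y), clipped to [-1, 1]."""
    a = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.clip(a, -1.0, 1.0))

def naive_distance(x, y):
    """d(x, y) = 1 - A(x, y); not a metric in general."""
    return 1.0 - cosine_similarity(x, y)

def angular_distance(x, y):
    """d(x, y) = arccos(A(x, y)); satisfies the triangle inequality."""
    return float(np.arccos(cosine_similarity(x, y)))

x = np.array([1.0, 0.0])   # direction at 0 degrees
y = np.array([1.0, 1.0])   # direction at 45 degrees
z = np.array([0.0, 1.0])   # direction at 90 degrees

# Direct route versus the detour through y, for each distance.
naive_direct = naive_distance(x, z)
naive_detour = naive_distance(x, y) + naive_distance(y, z)
angular_direct = angular_distance(x, z)
angular_detour = angular_distance(x, y) + angular_distance(y, z)
```

Here `naive_direct` equals 1 while `naive_detour` is about 0.586, so the detour through `y` is shorter than the direct route and the triangle inequality fails. For the angular distance the two quantities coincide at $\pi/2$.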

Future avenues involve refining empirical consonance vectors, advancing perceptual mapping techniques to capture context-dependent auditory effects, and integrating these metrics into broader machine learning and music cognition frameworks. The capacity to induce meaningful, musically and perceptually significant structure on data spaces is central to ongoing research in this domain.