Tonal Interval Vector Analysis

Updated 1 December 2025

Tonal Interval Vector Analysis is a method that represents musical tonal content as six complex DFT coefficients derived from normalized, weighted chroma vectors.
It computes perceptually relevant descriptors such as chromaticity, diatonicity, dissonance, and harmonic change to support tasks such as key estimation and chord tracking.
The approach supports both instantaneous and global extraction, for real-time audio analysis and symbolic music generation, at low computational cost.

The Tonal Interval Vector (TIV) is a vectorial representation of the tonal content of musical material, with applications spanning content-based analysis, music information retrieval (MIR), and computational music generation. The TIV formalism encodes the intervallic makeup and tonic orientation of chords, scales, or chroma vectors using six coefficients ($k = 1, \dots, 6$) of the complex Discrete Fourier Transform (DFT) of a normalized chroma profile. This representation enables the quantitative extraction of numerous musically relevant descriptors, such as harmonic change, diatonicity, dissonance, and musical key, as well as the computation of scalar tonal tension metrics tailored for both audio signal and symbolic sequence analysis (Ramires et al., 2020, Ebrahimzadeh et al., 24 Nov 2025).

1. Formalism of the Tonal Interval Vector

Given a chroma or pitch-class vector describing either a short-time audio frame or a symbolic chord (e.g., $c(n)$, $n = 0, \dots, 11$, for the 12 pitch classes), the TIV is computed through the following steps:

Normalize the chroma vector: $\bar c(n) = c(n) / \sum_{m=0}^{11} c(m)$.
Apply a DFT weighted by empirically derived dyad-consonance weights $w_a(k)$, typically $w_a = [3, 8, 11.5, 15, 14.5, 7.5]$ for $k = 1, \dots, 6$, to yield

$T(k) = w_a(k) \cdot \sum_{n=0}^{11} \bar c(n) e^{-j 2\pi k n / 12}, \quad k = 1, \dots, 6.$

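The resulting vector $T = (T(1), \dots, T(6)) \in \mathbb{C}^6$ is sufficient: because the chroma input is real, its DFT is Hermitian-symmetric, so the coefficients for $k = 7, \dots, 11$ are the complex conjugates of those for $k = 5, \dots, 1$ and carry no additional information.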

The modulus $|T(k)|$ quantifies how strongly the chroma profile repeats with a period of $12/k$ semitones, characterizing the prevalence of the corresponding interval structures (triads, diatonic sets, etc.), while the argument $\angle T(k)$ encodes the global rotation or transposition of this structure (i.e., its actual pitch anchor or tonic). For symbolic sequences, the corresponding construction (without weighting) uses $c \in \mathbb{R}^{12}$, with each $c_n$ reflecting the presence of pitch class $n$ (Ramires et al., 2020, Ebrahimzadeh et al., 24 Nov 2025).
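The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the TIV-lib implementation: the function name `tiv_from_chroma` and the constant `W_A` are hypothetical, and the weights are simply the audio values quoted in the text.

```python
import numpy as np

# Audio weights w_a(k) for k = 1..6, as quoted in the text.
W_A = np.array([3.0, 8.0, 11.5, 15.0, 14.5, 7.5])

def tiv_from_chroma(chroma, weights=W_A):
    """Return the six complex coefficients T(1..6) of a 12-bin chroma vector."""
    c = np.asarray(chroma, dtype=float)
    total = c.sum()
    if total <= 0:
        raise ValueError("chroma vector must have positive total energy")
    c_bar = c / total               # step 1: normalize so the bins sum to 1
    spectrum = np.fft.fft(c_bar)    # 12-point DFT with the e^{-j 2 pi k n / 12} kernel
    return weights * spectrum[1:7]  # step 2: keep k = 1..6 and apply w_a(k)

# C major triad: pitch classes C (0), E (4), G (7)
c_major = np.zeros(12)
c_major[[0, 4, 7]] = 1.0
T = tiv_from_chroma(c_major)

magnitudes = np.abs(T)   # interval-structure strengths |T(k)|
phases = np.angle(T)     # transposition-dependent arguments
```

Only bins 1 to 6 of the 12-point spectrum are retained; bin 0 is always 1 after normalization and bins 7 to 11 mirror bins 5 to 1.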

2. TIV Extraction: Instantaneous and Global Representations

Extraction of TIVs is performed in two primary ways:

Instantaneous TIVs: For audio, compute $c_m(n)$ (the chroma vector for frame $m$), normalize it, and apply the formula above, typically over short-time windows (e.g., 46 ms with 50% overlap).
Global TIVs: Two approaches are standard:
1. Averaged chroma: Compute the global chroma $\bar c_{global}(n) = \text{mean}_m\, c_m(n)$, normalize it, and apply the weighted DFT.
2. Energy-weighted TIVs: Compute framewise TIVs $T_m$, then average them with weights $a_m$ (e.g., framewise energy): $T_{global}(k) = \frac{\sum_m a_m T_m(k)}{\sum_m a_m}$. The DC bin of the unnormalized chroma, $\sum_n c_m(n)$, which is discarded in the main analysis, can provide $a_m$.

For symbolic music, the same method applies at the level of individual chords, bars, or other musical units (Ramires et al., 2020, Ebrahimzadeh et al., 24 Nov 2025).
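A short NumPy sketch of both global routes follows. The function names are hypothetical, the chromagram is synthetic random data standing in for real chroma frames, and every frame is assumed to have positive energy.

```python
import numpy as np

W_A = np.array([3.0, 8.0, 11.5, 15.0, 14.5, 7.5])  # audio weights from the text

def framewise_tivs(chromagram, weights=W_A):
    """Instantaneous TIVs: (n_frames, 12) chroma -> (n_frames, 6) complex array."""
    c = np.asarray(chromagram, dtype=float)
    c_bar = c / c.sum(axis=1, keepdims=True)          # normalize each frame
    return weights * np.fft.fft(c_bar, axis=1)[:, 1:7]

def global_tiv_from_mean_chroma(chromagram, weights=W_A):
    """Route 1: average the chroma over frames, then normalize and transform."""
    mean_chroma = np.asarray(chromagram, dtype=float).mean(axis=0)
    return weights * np.fft.fft(mean_chroma / mean_chroma.sum())[1:7]

def global_tiv_weighted(chromagram, frame_weights, weights=W_A):
    """Route 2: weighted average of the framewise TIVs with weights a_m."""
    T = framewise_tivs(chromagram, weights)
    a = np.asarray(frame_weights, dtype=float)
    return (a[:, None] * T).sum(axis=0) / a.sum()

rng = np.random.default_rng(0)
chromagram = rng.random((8, 12)) + 1e-3   # synthetic stand-in for real chroma frames
energy = chromagram.sum(axis=1)           # DC bin of each unnormalized frame

T_inst = framewise_tivs(chromagram)
T_avg = global_tiv_from_mean_chroma(chromagram)
T_wgt = global_tiv_weighted(chromagram, energy)
```

When $a_m$ is taken as the chroma sum of each frame, the two routes give the same result by linearity of the DFT; they diverge once $a_m$ is some other energy measure.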

3. TIV-Derived Descriptors and Their Musical Interpretation

The TIV representation supports a variety of musically and perceptually meaningful descriptors:

Interval Magnitudes and Phases: $|T(k)|$ (TIV.mag) and $\angle T(k)$ (TIV.phases) for $k = 1, \dots, 6$; the magnitudes are invariant to transposition and inversion, whereas the phases shift with transposition. The six magnitudes correspond to:
- $k=1$ : chromaticity
- $k=2$ : dyadicity
- $k=3$ : triadicity
- $k=4$ : diminished quality
- $k=5$ : diatonicity
- $k=6$ : whole-toneness
Scalar Indices: For chromaticity, diatonicity, and whole-toneness:

$\text{chromaticity} = \frac{|T(1)|}{w_a(1)},\quad \text{diatonicity} = \frac{|T(5)|}{w_a(5)},\quad \text{whole-toneness} = \frac{|T(6)|}{w_a(6)}.$

These measure the “fit” of the pitch-class distribution to these specific intervallic templates.
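The scalar indices, and the different behaviour of magnitudes and phases under transposition, can be checked numerically. The sketch below uses hypothetical helper names (`tiv`, `scalar_indices`) and binary pitch-class sets as input.

```python
import numpy as np

W_A = np.array([3.0, 8.0, 11.5, 15.0, 14.5, 7.5])  # audio weights from the text

def tiv(chroma, weights=W_A):
    c = np.asarray(chroma, dtype=float)
    return weights * np.fft.fft(c / c.sum())[1:7]

def scalar_indices(T, weights=W_A):
    """Weight-normalized magnitudes for k = 1, 5 and 6; each lies in [0, 1]."""
    mags = np.abs(T) / weights
    return {
        "chromaticity": mags[0],
        "diatonicity": mags[4],
        "whole_toneness": mags[5],
    }

c_major_scale = np.zeros(12)
c_major_scale[[0, 2, 4, 5, 7, 9, 11]] = 1.0
d_major_scale = np.roll(c_major_scale, 2)   # same scale, two semitones higher
whole_tone = np.zeros(12)
whole_tone[[0, 2, 4, 6, 8, 10]] = 1.0

T_c = tiv(c_major_scale)
T_d = tiv(d_major_scale)
idx_c = scalar_indices(T_c)
idx_wt = scalar_indices(tiv(whole_tone))

# Transposing by p semitones multiplies T(k) by exp(-j 2 pi k p / 12):
# the magnitudes are unchanged and only the phases rotate.
rotation = np.exp(-2j * np.pi * np.arange(1, 7) * 2 / 12)
same_magnitudes = np.allclose(np.abs(T_c), np.abs(T_d))
phases_rotate = np.allclose(T_d, T_c * rotation)
```

For the whole-tone set the whole-toneness index reaches its maximum of 1, while the major scale scores higher on diatonicity than on chromaticity.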

Dissonance: Defined as $1 - |T(k)| / w_a(k)$ per interval class. Because the $w_a$ weights were fitted empirically to dyad-consonance ratings, a lower $|T(k)|$ corresponds to greater perceptual dissonance.
Harmonic Change: Measured around frame $m$ as the distance between its two neighbouring frames, $\lambda_m = \|T_{m+1} - T_{m-1}\|_2$. Large values signal harmonic or key-change boundaries.
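Key Estimation: The TIV of the input is compared against reference TIVs of the candidate major and minor keys, and the nearest key is selected:

$R_{min} = \arg\min_r \| \alpha T - T_r^{p} \|_2$

Here $T_r^{p}$ is the reference TIV of candidate key $r$ under key profile $p$, and the mode-bias parameter $\alpha$ tunes the sensitivity to the mode profiles.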
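The nearest-reference search can be sketched as follows. This is an illustration under strong simplifying assumptions, not the library's procedure: the reference profile is a binary major-scale template rather than an empirical key profile, only the 12 major keys are considered, and all names are hypothetical.

```python
import numpy as np

W_A = np.array([3.0, 8.0, 11.5, 15.0, 14.5, 7.5])  # audio weights from the text
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def tiv(chroma, weights=W_A):
    c = np.asarray(chroma, dtype=float)
    return weights * np.fft.fft(c / c.sum())[1:7]

# Illustrative reference profile (assumption): a binary major-scale template.
MAJOR_TEMPLATE = np.zeros(12)
MAJOR_TEMPLATE[[0, 2, 4, 5, 7, 9, 11]] = 1.0

# One reference TIV per major key, obtained by rotating the template.
KEY_TIVS = np.array([tiv(np.roll(MAJOR_TEMPLATE, r)) for r in range(12)])

def estimate_major_key(chroma, alpha=1.0):
    """Return the key index r that minimizes ||alpha * T - T_r||_2."""
    T = tiv(chroma)
    distances = np.linalg.norm(alpha * T - KEY_TIVS, axis=1)
    return int(np.argmin(distances))

# G major scale over a flat noise floor of 0.05 in every bin.
noisy_g_major = np.roll(MAJOR_TEMPLATE, 7) + 0.05
best = estimate_major_key(noisy_g_major)
key_name = PITCH_NAMES[best]
```

A flat noise floor contributes only to the discarded DC bin, so it shrinks the TIV without rotating it and the nearest reference remains G major. Scaling by $\alpha$ has a comparable shrinking or stretching effect on the input before the distances are taken.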

Tonal Tension (Symbolic Sequences): In symbolic generation, scalar tension combines:
- Chord-to-chord, chord-to-key, and chord-to-function distances (L2 and angular in $\mathbb{C}^6$ ),
- Dissonance: based on TIV norm relative to corpus maximum,
- Voice-leading tension: summed exponential function of intervallic and perceptual distances between voice pairs,
These components are combined in a weighted sum:

$\text{Tension}(i) = D_{tonal}(i) + 30.3 \cdot D_{diss}(i) + 2.71 \cdot D_{vl}(i)$

(Ebrahimzadeh et al., 24 Nov 2025).
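As a small worked example, the weighted sum can be evaluated over a chord sequence once the three component distances are available. The sketch takes the components as given inputs; how they are computed is not shown here, the function name is hypothetical, and the numeric values are made up for illustration.

```python
import numpy as np

# Coefficients of the weighted sum quoted in the text.
DISS_WEIGHT = 30.3
VL_WEIGHT = 2.71

def tension_curve(d_tonal, d_diss, d_vl):
    """Tension(i) = D_tonal(i) + 30.3 * D_diss(i) + 2.71 * D_vl(i), elementwise."""
    d_tonal = np.asarray(d_tonal, dtype=float)
    d_diss = np.asarray(d_diss, dtype=float)
    d_vl = np.asarray(d_vl, dtype=float)
    return d_tonal + DISS_WEIGHT * d_diss + VL_WEIGHT * d_vl

# Made-up component values for a three-chord sequence.
tension = tension_curve(
    d_tonal=[1.0, 2.0, 0.5],
    d_diss=[0.1, 0.2, 0.0],
    d_vl=[0.0, 1.0, 2.0],
)
# -> [4.03, 10.77, 5.92]
```

The large coefficient on the dissonance term suggests that this component takes small raw values and is scaled up to be comparable with the other two.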

4. Computational Workflow and Library Support

TIV-lib supplies both offline and real-time workflows, implemented as:

Python package (numpy/scipy-based): Batch processing via TIVlib.TIV.from_pcp.
Pure Data external: Enables direct use in graphical audio environments for real-time applications.

Example Python usage:

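import numpy as np
import TIVlib as tiv

# 12-bin chroma (pitch-class profile) of a C major triad: C, E, G
example_chroma = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float)
t = tiv.TIV.from_pcp(example_chroma)
chrom = t.chromaticity()           # scalar chromaticity index
diat = t.diatonicity()             # scalar diatonicity index
whole = t.wholetoneness()          # scalar whole-toneness index
diss_vec = t.diss()                # dissonance
# Neighbouring frames (assumed here to be passed as TIV objects)
prev_tiv = tiv.TIV.from_pcp(np.roll(example_chroma, 5))
next_tiv = tiv.TIV.from_pcp(np.roll(example_chroma, 7))
hchg = t.hchange(prev_tiv, next_tiv)   # harmonic change around this frame
key = t.key(profile='Temperley')   # key estimate with the Temperley profile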

(Ramires et al., 2020)

All descriptors, including scalar and vector quantities, are member functions of a TIV object. For symbolic workflows, precomputation of TIVs for standard chords is customary (Ebrahimzadeh et al., 24 Nov 2025).

5. Core Applications in MIR and Music Generation

TIV analysis underpins various computational tasks:

Key detection: Direct nearest-neighbour classification using TIVs mapped to major/minor reference profiles.
Chord and harmonic change tracking: Detection via $\lambda_m$ .
Dissonance and diatonicity tracing for timbral or genre analysis: Exploiting the time-varying TIV magnitudes.
Harmonic mixing and cover-song detection: By cosine or Euclidean distances in the TIV space.
Query-by-humming: Exploiting the transposition invariance of the TIV magnitudes.
Explicit Tension Conditioning in Symbolic Generation: Integration into a Transformer-based two-level beam search, where bar-level candidates are re-ranked according to the fit to a target tension curve, calculated from TIV-based metrics (Ebrahimzadeh et al., 24 Nov 2025).

Because the TIV is only six complex numbers per frame, mixture and long-term descriptors can be computed with minimal computational burden ($\mathcal{O}(1)$ per frame).
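To illustrate harmonic change tracking via $\lambda_m$, the sketch below computes the curve for a synthetic eight-frame chromagram that switches from C major to F# major halfway through. The function names, the zero-padding at the edges, and the threshold of half the maximum are illustrative assumptions.

```python
import numpy as np

W_A = np.array([3.0, 8.0, 11.5, 15.0, 14.5, 7.5])  # audio weights from the text

def framewise_tivs(chromagram, weights=W_A):
    c = np.asarray(chromagram, dtype=float)
    c_bar = c / c.sum(axis=1, keepdims=True)
    return weights * np.fft.fft(c_bar, axis=1)[:, 1:7]

def harmonic_change(T):
    """lambda_m = ||T_{m+1} - T_{m-1}||_2 for interior frames; edge frames are set to 0."""
    lam = np.zeros(len(T))
    lam[1:-1] = np.linalg.norm(T[2:] - T[:-2], axis=1)
    return lam

# Four frames of a C major triad followed by four frames of an F# major triad.
c_major = np.zeros(12)
c_major[[0, 4, 7]] = 1.0
f_sharp_major = np.roll(c_major, 6)
chromagram = np.vstack([np.tile(c_major, (4, 1)), np.tile(f_sharp_major, (4, 1))])

lam = harmonic_change(framewise_tivs(chromagram))
boundary_frames = np.flatnonzero(lam > 0.5 * lam.max())
# -> frames 3 and 4, the two frames adjacent to the chord change
```

Each value of $\lambda_m$ involves one difference and one norm over six complex numbers, so the cost per frame is constant.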

6. Evaluation and Performance Characteristics

Studies cited in Ramires et al. (2020) report:

Key Estimation: TIV-based methods with adaptive mode bias outperform Krumhansl-Schmuckler baselines on standard MIR corpora.
Harmonic Mixing: TIV-based systems yield higher perceptual mix ratings over roughness- and pitch-commonality methods.
Efficiency: All TIV descriptors require only six coefficients of a 12-point DFT and a few vector operations per frame, supporting real-time implementation.

For symbolic generation, explicit TIV-based tension conditioning enables fine control of tonal tension contours during Transformer inference, generating outputs aligned with both model probability and user-specified tension curves (Ebrahimzadeh et al., 24 Nov 2025).

7. Significance and Prospective Directions

TIV analysis supplies an algebraically compact, perceptually grounded, and computationally tractable basis for tonal content representation, suitable for real-world MIR and music generation systems. It integrates into diverse workflows, including real-time audio analysis and the explicit modulation of tonal tension in generative models, and so supports applications from automatic key and chord recognition to interactive creative AI frameworks (Ramires et al., 2020, Ebrahimzadeh et al., 24 Nov 2025). A plausible implication is further adoption of TIV-based descriptors in hybrid audio-symbolic MIR systems and cross-modal generative architectures where interpretability and control of tonal structure are required.