Harmony Template Frameworks

Updated 2 July 2026

Harmony templates are structured frameworks that encode harmonic relationships using explicit data structures and graph-based representations to support tasks like melodic harmonization and orchestration.
They integrate mathematical optimization techniques, such as minimum-cost path searches and dissonance-averse sampling, ensuring smooth voice-leading and controlled transitions between chords.
Recent implementations combine hierarchical conditioning and deep learning modules, enabling adaptive, unsupervised harmonic analysis and extensible applications across various musical styles.

Harmony Template

Harmony templates provide structured, modular frameworks for analyzing, generating, or constraining harmonic material in a variety of musical and even non-musical contexts. While the term covers a spectrum of applications—including melodic harmonization, orchestral score control, unsupervised harmonic analysis, deep learning representations, and domain-general combinatorial classification—canonical templates share elements of mathematical formality, explicit encoding of harmonic relationships, and support for algorithmic tuning or extension. This article surveys key harmony template architectures for symbolic music modeling, orchestration, and harmonic analysis, referencing formal definitions, structural modules, optimization objectives, and adaptation guidelines.

1. Data Structures and Harmonic Graphs

Harmony templates frequently rely on explicit data structures that encode sets of possible chords (regions, chord types) and their relationships. In "A System for Melodic Harmonization using Schoenberg Regions, Giant Steps, and Church Modes," the core structure is an undirected, weighted graph $G=(V,E)$, with nodes $v\in V$ representing the 24 possible chord-regions (e.g. I, ii, V across keys) and edges encoding inter-chordal relationships classified as Type-a (direct/close), Type-b (indirect/close), Type-c (indirect/remote), or Type-d (distant). Edge weights $w_{ij}$ are inversely proportional to the smoothness of the transition $v_i \to v_j$, controlling both voice-leading and region-centric traversal (Fernandes, 5 Jan 2025).
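A minimal sketch of such a region graph follows. The numeric weight per relationship type, the region labels, and the `RegionGraph` class itself are illustrative assumptions, not values or names taken from the paper; the only property carried over is that closer relationships receive lower cost.

```python
from collections import defaultdict

# Hypothetical costs per relationship type: closer relationships are cheaper.
TYPE_WEIGHT = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}


class RegionGraph:
    """Undirected weighted graph over chord regions (illustrative only)."""

    def __init__(self):
        self.adj = defaultdict(dict)

    def add_edge(self, u, v, rel_type):
        w = TYPE_WEIGHT[rel_type]
        self.adj[u][v] = w
        self.adj[v][u] = w  # undirected: store the edge in both directions

    def weight(self, u, v, default=float("inf")):
        """Return w_ij, 0 for staying in place, or `default` if no edge exists."""
        if u == v:
            return 0.0
        return self.adj[u].get(v, default)


g = RegionGraph()
g.add_edge("C:I", "G:I", "a")   # illustrative labels and type assignments
g.add_edge("C:I", "a:i", "a")
g.add_edge("C:I", "Eb:I", "c")
```

Storing the adjacency as a dictionary of dictionaries keeps weight lookups constant-time, which matters when the graph is queried inside a path search.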

In hierarchical orchestration, SymphonyGen utilizes a quantized "harmony skeleton" indexed over the music’s bar–beat grid, storing a binary vector $h_b \in \{0,1\}^{128}$ at each beat, which encodes eligibility of each MIDI pitch as a harmonic tone (chord or extension) for that window (He et al., 28 Apr 2026). The agreement between the skeleton and resulting texture is ensured during generation through a cross-attention and logit-adjustment framework.
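The per-beat skeleton vector can be pictured with the sketch below. It assumes that eligibility is octave-invariant (every MIDI pitch whose pitch class belongs to the chord or its extensions is marked), which the source does not state; `skeleton_vector` is a hypothetical helper name.

```python
import numpy as np


def skeleton_vector(harmonic_pitch_classes):
    """Return h_b in {0,1}^128 with 1 at every MIDI pitch whose class is harmonic."""
    h = np.zeros(128, dtype=np.uint8)
    for pitch in range(128):
        if pitch % 12 in harmonic_pitch_classes:
            h[pitch] = 1
    return h


# C major triad (C, E, G) plus D as an extension: pitch classes 0, 4, 7, 2
h_b = skeleton_vector({0, 4, 7, 2})
```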

Unsupervised harmonic analysis, as in neural HSMMs with chord-quality templates, decouples chord identity into discrete root and quality classes. Each pair $(r,q)$ has an associated emission template $\mathbf{v}_{pc|q,r}$ that governs probabilistic pitch-class activations via Bernoulli parameters, and transitions are regulated by learned Markov processes over key, root, and duration (Uehara, 2024).
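The idea of a root-shiftable quality template can be sketched as follows. The template probabilities are made-up illustrative values rather than learned parameters, and the function names are hypothetical; the sketch only shows how one template per quality, rotated to the root, yields a Bernoulli emission score for an observed pitch-class vector.

```python
import numpy as np

# Hypothetical Bernoulli templates defined for root C (pitch class 0).
QUALITY_TEMPLATES = {
    "maj": np.array([.9, .05, .1, .05, .8, .1, .05, .8, .05, .1, .05, .1]),
    "min": np.array([.9, .05, .1, .8, .05, .1, .05, .8, .05, .1, .05, .1]),
}


def emission_template(quality, root):
    # np.roll moves the root slot (index 0) to index `root`.
    return np.roll(QUALITY_TEMPLATES[quality], root)


def log_likelihood(observed, quality, root):
    """Bernoulli log-likelihood of a binary 12-dim pitch-class vector."""
    p = emission_template(quality, root)
    return float(np.sum(observed * np.log(p) + (1 - observed) * np.log(1 - p)))


def best_chord(observed):
    candidates = [(q, r) for q in QUALITY_TEMPLATES for r in range(12)]
    return max(candidates, key=lambda qr: log_likelihood(observed, *qr))


g_major = np.zeros(12)
g_major[[7, 11, 2]] = 1  # pitch classes G, B, D
```

Calling `best_chord(g_major)` selects the major quality on root 7 (G), because only that rotation places all three high-probability slots on the observed pitch classes.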

2. Mathematical Optimization and Cost Functions

Harmony templates embed harmonic decision-making within explicit optimization problems. The graph-based template formalizes chord selection as a minimum-cost path search; the cost of each step combines a voice-leading/dissonance term $D(c_{t-1},c_t)$ and a relationship (region-smoothness) term $\lambda \cdot w_{ij}$, with $\lambda\in \mathbb{R}_+$ controlling the relative preference for smooth regional movement versus local dissonance minimization (Fernandes, 5 Jan 2025). Viterbi or beam search over the region-graph with these locally computed costs yields globally optimal or near-optimal progressions; optional stochastic inserts (e.g., secondary dominants or three-chord cadential chains such as ii–V–I, each inserted with its own probability) extend the candidate set for jazz or non-classical idioms.
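The minimum-cost path search can be sketched as a small Viterbi-style dynamic program. The step cost follows the form described above, $D(c_{t-1},c_t) + \lambda \cdot w_{ij}$, but the toy `dissonance` and `region_weight` functions and the chord dictionary are assumptions chosen only to make the example runnable.

```python
def min_cost_path(candidates, dissonance, region_weight, lam=1.0):
    """Viterbi-style search. `candidates` holds one list of chords per step.

    Returns (total_cost, path) for the cheapest chord sequence.
    """
    # best[chord] = (cost of the cheapest path ending in chord, that path)
    best = {c: (0.0, [c]) for c in candidates[0]}
    for step in candidates[1:]:
        new_best = {}
        for chord in step:
            cost, path = min(
                (
                    (prev_cost + dissonance(prev, chord)
                     + lam * region_weight(prev, chord), prev_path)
                    for prev, (prev_cost, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[chord] = (cost, path + [chord])
        best = new_best
    return min(best.values(), key=lambda item: item[0])


# Toy cost terms (assumed): D counts the pitch classes that change.
CHORDS = {"C": {0, 4, 7}, "G": {7, 11, 2}, "Am": {9, 0, 4}, "F": {5, 9, 0}}


def dissonance(a, b):
    return len(CHORDS[a] ^ CHORDS[b]) / 2


def region_weight(a, b):
    return 0.0 if a == b else 1.0


cost, path = min_cost_path(
    [["C"], ["G", "Am"], ["F"]], dissonance, region_weight, lam=0.5
)
# path == ["C", "Am", "F"], cost == 3.0
```

Here the route through Am wins because it shares two pitch classes with both neighbors, so each step costs 1 + 0.5, whereas the route through G costs 2.5 and then 3.5.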

SymphonyGen applies dissonance-averse sampling by precomputing a Plomp–Levelt sensory distance matrix; at each step, the incremental dissonance of each candidate pitch is computed relative to the current skeleton and the predicted non-harmonic notes, and the corresponding score is subtracted from the model logit before the softmax (He et al., 28 Apr 2026). This results in probabilistically constrained sampling that nevertheless allows expressive non-chord tones.
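A rough sketch of this logit adjustment is given below. It assumes pure tones and uses Sethares' parametrization of the Plomp–Levelt curve; the scaling factor `alpha`, the function names, and the choice to sum dissonance over the context pitches are assumptions, not details confirmed by the source.

```python
import numpy as np


def midi_to_hz(p):
    return 440.0 * 2.0 ** ((p - 69) / 12.0)


def pl_dissonance(f1, f2):
    """Sethares' parametrization of the Plomp-Levelt curve for two pure tones."""
    f_lo, f_hi = min(f1, f2), max(f1, f2)
    s = 0.24 / (0.021 * f_lo + 19.0)
    x = s * (f_hi - f_lo)
    return np.exp(-3.5 * x) - np.exp(-5.75 * x)


# Precomputed 128 x 128 pairwise dissonance matrix over MIDI pitches.
HZ = [midi_to_hz(p) for p in range(128)]
DISS = np.array([[pl_dissonance(a, b) for b in HZ] for a in HZ])


def adjust_logits(logits, context_pitches, alpha=5.0):
    """Subtract alpha times the summed dissonance against the context pitches."""
    penalty = DISS[:, context_pitches].sum(axis=1)
    return logits - alpha * penalty


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


# Uniform logits, context = C4, E4, G4: C#4 (61) is penalized more than C5 (72).
probs = softmax(adjust_logits(np.zeros(128), [60, 64, 67]))
```

Because the penalty is subtracted rather than used as a hard mask, dissonant pitches keep a nonzero probability, which is consistent with the point that non-chord tones remain available.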

In unsupervised harmonic analysis, the generative objective is the negative log-marginal likelihood over all possible chord/root/key assignments and observed pitch-class activations, optimized either by the EM algorithm or direct gradient descent (Uehara, 2024). Chord quality emission probabilities are modularly encoded via shiftable, root-dependent templates.
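As a rough illustration of this objective, the simplified form below treats each segment as a single observation and omits the duration model. The latent sequence $\mathbf{z}$ (key, root, and quality per segment) and the binary pitch-class indicator $x_{t,pc}$ are notational assumptions made here, not the paper's exact formulation.

$$
\mathcal{L}(\theta) = -\log \sum_{\mathbf{z}} p_\theta(\mathbf{z}) \prod_{t} p_\theta(\mathbf{x}_t \mid q_t, r_t),
\qquad
p_\theta(\mathbf{x}_t \mid q, r) = \prod_{pc=0}^{11} v_{pc|q,r}^{\,x_{t,pc}} \left(1 - v_{pc|q,r}\right)^{1 - x_{t,pc}}
$$

The second factor is the standard Bernoulli likelihood implied by templates whose entries are per-pitch-class activation probabilities.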

3. Hierarchical and Contextual Conditioning

Recent deep symbolic music models integrate harmony templates into hierarchical tokenized representations, as exemplified in the Harmony-Aware Transformer (HAT) (Zhang et al., 2021). Chord and phrase identities are represented as discrete-event fields within each token. HAT inserts intermediate "Texture" and "Form" Transformer modules: the former conditions on chord-dense windows inside phrases to encode local harmonic context; the latter pools phrase-level embeddings for long-range structure (form). Chord and phrase tokens are updated recursively using the previous phrase’s global form embedding and the local chord texture of predecessor tokens.

In SymphonyGen, hierarchical conditioning occurs along three axes—bar, track, event. Separate token tensors are computed for harmony events and general music events. Cross-attention layers alternately condition texture generation on the harmony skeleton and enforce intra-track continuity across bars for orchestration-scale context (He et al., 28 Apr 2026).
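The conditioning step can be pictured with a bare single-head cross-attention in NumPy, where music-event tokens act as queries and harmony tokens supply keys and values. This is a generic sketch without learned projection matrices or multiple heads; the token counts and embedding width are arbitrary assumptions and do not reflect the actual model.

```python
import numpy as np


def cross_attention(queries, keys, values):
    """Scaled dot-product attention from one token set onto another."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights


rng = np.random.default_rng(0)
texture = rng.normal(size=(5, 8))  # 5 music-event tokens, width 8 (assumed)
harmony = rng.normal(size=(2, 8))  # 2 harmony-skeleton tokens (assumed)
conditioned, attn = cross_attention(texture, harmony, harmony)
```

Each row of `attn` sums to 1, so every texture token receives a convex mixture of the harmony-token embeddings.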

4. Algorithmic Pipeline and Stepwise Implementation

Comprehensive harmony templates specify end-to-end pipelines from raw input to harmonized output:

Melodic Harmonization (Fernandes, 5 Jan 2025):

Audio or MIDI melodic input is processed by deep CNN-based or algorithmic pitch extractors, and the output is quantized to note events.
For each event, candidate chord regions compatible with the given melody note are selected, filtering by mode and adjacency constraints.
Costs for each pair of consecutive chords are computed, combining voice-leading distances and graph-defined region weights.
Dynamic programming yields the globally optimal chord sequence; beam search yields a near-optimal one at lower cost.
Chord and melody are synthesized for playback.
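The candidate-selection step in the list above can be sketched as a simple chord-tone filter. The chord vocabulary here is a small hypothetical subset, and the rule "a chord is compatible if it contains the melody's pitch class" is a simplifying assumption that ignores the mode and adjacency constraints.

```python
# Hypothetical chord vocabulary: name -> set of pitch classes.
CHORD_TONES = {
    "Cmaj7": {0, 4, 7, 11},
    "Dm7": {2, 5, 9, 0},
    "G7": {7, 11, 2, 5},
    "Am7": {9, 0, 4, 7},
}


def candidate_chords(midi_pitch, chords=CHORD_TONES):
    """Return the chords that contain the melody note's pitch class."""
    pc = midi_pitch % 12
    return sorted(name for name, tones in chords.items() if pc in tones)


melody = [60, 62, 64, 67]  # C4, D4, E4, G4
candidates = [candidate_chords(p) for p in melody]
# candidates[1] == ["Dm7", "G7"]
```

The resulting per-note candidate lists are the input a path search would then score pairwise.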

Skeleton-Guided Orchestration (He et al., 28 Apr 2026):

Input MIDI or token sequence is quantized to a fixed bar/beat grid.
For each beat, the active polyphonic content is reduced to a template-matched chord, and non-clashing extensions are added greedily.
The resulting binary skeleton vectors are used as conditioning for hierarchical decoders, which are refined during inference using dissonance-penalizing adjustments and post-hoc reinforcement with audio-based rewards.
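The chord-reduction step above can be sketched as follows. The three templates, the match score, and the definition of "non-clashing" as "not a semitone away from any tone already chosen" are assumptions made for illustration; the source does not specify them.

```python
# Hypothetical chord templates as interval sets above the root.
TEMPLATES = {"maj": {0, 4, 7}, "min": {0, 3, 7}, "dom7": {0, 4, 7, 10}}


def match_chord(active_pitches):
    """Pick the (root, quality) whose tones best match the sounding pitch classes."""
    pcs = {p % 12 for p in active_pitches}
    best, best_score = None, float("-inf")
    for root in range(12):
        for name, intervals in TEMPLATES.items():
            tones = {(root + i) % 12 for i in intervals}
            # Reward matches, penalize extra notes and missing chord tones.
            score = len(pcs & tones) - len(pcs - tones) - len(tones - pcs)
            if score > best_score:
                best, best_score = (root, name), score
    return best


def add_extensions(chord_tones, candidates):
    """Greedily add pitch classes that are not a semitone from any chosen tone."""
    chosen = set(chord_tones)
    for pc in candidates:
        if all(min((pc - t) % 12, (t - pc) % 12) > 1 for t in chosen):
            chosen.add(pc)
    return chosen


chord = match_chord([55, 59, 62, 65])             # G3, B3, D4, F4
extended = add_extensions({0, 4, 7}, [2, 5, 9])   # try D, F, A over C major
# chord == (7, "dom7"); extended == {0, 2, 4, 7, 9}
```

In the second call, F is rejected because it sits a semitone above E, while D and A are accepted.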

Unsupervised Harmonic Analysis (Uehara, 2024):

Observed pitch-class vectors are segmented.
A hidden semi-Markov model assigns the most likely root, mode, and quality to each segment via learned transition and emission matrices constructed from chord-quality templates.
Tonic identification is performed via the stationary distribution of root transitions; the importance of each root is extracted from the long-run dynamics.
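The tonic-identification step can be illustrated with power iteration on a row-stochastic root-transition matrix. The 3 × 3 matrix over roots C, F, and G is a made-up example, not learned values, and `stationary_distribution` is a hypothetical helper.

```python
import numpy as np


def stationary_distribution(P, iters=1000, tol=1e-12):
    """Power iteration for pi satisfying pi = pi @ P (rows of P sum to 1)."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        nxt = pi @ P
        converged = np.abs(nxt - pi).max() < tol
        pi = nxt
        if converged:
            break
    return pi


ROOTS = ["C", "F", "G"]
# Illustrative transition probabilities between roots (row = current root).
P = np.array([
    [0.6, 0.2, 0.2],
    [0.5, 0.3, 0.2],
    [0.7, 0.1, 0.2],
])

pi = stationary_distribution(P)
tonic = ROOTS[int(np.argmax(pi))]
# pi is approximately [0.6, 0.2, 0.2], so tonic == "C"
```

The root with the largest long-run probability mass is taken as the tonic, which matches the reading of root importance as a property of the long-run dynamics.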

5. Adaptability and Extension Across Styles and Systems

Harmony templates are explicitly designed for extensibility:

In the region-graph model, new chord types (added as nodes), transitions (edges), and region relationships (weight heuristics) can be incorporated to support jazz, pop, atonal, or microtonal idioms. Adjusting tunable parameters such as the cost weight $\lambda$ and the edge weights $w_{ij}$, together with the neighborhood structure, enables idiomatic adaptation (Fernandes, 5 Jan 2025).
The "harmony skeleton" paradigm is modular: by decoupling the extraction of beatwise harmonic outlines from final textural generation and relying only on minimal binary constraints per timestep, it enables porting to new model topologies and integrating with diverse stylistic objectives (He et al., 28 Apr 2026).
Chord-quality templates can be extended to arbitrary chord types and to non-12-tone (microtonal) domains, and can be made learnable through supervised or unsupervised parameterizations, functioning as a general compositional prior over pitch-class patterns (Uehara, 2024).

6. Illustrative Example

A basic application of the region-graph harmony template to a C-major phrase C4–D4–E4–G4 yields the following harmonization:

beat 1: Cmaj7 (I)
beat 2: G7 (V7 preparing ii)
beat 3: Dm7 (ii7)
beat 4: G7 (G Lydian II), with potential mode shift to highlight sharp-4

These moves correspond to traversals along adjacent and closely related nodes in the Schoenberg region-array, integrating both functional harmony and modal coloration. The optimization weights (such as $\lambda$) and the neighborhood size determine the degree of conventionality versus adventurousness in the output (Fernandes, 5 Jan 2025).

References

"A System for Melodic Harmonization using Schoenberg Regions, Giant Steps, and Church Modes" (Fernandes, 5 Jan 2025)
"Structure-Enhanced Pop Music Generation via Harmony-Aware Learning" (Zhang et al., 2021)
"SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton" (He et al., 28 Apr 2026)
"Unsupervised Learning of Harmonic Analysis Based on Neural HSMM with Code Quality Templates" (Uehara, 2024)