Hookpad Aria: Generative AI for Pop Music

Updated 24 April 2026

Hookpad Aria is a generative AI system that composes Western pop songs through symbolic lead sheets and context-aware generation.
It leverages a 360M-parameter Transformer model to support left-to-right continuation, span in-filling, and melody–harmony conversion.
Its integration into the Hookpad editor and iterative data flywheel approach enable refined, user-adapted musical outputs.

Hookpad Aria is a generative AI system specifically engineered to assist musicians in composing Western pop songs through symbolic lead sheets. Integrated within the web-based Hookpad editor, it leverages Transformer-based sequence modeling to provide context-aware musical material generation. Its architecture, training strategies, and application patterns are designed around the requirements of non-sequential composition workflows, supporting capabilities such as left-to-right continuation, infilling of arbitrary spans, and cross-modal melody–harmony transformations. Hookpad Aria implements a scalable data flywheel that continually adapts to user feedback and has seen rapid adoption in collaborative music creation settings (Donahue et al., 12 Feb 2025).

1. System Architecture and Sequence Modeling

Hookpad Aria fine-tunes the 360M-parameter Anticipatory Music Transformer (AMT), which is pre-trained on multi-instrument symbolic music. The data representation for lead sheets comprises two principal voices, melody ($\mathcal{M}$) and chordal harmony ($\mathcal{H}$), augmented with a synthetic click track ($\mathcal{C}$) that discretizes temporal positions by beats. Each musical event, described by a tuple (start-time, duration, pitch, instrument), is assigned to either the event stream $e$ or the control stream $c$.
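As a rough illustration of this representation, the sketch below models each note as a (start-time, duration, pitch, instrument) tuple and partitions a small lead sheet into event and control streams. The class name `Note`, the instrument labels, the use of seconds as the time unit, and the click pitch are all assumptions made for the example, not details from the source.

```python
from dataclasses import dataclass

# Hypothetical instrument labels for the three voices described above.
MELODY, HARMONY, CLICK = "melody", "harmony", "click"


@dataclass(frozen=True)
class Note:
    start: float      # onset time in seconds (assumed unit)
    duration: float   # length in seconds
    pitch: int        # MIDI pitch number
    instrument: str   # one of MELODY, HARMONY, CLICK


def split_streams(notes, control_instruments):
    """Partition notes into (events, controls) by instrument label."""
    events = [n for n in notes if n.instrument not in control_instruments]
    controls = [n for n in notes if n.instrument in control_instruments]
    return events, controls


lead_sheet = [
    Note(0.0, 0.5, 64, MELODY),
    Note(0.0, 2.0, 60, HARMONY),
    Note(0.0, 0.1, 37, CLICK),   # click pitch 37 is an arbitrary choice
    Note(0.5, 0.5, 67, MELODY),
    Note(0.5, 0.1, 37, CLICK),
]

# Example: treat the click track as the only control stream.
events, controls = split_streams(lead_sheet, {CLICK})
```

Choosing which instruments go into `control_instruments` is what distinguishes the generation modes discussed later.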

Controls are shifted five seconds earlier, enabling the model to anticipate future musical events and to fill temporal gaps in the sequence. The interleaved sequence of events and shifted controls is modeled with standard causal self-attention:

$\text{Attention}(Q,K,V) = \mathrm{softmax}(QK^\top/\sqrt{d})V$
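The five-second control shift can be pictured with a simplified merge: events are ordered by their own time, controls by their time minus the shift. This is only a sketch of the idea; the exact interleaving rule used by the AMT may differ, and the function name and tuple layout are hypothetical.

```python
ANTICIPATION_SECONDS = 5.0  # control shift stated in the text


def interleave(events, controls, delta=ANTICIPATION_SECONDS):
    """Merge (time, label) events with controls sorted by time - delta.

    Ties are resolved in favor of events. Returns a list of
    (stream, original_time, label) tuples in model input order.
    """
    keyed = [(t, 0, "event", t, label) for t, label in events]
    keyed += [(t - delta, 1, "control", t, label) for t, label in controls]
    keyed.sort(key=lambda item: (item[0], item[1]))
    return [(stream, t, label) for _, _, stream, t, label in keyed]


events = [(0.0, "E4"), (2.0, "G4"), (4.0, "C5")]
controls = [(6.0, "click"), (8.0, "click")]
sequence = interleave(events, controls)
```

In this toy input, the control at 6.0 s is placed right after the event at 0.0 s, so the model sees it well before generating events near that time.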

The factorized autoregressive objective is formalized as:

$P_\theta(e|c) = \prod_{t=1}^T P_\theta(e_t | e_{<t}, c_{\leq t})$

with parameters $\theta$ optimized by cross-entropy minimization:

$\mathcal{L}(\theta) = -\sum_{(e,c)\in D} \log P_\theta(e | c)$
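To make the objective concrete, the following sketch computes the loss from per-step probabilities that a model assigned to the correct next event. The probabilities are made up for illustration, and the function names are hypothetical.

```python
import math


def sequence_log_prob(step_probs):
    """log P(e | c) as the sum of per-step log-probabilities."""
    return sum(math.log(p) for p in step_probs)


def nll_loss(dataset_step_probs):
    """Negative log-likelihood summed over all sequences in the dataset."""
    return -sum(sequence_log_prob(probs) for probs in dataset_step_probs)


# Two toy sequences; each number is P(e_t | e_<t, c_<=t) for the true event.
dataset = [[0.5, 0.25], [0.125]]
loss = nll_loss(dataset)  # equals log(64), since the product of all steps is 1/64
```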

2. Generation Capabilities and Modes

Hookpad Aria supports three principal generation capabilities, each corresponding to distinct conditioning and output partitioning:

Left-to-Right Continuation: Given a context span $[t_s, t_e]$, generates events from $t_e$ to $t_e + \Delta$ (the end of the requested continuation) by maximizing

$P_\theta(e_{(t_e,\, t_e+\Delta]} \mid e_{[t_s,\, t_e]}, c)$

Fill-in-the-Middle (Span In-Filling): Given context before $t_s$ and after $t_e$, infills the span $[t_s, t_e]$ by

$P_\theta(e_{[t_s,\, t_e]} \mid e_{<t_s}, c)$

with future events $e_{>t_e}$ used as controls.

Melody–Harmony Conversion: By partitioning $\mathcal{M}$ and $\mathcal{H}$ as events/controls, enables harmonization from melody or melody generation from harmony, using the same probabilistic objective.

The correspondence between capabilities and event/control assignments is summarized below.

Capability	Events ($e$)	Controls ($c$)
Left-to-right	$\mathcal{M} \cup \mathcal{H}$ (before $t_e$)	$\mathcal{C}$ (full)
Fill-in-the-middle	$(\mathcal{M} \cup \mathcal{H})_{\leq t_e}$	$(\mathcal{M} \cup \mathcal{H})_{> t_e} \cup \mathcal{C}$
Harmony $\rightarrow$ Melody	$\mathcal{M}$	$\mathcal{H} \cup \mathcal{C}$
Melody $\rightarrow$ Harmony	$\mathcal{H}$	$\mathcal{M} \cup \mathcal{C}$
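The melody–harmony partition described above can be sketched as a small lookup: the voice to be generated becomes the event stream, and the given voice plus the click track become controls. The mode names and voice labels are illustrative assumptions, not identifiers from the Aria codebase.

```python
def partition_voices(mode):
    """Return (event_voices, control_voices) for a conversion mode.

    The generated voice is the event stream; the given voice and the
    click track act as controls. Mode names are hypothetical.
    """
    if mode == "harmonize":   # melody -> harmony
        return {"harmony"}, {"melody", "click"}
    if mode == "melodify":    # harmony -> melody
        return {"melody"}, {"harmony", "click"}
    raise ValueError(f"unsupported mode: {mode!r}")


event_voices, control_voices = partition_voices("harmonize")
```

Because both directions share the same objective, only this assignment changes between the two modes.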

3. Training Data, Preprocessing, and Augmentation

The primary fine-tuning source consists of approximately 50,000 publicly available lead sheets (about 200 million tokens) from TheoryTab. Functional chord symbols are mapped into four-note MIDI chord voicings; melody lines are single-pitch sequences. Fine temporal discretization is achieved by snapping all positions to 16th-note subdivisions and incorporating a single percussive click note per beat in the synthetic click track.
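A minimal sketch of the two temporal preprocessing steps, assuming positions are measured in quarter-note beats so that one beat holds four sixteenth notes. The function names, click pitch, and click duration are hypothetical choices for the example.

```python
SUBDIVISIONS_PER_BEAT = 4  # sixteenth notes, assuming a quarter-note beat


def snap_to_sixteenth(position_in_beats):
    """Snap a beat position to the nearest sixteenth-note grid point.

    Exact ties follow Python's round-half-to-even behavior.
    """
    return round(position_in_beats * SUBDIVISIONS_PER_BEAT) / SUBDIVISIONS_PER_BEAT


def click_track(num_beats, click_pitch=37, click_duration=0.25):
    """One click per beat as (start, duration, pitch) tuples."""
    return [(float(beat), click_duration, click_pitch) for beat in range(num_beats)]


snapped = [snap_to_sixteenth(p) for p in (0.0, 0.9, 1.3)]
clicks = click_track(4)
```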

For model robustness and coverage, each lead sheet is used to sample multiple random spans and target modes, producing roughly ten training examples per source and covering segment lengths from one to eight measures. This data augmentation paradigm ensures the model experiences diverse contexts and conditioning modes during training (Donahue et al., 12 Feb 2025).
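The span sampling could look roughly like the sketch below, which draws about ten (start, length, mode) triples per lead sheet with span lengths between one and eight measures. Uniform sampling, the mode names, and the function signature are assumptions; the source does not specify the sampling distribution.

```python
import random

MODES = ("continue", "fill", "harmonize", "melodify")  # illustrative names


def sample_examples(num_measures, examples_per_sheet=10, max_span=8, seed=0):
    """Sample (start_measure, span_length, mode) triples for one lead sheet."""
    rng = random.Random(seed)
    examples = []
    for _ in range(examples_per_sheet):
        # Span length is capped by both max_span and the sheet length.
        length = rng.randint(1, min(max_span, num_measures))
        start = rng.randint(0, num_measures - length)
        examples.append((start, length, rng.choice(MODES)))
    return examples


examples = sample_examples(num_measures=16)
```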

4. Real-time Integration and Workflow in the Hookpad Editor

Hookpad Aria is embedded directly into Hookpad’s lead-sheet web interface, supporting non-sequential workflow patterns:

Region selection: Users highlight one or more measures in the score.
Mode selection: “Continue,” “Fill,” “Harmonize,” and “Melodify” buttons correspond to different generative modes.
API communication: The frontend transmits metadata ({projectID, key, tempo, meter, selected span, mode}) to the Aria backend, which responds with several candidate completions (formatted as JSON-encoded sequences of beat, pitch, duration, and voice).
Suggestion rendering: Candidates are displayed in a scrollable panel; users can audition and “Accept” a suggestion to commit it to the editor.
Context conditioning: As every generation request may involve arbitrary user-selected spans, the workflow supports both local and global edits, including mid-song modification without imposing sequential constraints.
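The request and response exchange in the steps above might be serialized along these lines. Only the field list (projectID, key, tempo, meter, selected span, mode) and the note attributes (beat, pitch, duration, voice) come from the text; the exact JSON key names, the span structure, and the `candidates` wrapper are hypothetical, and the response here is mocked rather than fetched from any real endpoint.

```python
import json


def build_request(project_id, key, tempo, meter, span, mode):
    """Serialize the metadata fields listed above into a JSON request body."""
    return json.dumps({
        "projectID": project_id,
        "key": key,
        "tempo": tempo,
        "meter": meter,
        # Hypothetical span encoding: inclusive measure range.
        "selectedSpan": {"startMeasure": span[0], "endMeasure": span[1]},
        "mode": mode,
    })


def parse_candidates(response_body):
    """Decode candidates into lists of (beat, pitch, duration, voice) tuples."""
    payload = json.loads(response_body)
    return [
        [(n["beat"], n["pitch"], n["duration"], n["voice"]) for n in candidate]
        for candidate in payload["candidates"]
    ]


request_body = build_request("demo-project", "C", 120, "4/4", (5, 6), "continue")
# Mock response standing in for the Aria backend.
mock_response = json.dumps({"candidates": [[
    {"beat": 0, "pitch": 67, "duration": 1, "voice": "melody"},
]]})
candidates = parse_candidates(mock_response)
```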

This pattern aligns with iterative, fragmentary, and exploratory songwriting methodologies prevalent in Western pop composition.

5. Usage, Adoption Metrics, and Qualitative Evaluation

Since its public release in March 2024, Hookpad Aria has recorded:

318,000 generative requests
Approximately 3,000 unique users
74,000 accepted suggestions, yielding an accept rate of approximately 23%
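The accept rate follows directly from the figures above, as the short calculation below shows. The per-user average is an added derived figure based on the approximate user count, so it should be read as a rough estimate.

```python
total_requests = 318_000
accepted_suggestions = 74_000
unique_users = 3_000  # approximate, per the text

accept_rate = accepted_suggestions / total_requests   # about 0.233
requests_per_user = total_requests / unique_users     # about 106 on average
```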

Comprehensive logging of user interactions, including time, mode, and project metadata, enables detailed offline evaluation. Eight one-hour semi-structured interviews revealed that end users primarily regard Aria as an “ideation partner,” leveraging it to unblock creative processes and favoring its short, reusable output snippets. Users suggested that finer high-level controls—e.g., explicit manipulation of genre, mood, or song section labeling—would extend Aria’s flexibility (Donahue et al., 12 Feb 2025).

6. Iterative Model Adaptation via Data Flywheel

Hookpad Aria’s feedback loop centers on accepted suggestions as positive reinforcement signals. Each month, user-validated (context, suggestion) pairs are aggregated and used to fine-tune a new release of the medium-sized AMT model. This “data flywheel” approach incrementally encodes evolving user stylistic preferences and emergent lead-sheet conventions, enabling model adaptation in response to real-world adoption and musical trends.

This suggests a continually improving co-creation platform in which preferences implicit in accepted musical fragments are reflected in subsequent model generations.
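One way to picture the monthly aggregation step is a filter over interaction logs that keeps accepted (context, suggestion) pairs and groups them by month. The log schema, field names, and ISO 8601 timestamps are assumptions for this sketch; the source does not describe the actual logging format.

```python
from collections import defaultdict


def monthly_finetune_sets(log_records):
    """Group accepted (context, suggestion) pairs by 'YYYY-MM' month."""
    by_month = defaultdict(list)
    for record in log_records:
        if record["accepted"]:
            month = record["timestamp"][:7]  # assumes ISO 8601 timestamps
            by_month[month].append((record["context"], record["suggestion"]))
    return dict(by_month)


logs = [
    {"timestamp": "2024-03-05T10:00:00", "accepted": True,
     "context": "ctx-a", "suggestion": "sug-a"},
    {"timestamp": "2024-03-09T12:30:00", "accepted": False,
     "context": "ctx-b", "suggestion": "sug-b"},
    {"timestamp": "2024-04-01T08:15:00", "accepted": True,
     "context": "ctx-c", "suggestion": "sug-c"},
]
finetune_sets = monthly_finetune_sets(logs)
```

Each month's list would then serve as the fine-tuning set for the next model release.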

7. Illustrative Generation Examples

For melody harmonization, if a user sketches a two-measure melody in C major and selects “Harmonize,” Aria might return:

Suggestion 1: Cmaj7 | Am7 | D7 | G7 (chords on beats 1 and 3)
Suggestion 2: C6 | Em7 | Am7 | D7 G7

In this harmonization request, the generated chords form the event stream, while the controls comprise the melody notes for the specified bars and the click track. For left-to-right continuation, a user requesting continuation of a four-bar melody {E₄–G₄–C₅–B₄, …} might receive, for example:

Bar 5: G₄ quarter, A₄ eighth, G₄ eighth, E₄ half
Bar 6: F₄ eighth, G₄ eighth, E₄ quarter, C₄ half

The output is integrated one measure at a time, supporting both the completion and transformation of existing song material (Donahue et al., 12 Feb 2025).
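As a quick sanity check on the continuation example, the sketch below encodes bars 5 and 6 and sums their note values. It assumes a 4/4 meter, which the example does not state explicitly; under that assumption each bar should total four quarter-note beats.

```python
# Note values in quarter-note beats.
BEATS = {"half": 2.0, "quarter": 1.0, "eighth": 0.5}

continuation = {
    5: [("G4", "quarter"), ("A4", "eighth"), ("G4", "eighth"), ("E4", "half")],
    6: [("F4", "eighth"), ("G4", "eighth"), ("E4", "quarter"), ("C4", "half")],
}


def bar_length(notes):
    """Total duration of a bar in quarter-note beats."""
    return sum(BEATS[value] for _, value in notes)


bar_lengths = {bar: bar_length(notes) for bar, notes in continuation.items()}
```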

References

Donahue et al. (2025). Hookpad Aria: A Copilot for Songwriters.
