
BeatlesFC: Annotated Beatles Harmony Corpus

Updated 7 January 2026
  • BeatlesFC is a curated corpus that assigns harmonic function labels (T, PD, D) to nearly every Beatles song, linking chord events to global musical form.
  • The dataset was annotated by expert music theorists using Sonic Visualiser and detailed protocols, ensuring precise alignment of chord events and function labels.
  • The corpus supports MIR tasks and computational analyses by providing empirical data on rock harmony, facilitating automatic phrase boundary detection and statistical modeling.

BeatlesFC is a large-scale, manually annotated corpus providing harmonic function labels, namely tonic (T), predominant (PD), and dominant (D), for nearly the entire Isophonics Beatles dataset. These annotations formally link chord events to higher-level functional and structural interpretations, serving as a bridge between the chordal surface and global musical form in the Beatles’ recorded works. BeatlesFC combines systematic analytical rigor with computational accessibility, targeting downstream applications in music information retrieval (MIR), rock harmony research, and statistical modeling of popular-music syntax (Sim et al., 5 Jan 2026).

1. Harmonic Function Definition and Theoretical Basis

BeatlesFC utilizes a three-class harmonic-function taxonomy:

  • T (tonic; “stable”)
  • PD (predominant; “unstable”, leads toward dominant)
  • D (dominant; “maximally unstable”, resolves to tonic)

Function assignment is realized via a labeling function:

$$f\colon \{\text{chords}\} \longrightarrow \{T, PD, D\}.$$

The core syntactic assumption, termed the “functional circuit”, posits that within a musical phrase, harmonic progression operates according to rock-specific prolongation cycles:

$$T \longrightarrow PD \longrightarrow D \longrightarrow T$$

or, more generally,

$$T \longrightarrow (PD \cup D)^* \longrightarrow T$$

where $^*$ denotes any sequence (including the empty sequence) of PD and D labels. Notably, dominant function (D) may be instantiated by chords beyond the classical V chord (e.g., back-related V, neighboring dominant-prolonging sonorities). This function-labeling system diverges from strict Roman-numeral analysis by explicitly modeling the role and prolongation of harmonic regions in the idiom of rock.
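The general circuit pattern can be checked mechanically against a label sequence. The following is a minimal sketch, assuming a phrase is already available as a list of function labels; the name `is_functional_circuit` is hypothetical and not part of BeatlesFC.

```python
import re

# Regex form of T -> (PD | D)* -> T over a space-separated label string.
# Labels are joined with spaces so that "PD" is never confused with "D".
CIRCUIT = re.compile(r"T(?: (?:PD|D))* T")

def is_functional_circuit(labels):
    """Return True if the label sequence has the shape T, (PD or D)*, T."""
    return CIRCUIT.fullmatch(" ".join(labels)) is not None

print(is_functional_circuit(["T", "PD", "D", "T"]))  # full circuit
print(is_functional_circuit(["T", "T"]))             # empty middle section
print(is_functional_circuit(["T", "PD"]))            # no return to tonic
```

The check treats every label as one step, so a phrase with several consecutive tonic events would need its repeated T labels collapsed first.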

2. Annotation Workflow and Methodology

The annotations were generated by two PhD-level music theorists, each with extensive training in harmonic analysis and popular-music theory. The annotation protocol involved:

  1. Starting from the Isophonics Beatles dataset (timed chords, keys, segments).
  2. Using Sonic Visualiser for audio–annotation alignment.
  3. Assigning a function label $f(c_i)\in\{T,PD,D\}$ for each chord event $c_i$ according to its onset $t_i^{\mathrm{on}}$ and offset $t_i^{\mathrm{off}}$.
  4. Interpreting chord function relative to local key label (e.g., A major in E major may serve as PD).
  5. Identifying prolongation techniques (neighboring/back-related chords), labeling prolongations by the function they extend.
  6. Preserving parallel labels (“–A” and “–B”) for ambiguous or disputed cases; 26 songs received dual annotation for inter-annotator comparison.
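Step 4 depends on reading each chord against the local key. As a rough illustration only, the sketch below computes the scale degree of a chord root in a major key, which is the information an annotator starts from; it does not assign a function, since that judgment is contextual. The helper names and the restriction to major keys with `#`/`b` accidentals are assumptions made for this example.

```python
# Pitch classes of the natural note names.
NATURAL_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Semitone offsets of the diatonic degrees in a major key.
MAJOR_DEGREES = {0: "I", 2: "II", 4: "III", 5: "IV", 7: "V", 9: "VI", 11: "VII"}

def pitch_class(note):
    """Convert a note name such as 'E', 'F#' or 'Bb' to a pitch class 0-11."""
    pc = NATURAL_PC[note[0]]
    for accidental in note[1:]:
        pc += {"#": 1, "b": -1}[accidental]
    return pc % 12

def degree_in_major(chord_root, key_tonic):
    """Return the diatonic degree of chord_root in the major key, or None if chromatic."""
    interval = (pitch_class(chord_root) - pitch_class(key_tonic)) % 12
    return MAJOR_DEGREES.get(interval)

print(degree_in_major("A", "E"))  # IV
```

For the example in step 4, A in E major comes out as IV, a degree that may serve as PD or, as a neighbor chord, may prolong T.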

Annotation files are stored as plain-text “.lab” files with the tabular triplet: onset, offset, function label. Example line:

0.00  1.23  T
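Files in this format can be read with a few lines of code. The sketch below assumes whitespace-separated fields and onset/offset values in seconds; `FunctionEvent` and `parse_function_lab` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple

class FunctionEvent(NamedTuple):
    onset: float   # start time of the chord event
    offset: float  # end time of the chord event
    label: str     # function label, e.g. "T", "PD" or "D"

def parse_function_lab(text):
    """Parse the contents of a function .lab file into a list of events."""
    events = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 3:
            raise ValueError(f"expected 3 fields, got: {line!r}")
        onset, offset, label = parts
        events.append(FunctionEvent(float(onset), float(offset), label))
    return events

events = parse_function_lab("0.00  1.23  T\n")
print(events[0].label)  # T
```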

3. Dataset Structure, Coverage, and Exemplars

BeatlesFC is organized as follows:

/beatlesFC/
├─ chord/      (Isophonics chord .lab files)
├─ key/        (Isophonics key .lab files)
└─ function/   (BeatlesFC function .lab files)

  • 179 of 180 Beatles songs annotated (“Revolution 9” excluded: no stable key)
  • 14,132 chord events received T/PD/D function labels
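Because the function labels are attached to chord events, the chord/ and function/ annotations for a song can be joined row by row. The sketch below assumes both files list the same events in the same order with matching boundaries, which should be verified against the actual data; `join_by_interval` and its tolerance parameter are hypothetical.

```python
def join_by_interval(chords, functions, tol=1e-3):
    """Pair (onset, offset, chord) rows with (onset, offset, function) rows.

    Raises ValueError if the two lists differ in length or if any pair of
    rows has boundaries that differ by more than tol seconds.
    """
    if len(chords) != len(functions):
        raise ValueError("chord and function files have different lengths")
    joined = []
    for (c_on, c_off, chord), (f_on, f_off, func) in zip(chords, functions):
        if abs(c_on - f_on) > tol or abs(c_off - f_off) > tol:
            raise ValueError(f"boundary mismatch at {c_on}-{c_off}")
        joined.append((c_on, c_off, chord, func))
    return joined

chords = [(0.0, 1.23, "E"), (1.23, 2.5, "A")]
functions = [(0.0, 1.23, "T"), (1.23, 2.5, "T")]
print(join_by_interval(chords, functions))
```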

An excerpt (measures 1–8, “I Saw Her Standing There”) illustrates the annotation paradigm:

Measure  Chord  Roman numeral  Function
1        E      I              T
2        E      I              T
3        A      IV             T (neighbor prolongation)
4        E      I              T
5-6      E      I              T
7        A      IV             T
8        B      V              T (back-related V)

Parallel “–A” and “–B” function files are provided for cases where the annotators disagreed.
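One way to compare a pair of parallel files is duration-weighted agreement: the share of overlapping time on which both annotators give the same label. This is an illustrative metric chosen for the sketch, not one the source reports; `duration_weighted_agreement` is a hypothetical name, and each input is assumed to be a time-sorted list of (onset, offset, label) tuples.

```python
def duration_weighted_agreement(a, b):
    """Fraction of overlapping time on which two annotations agree."""
    i = j = 0
    agree = total = 0.0
    while i < len(a) and j < len(b):
        # Overlap of the current pair of events.
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            total += end - start
            if a[i][2] == b[j][2]:
                agree += end - start
        # Advance whichever event ends first.
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1
    return agree / total if total else 0.0

annot_a = [(0.0, 2.0, "T"), (2.0, 4.0, "D")]
annot_b = [(0.0, 1.0, "T"), (1.0, 4.0, "PD")]
print(duration_weighted_agreement(annot_a, annot_b))  # 0.25
```

Weighting by duration keeps a disagreement over one long chord from counting the same as a disagreement over a brief passing chord.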

Label distribution across the complete dataset:

Function  Count   % of Total
T         9,941   70.3%
PD        2,326   16.5%
D         1,865   13.2%
Total     14,132  100%

This distribution shows that tonic function dominates the Beatles’ rock harmony: roughly 70% of chord events are labeled T, while predominant and dominant functions together account for under 30%. The observed prevalence of tonic prolongation supports the “functional circuit” hypothesis as a characteristic feature of the Beatles’ harmonic language (Sim et al., 5 Jan 2026).
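The percentages follow directly from the counts. A short check, using only the figures reported above (the function name `label_shares` is hypothetical):

```python
def label_shares(counts):
    """Return each label's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

counts = {"T": 9941, "PD": 2326, "D": 1865}
print(sum(counts.values()))  # 14132
print(label_shares(counts))  # {'T': 70.3, 'PD': 16.5, 'D': 13.2}
```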

5. Relationship to Form and Analytic Utility

BeatlesFC’s function annotations operate at the phrase level: each chord’s function is contextualized within the surrounding segment. This configuration builds a direct bridge between:

  • Lower-level labels (chord events, beat-level data)
  • Higher-level form (formal sections, cadences, verses, bridges)

Key analytic and computational applications include:

  • Automatic phrase boundary and cadence detection: Stable versus unstable boundaries are precisely marked.
  • Segment clustering by harmonic profile: Enabling analyses such as locating all tonic-prolongation passages or comparing function-region densities across songs.
  • Statistical and machine-learning models of rock harmony: BeatlesFC’s explicit intermediate function layer provides a target for syntax-learning and musicological feature engineering.
  • Quantitative cross-dataset comparisons: Enabling MIR tasks such as style analysis, key modulation detection, and automated Roman-numeral realization in comparison to other annotated pop/rock corpora.
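Two of the applications above can be sketched directly on function events. The heuristics below are deliberately simple illustrations, not methods from the source: a cadence candidate is taken to be any T event that directly follows a D event, and a tonic-prolongation passage is a run of back-to-back T events. Both function names are hypothetical, and events are assumed to be time-sorted (onset, offset, label) tuples.

```python
def cadence_candidates(events):
    """Onset times of T events that directly follow a D event."""
    return [
        cur[0]
        for prev, cur in zip(events, events[1:])
        if prev[2] == "D" and cur[2] == "T"
    ]

def tonic_regions(events):
    """Merge consecutive T events into (onset, offset) regions."""
    regions = []
    for onset, offset, label in events:
        if label != "T":
            continue
        # Extend the last region only if this event starts exactly where it ends.
        if regions and regions[-1][1] == onset:
            regions[-1] = (regions[-1][0], offset)
        else:
            regions.append((onset, offset))
    return regions

events = [(0, 1, "T"), (1, 2, "PD"), (2, 3, "D"), (3, 4, "T"), (4, 5, "T")]
print(cadence_candidates(events))  # [3]
print(tonic_regions(events))       # [(0, 1), (3, 5)]
```

Real annotation files use floating-point times, so the exact equality test in `tonic_regions` would likely need a small tolerance.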

6. Data Access, Format, and Preservation of Analytical Diversity

Every BeatlesFC annotation is provided in an open, human-readable format. Parallel labeling (dual “–A”/“–B” outputs) preserves analytical uncertainty and inter-rater diversity, supporting robust downstream evaluation of analysis-dependence in computational models. The corpus is positioned to catalyze further work in both rule-based and data-driven harmonic analysis paradigms.

BeatlesFC introduces an analytically validated, phrase-level harmonic function layer for one of the most comprehensively studied popular-music corpora. By bridging surface chord events and global form, the dataset supports a range of MIR and stylistic-analysis applications and offers a model for future function-annotated datasets in computational musicology (Sim et al., 5 Jan 2026).
