MoisesDB: Multi-Stem Music Dataset

Updated 17 March 2026
  • MoisesDB is a publicly available multitrack dataset offering high-fidelity audio, granular metadata, and a hierarchical stem taxonomy that reflects real-world studio mixing.
  • It comprises 240 songs across 12 genres with a two-level annotation system, enabling both coarse-grained and fine-grained source separation analyses.
  • The dataset provides a dedicated Python API and standardized benchmark protocols, facilitating reproducible experiments and advanced system evaluations.

MoisesDB is a publicly available, high-fidelity multitrack dataset designed to advance research in musical source separation by providing granular annotations and a flexible stem taxonomy that goes beyond the typical four-stem (vocals, drums, bass, other) paradigm. Comprising 240 songs from 47 artists across 12 popular genres, MoisesDB delivers raw instrument tracks, semantically labeled stems in a two-level hierarchy, and a complementary Python API, thereby establishing itself as a comprehensive resource for evaluating and building fine-grained source separation systems (Pereira et al., 2023).

1. Dataset Composition and Metadata

MoisesDB contains 240 songs with a total duration of 14 hours, 24 minutes, and 46 seconds; the average track length is 3 minutes and 36 seconds (±66 seconds). The tracks span 12 “high-level” genres (such as Pop, Rock, Jazz, Hip-Hop, R&B, Electronic, Folk, Latin, Reggae, Blues, Classical, and Other), and both genre and artist representation are heavily skewed, following a power-law distribution. Each song includes a JSON metadata file describing artist, title, genre, duration (in seconds), sampling rate, raw recorded tracks, and the mapping of these tracks to high-level stems.
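As a quick consistency check, the reported average track length can be recomputed from the total duration and the song count given above. The variable names are chosen for illustration only.

```python
# Total duration reported for the dataset: 14 h 24 min 46 s across 240 songs.
total_seconds = 14 * 3600 + 24 * 60 + 46
num_songs = 240

# Mean track length in seconds, then split into minutes and seconds.
mean_seconds = total_seconds / num_songs
minutes, seconds = divmod(round(mean_seconds), 60)
print(f"mean track length: {minutes} min {seconds} s")
```

The result agrees with the stated average of 3 minutes and 36 seconds.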

The dataset's design reflects the varied nature of recorded music: because instrumentation differs, the number and combination of stems vary from song to song. Common stems (e.g., “vocals”, “drums”, “bass”) are present in nearly every track, whereas rare categories such as “wind” have minimal representation, so the stem distribution is strongly imbalanced.

2. Two-Level Hierarchical Stem Taxonomy

Each song in MoisesDB includes both raw audio sources and their grouping into a two-level stem hierarchy that mirrors studio mixing workflows. The Level 1 stem categories and representative Level 2 track types appear as follows:

| Level 1 Category | Example Level 2 Types | Typical Occurrence |
|---|---|---|
| Bass | Bass Guitar, Bass Synthesizer, Contrabass | Nearly all songs |
| Bowed Strings | Cello, Viola Section, String Section | Variable |
| Drums | Snare Drum, Kick Drum, Drum Machine | Nearly all songs |
| Guitar | Acoustic, Electric (Clean/Distorted) | Frequent |
| Other | Fx | Occasional |
| Other Keys | Organ, Electric Organ, Synth Lead | Variable |
| Other Plucked | Banjo, Mandolin, Ukulele, Harp | Rare |
| Percussion | A-Tonal, Pitched Percussion | Frequent |
| Piano | Grand Piano, Electric Piano | Frequent |
| Vocals | Lead Female/Male, Background, Other | Nearly all songs |
| Wind | Brass, Flutes, Reeds, Other Wind | Very rare |

The taxonomy enables evaluation and modeling at arbitrary granularity, supporting both coarse-grained and fine-grained source separation scenarios. The number of stems per song is non-uniform, requiring dynamic strategies for model training and evaluation.
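The two-level hierarchy can be pictured as a mapping from Level 1 stems to Level 2 track types. The following is a minimal sketch that uses a subset of the table above; the snake_case keys, the `group_by_stem` helper, and the "unmapped" fallback are illustrative assumptions, not identifiers taken from the dataset or its API.

```python
from collections import defaultdict

# Subset of the taxonomy table: Level 1 stem -> example Level 2 types.
# The snake_case keys are illustrative and may not match the dataset's identifiers.
TAXONOMY = {
    "bass": ["Bass Guitar", "Bass Synthesizer", "Contrabass"],
    "bowed_strings": ["Cello", "Viola Section", "String Section"],
    "drums": ["Snare Drum", "Kick Drum", "Drum Machine"],
    "other_plucked": ["Banjo", "Mandolin", "Ukulele", "Harp"],
    "piano": ["Grand Piano", "Electric Piano"],
    "wind": ["Brass", "Flutes", "Reeds", "Other Wind"],
}

# Invert the mapping so each Level 2 label points to its Level 1 stem.
LEVEL2_TO_LEVEL1 = {l2: l1 for l1, types in TAXONOMY.items() for l2 in types}


def group_by_stem(source_labels):
    """Group Level 2 source labels under their Level 1 stem (hypothetical helper)."""
    grouped = defaultdict(list)
    for label in source_labels:
        # Labels outside this partial table fall into an "unmapped" bucket.
        grouped[LEVEL2_TO_LEVEL1.get(label, "unmapped")].append(label)
    return dict(grouped)


print(group_by_stem(["Kick Drum", "Snare Drum", "Cello", "Flutes"]))
```

Because grouping is driven by a lookup table, a coarser or finer target granularity only requires a different mapping, not different audio.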

3. Data Formats, Organization, and Naming Conventions

All audio material is un-mastered, delivered as stereo WAV files at 44.1 kHz sampling rate, with no compression or heavy processing. The directory structure for each track is standardized:

data_path/
├─ track_XXXX/
│   ├─ metadata.json          # Metadata and mappings
│   ├─ mixture.wav            # Final stereo mix
│   ├─ stems/
│   │   ├─ vocals.wav
│   │   ├─ drums.wav
│   │   └─ …
│   └─ sources/
│       ├─ Snare_Drum.wav
│       ├─ Kick_Drum.wav
│       └─ …

Stems are named with ASCII-safe identifiers (e.g., “vocals.wav”, “guitar.wav”), while individual source tracks are named after their Level 2 type (e.g., “Flutes.wav”). This organization facilitates flexible data access, robust labeling, and reproducibility in experimental protocols.
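A directory layout of this kind can be indexed with the standard library alone. The sketch below assumes the layout exactly as drawn above (a `metadata.json` file plus `stems/` and `sources/` subdirectories per track); `list_track_files` is a hypothetical helper, not part of the moisesdb package, and a downloaded copy of the dataset may be organized differently.

```python
from pathlib import Path


def list_track_files(data_path):
    """Map each track directory name to its stem and source WAV names."""
    index = {}
    for track_dir in sorted(Path(data_path).iterdir()):
        # Treat a directory as a track only if it carries a metadata file.
        if not (track_dir / "metadata.json").is_file():
            continue
        index[track_dir.name] = {
            "stems": sorted(p.stem for p in (track_dir / "stems").glob("*.wav")),
            "sources": sorted(p.stem for p in (track_dir / "sources").glob("*.wav")),
        }
    return index
```

Calling `list_track_files("data_path")` would return a dictionary keyed by track directory, which makes it easy to see which stems each song actually provides before training or evaluation.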

4. Python API and Data Access

A dedicated Python package, “moisesdb” (available at https://github.com/moises-ai/moises-db), supports downloading, preprocessing, and dataset manipulation. Installation is via pip install moisesdb. The API allows access to track-level metadata, mixture and stem audio as NumPy arrays, automated mixing of raw sources, track iteration, and on-disk stem saving. Example usage:

from moisesdb.dataset import MoisesDB
db = MoisesDB(data_path='./moises-db-data')
track = db[0]
mixture = track.audio             # (2, N) numpy array
stems = track.stems               # { 'vocals': array, 'drums': array, ... }
track.save_stems('./output/track_0')

This structure supports traditional experimentation as well as new use cases requiring dynamic source grouping or re-mixing.
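Re-mixing from a stem dictionary reduces to summing arrays. The following sketch uses synthetic NumPy data in place of real audio so that it runs without the dataset; `mix_stems` is a hypothetical helper rather than an API function, and it assumes all stems share the shape (2, N) described above.

```python
import numpy as np


def mix_stems(stems, keep=None):
    """Sum stem arrays of shape (2, N) into one mix; `keep` optionally selects stems."""
    names = list(stems) if keep is None else [n for n in stems if n in keep]
    if not names:
        raise ValueError("no stems selected")
    return np.sum([stems[n] for n in names], axis=0)


# Synthetic stand-in for a stems dictionary: one second of stereo noise per stem.
rng = np.random.default_rng(0)
stems = {name: rng.standard_normal((2, 44100)) for name in ("vocals", "drums", "bass")}

full_mix = mix_stems(stems)                         # all stems
backing = mix_stems(stems, keep={"drums", "bass"})  # instrumental without vocals
print(full_mix.shape, backing.shape)
```

The same pattern covers dynamic source grouping: any subset of stems can be summed into a custom target.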

5. Baseline Evaluation Protocol and Benchmarks

MoisesDB provides standardized reference protocols for benchmarking source separation systems. Evaluations are conducted at 4-stem, 5-stem, and 6-stem granularity. Only tracks containing at least the required stem set are included (N=235 for 4-stem, N=104 for 5-stem, N=88 for 6-stem). Surplus stems are linearly summed into an “other” category.
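The track filtering and the summation into “other” can be sketched as follows. `to_fixed_stems` is a hypothetical helper, and the target sets are assumptions: the 4-stem set follows the vocals, drums, bass, other paradigm mentioned earlier, while the 5- and 6-stem sets are assumed to follow the published stem lists of Spleeter and HT-Demucs.

```python
import numpy as np

# Assumed target sets; "other" is always derived from the surplus stems.
TARGETS_4 = ("vocals", "drums", "bass")
TARGETS_5 = ("vocals", "drums", "bass", "piano")
TARGETS_6 = ("vocals", "drums", "bass", "guitar", "piano")


def to_fixed_stems(stems, targets):
    """Return the target stems plus a summed 'other', or None if a target is missing."""
    if any(name not in stems for name in targets):
        return None  # the track is excluded from this configuration
    fixed = {name: stems[name] for name in targets}
    surplus = [audio for name, audio in stems.items() if name not in targets]
    template = next(iter(stems.values()))
    # Surplus stems are linearly summed; a track without surplus gets silence.
    fixed["other"] = np.sum(surplus, axis=0) if surplus else np.zeros_like(template)
    return fixed
```

Applying `to_fixed_stems` to every track and discarding the `None` results yields the evaluation subset for a given configuration.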

System performance is measured using the BSS-Eval Source-to-Distortion Ratio (SDR) metric:

$$\textrm{SDR} = 10 \cdot \log_{10} \left( \frac{\sum_n |s(n)|^2 + \epsilon}{\sum_n |s(n) - \hat{s}(n)|^2 + \epsilon} \right)$$

Here, $s(n)$ is the reference source, $\hat{s}(n)$ is the estimate, and $\epsilon$ is a small constant.
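The SDR formula translates directly into NumPy. This is a minimal sketch, not the reference evaluation code: the value of `eps` and the choice to sum over all samples and channels of a track are assumptions.

```python
import numpy as np


def sdr(reference, estimate, eps=1e-7):
    """SDR in dB for two arrays of identical shape, e.g. (2, N)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Numerator: energy of the reference; denominator: energy of the error.
    signal_energy = np.sum(reference ** 2) + eps
    error_energy = np.sum((reference - estimate) ** 2) + eps
    return 10.0 * np.log10(signal_energy / error_energy)


# An estimate that is uniformly 10% too quiet leaves 1% of the energy as error.
reference = np.ones((2, 1000))
print(round(sdr(reference, 0.9 * reference), 2))
```

A uniform 10% amplitude error corresponds to an energy ratio of 100, that is, roughly 20 dB, which gives a sense of scale for the values reported in the results.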

Oracle methods benchmarked:

  • Ideal Binary Mask (IBM)
  • Ideal Ratio Mask (IRM)
  • Multichannel Wiener Filter (MWF)

Open-source models evaluated:

  • Spleeter (4- and 5-stem configurations)
  • Hybrid-Transformer Demucs (HT-Demucs; 4- and 6-stem configurations)

Key results:

| Configuration | Model | SDR in dB (mean ± std; median where given) |
|---|---|---|
| 4-Stem | HT-Demucs | 9.91 ± 3.27 (median 9.69) |
| 4-Stem | MWF | 9.08 ± 2.15 (median 8.87) |
| 4-Stem | IRM | 8.97 ± 2.16 |
| 4-Stem | IBM | 7.14 ± 2.28 |
| 4-Stem | Spleeter | 6.29 ± 2.47 |
| 5-Stem | MWF | 7.81 ± 2.66 (median 7.83) |
| 5-Stem | IRM | 7.65 ± 2.66 |
| 5-Stem | IBM | 5.12 ± 2.81 |
| 5-Stem | Spleeter | 4.66 ± 3.20 |
| 6-Stem | MWF | 7.06 ± 2.73 |
| 6-Stem | IRM | 6.91 ± 2.70 |
| 6-Stem | IBM | 5.12 ± 2.81 |
| 6-Stem | HT-Demucs | 6.24 ± 5.17 |

Two notable patterns emerge: (1) IRM and MWF oracle methods show similar performance across stem groupings, and (2) HT-Demucs surpasses oracle methods on “bass” and “drums” stems in the 4- and 6-stem settings, underscoring the efficacy of modern deep learning architectures for certain instrument categories.

6. Insights, Limitations, and Research Utility

MoisesDB’s stem taxonomy, which is aligned with real-world mixing processes (raw tracks → stems → mixture), enables detailed analysis of error patterns and the construction of highly granular separation models. The inherently imbalanced distribution of stems—especially the scarcity of “wind” or specific “plucked” instrument tracks—poses data sufficiency challenges for rare-instrument modeling and potentially affects generalization.

Because all audio is un-mastered, it has a higher dynamic range (measured as DR14) and lower loudness (measured in LUFS) than typical commercial releases, so models trained on MoisesDB may face a domain shift when applied to mastered material.

Baseline findings indicate that state-of-the-art deep learning systems such as HT-Demucs now rival or even outperform best-case oracle masking techniques (IRM, MWF) on certain stems. Performance drops, however, as more granular separation is requested, especially for less-represented instrument types such as piano and guitar.

7. Summary and Significance

MoisesDB provides, for the first time, a multitrack dataset with a structured, hierarchical taxonomy for stems, supporting up to 12 instrument classes and enabling research in arbitrarily fine-grained source separation. With comprehensive annotations, unprocessed high-dynamic-range audio, and a dedicated Python API, MoisesDB addresses the limitations of four-stem datasets and establishes itself as an important benchmark for evaluating traditional and modern source separation systems at multiple granularities (Pereira et al., 2023).
