
ToneTwist AFx Dataset for Nonlinear Audio Modeling

Updated 3 January 2026
  • ToneTwist AFx is an open corpus comprising 40 unique audio effect devices recorded with standardized protocols for reproducible machine learning research.
  • The dataset features both analog and digital effects, offering dry/wet audio pairs in 48 kHz/24-bit WAV format across compressors, overdrive, distortion, and more.
  • Its structured metadata schema and community contribution workflow enable consistent benchmarking of black-box, gray-box, and DSP-inspired modeling approaches.

The ToneTwist AFx dataset is a comprehensive open corpus for data-driven modeling of nonlinear audio effects, with a particular focus on differentiable approaches spanning black-box and gray-box paradigms. Introduced in “Differentiable Black-box and Gray-box Modeling of Nonlinear Audio Effects” (Comunità et al., 20 Feb 2025), it encompasses a broad range of analog and digital effect devices and is designed for rigorous, problem-driven machine learning research that targets guitar amplifiers, overdrive, distortion, fuzz, compressors, and related categories. ToneTwist AFx is the first dataset in this domain to support systematic community contributions, with a detailed metadata schema, standardized recording protocols, and a workflow suitable for reproducible training and benchmarking.

1. Dataset Scope and Composition

ToneTwist AFx consists of audio input-output pairs sampled from 40 unique effect devices, encompassing:

  • Compressors & Limiter: 5 compressor units (e.g., Ampeg OptoComp, Flamma AnalogComp) and 1 hardware limiter (UA 6176 1176LN)
  • Overdrive: 6 devices, including Klon Centaur, TS9, Fulltone FD2
  • Distortion: 8 devices (Big Muff variants, HB DropKick, Rodent)
  • Fuzz: 5 devices, such as Custom Dynamic Fuzz and HB Fuzzy Logic
  • Guitar Amplifiers: 12 models (Blackstar, Fender Blues Jr, Mesa Boogie, etc.)
  • Other: Pre-amp (UA 610B), Chorus (Landlord Brewers Droop), Tremolo (Mooer Trelicopter)

Each device is probed with seven dry sources: three electric guitars, two bass guitars, a chirp sweep, and white noise. Device outputs are recorded continuously per source and segmented into 3-second blocks, yielding approximately 10,000 blocks per device and an aggregate duration of ~8 hours. All files are mono, uncompressed 48 kHz/24-bit WAV. The exact file counts and durations are provided in the dataset’s Zenodo repository.
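The block segmentation described above can be sketched as follows. This is a minimal illustration, not the dataset's own tooling: the function name `segment_pair` is hypothetical, and dropping a trailing remainder shorter than one block is an assumption. At 48 kHz, a 3-second block is 144,000 samples.

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE = 48_000  # Hz, the rate stated for all files in the dataset
BLOCK_SECONDS = 3     # block length stated in the text


def segment_pair(dry: Sequence[float], wet: Sequence[float],
                 sample_rate: int = SAMPLE_RATE,
                 block_seconds: float = BLOCK_SECONDS
                 ) -> List[Tuple[Sequence[float], Sequence[float]]]:
    """Split an aligned dry/wet recording into equal-length blocks.

    Assumption: a trailing remainder shorter than one block is dropped.
    """
    if len(dry) != len(wet):
        raise ValueError("dry and wet recordings must have the same length")
    block_len = int(sample_rate * block_seconds)  # 144,000 samples at 48 kHz
    n_blocks = len(dry) // block_len
    return [(dry[i * block_len:(i + 1) * block_len],
             wet[i * block_len:(i + 1) * block_len])
            for i in range(n_blocks)]
```

Because the function only relies on `len` and slicing, it works equally on Python lists and on the NumPy arrays returned by audio loaders.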

| Effect Category | # Devices | Example Models |
| --- | --- | --- |
| Compressor | 5 | Ampeg OptoComp |
| Limiter | 1 | UA 6176 1176LN |
| Overdrive | 6 | Klon Centaur, TS9 |
| Distortion | 8 | Big Muff, Rodent |
| Fuzz | 5 | HB Fuzzy Logic |
| Guitar Amps | 12 | Fender Blues Jr, Blackstar |
| Pre-amp | 1 | UA 610B |
| Chorus | 1 | Brewers Droop |
| Tremolo | 1 | Mooer Trelicopter |

2. Data Organization and Metadata Schema

Dataset organization mirrors the analog/digital distinction and parametric capability, reflected in the directory structure:

toneTwist-afx-dataset/
  ├── analog/
  │   ├── device_name/
  │   │   ├── dry/       (input .wav files)
  │   │   ├── wet/       (aligned output .wav files)
  │   │   └── metadata.json
  ├── analog_parametric/
  ├── digital/
  └── digital_parametric/

Each device folder contains a metadata.json file, specifying:

  • device_name (string)
  • effect_type (enum: compressor, overdrive, etc.)
  • analog_vs_digital (string)
  • parametric (bool)
  • parameter_values (if applicable: knob/value dict)
  • file_list (array): mapping dry_file, wet_file, source, and split (train/val/test)

Sample abridged metadata.json structure:

{
  "device_name": "Flamma_AnalogComp",
  "effect_type": "compressor",
  "analog_vs_digital": "analog",
  "parametric": false,
  "file_list": [
    {"dry_file": "dry/guitar1.wav", "wet_file": "wet/guitar1.wav", "source": "electric_guitar_01", "split": "train"},
    {"dry_file": "dry/bass1.wav",   "wet_file": "wet/bass1.wav",  "source": "electric_bass_02",  "split": "val"}
    // ...more entries...
  ]
}

This schema supports automated retrieval, alignment (via impulse markers), and split-specific partitioning for efficient experimentation.
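As a rough illustration of split-specific retrieval, the sketch below groups the dry/wet pairs of a metadata record by their `split` field. The helper name `pairs_by_split` is hypothetical, and the inline JSON simply repeats the abridged example without its `// ...more entries...` placeholder, which is not valid JSON.

```python
import json
from collections import defaultdict

# Inline copy of the abridged example, with the comment placeholder removed.
EXAMPLE_METADATA = """
{
  "device_name": "Flamma_AnalogComp",
  "effect_type": "compressor",
  "analog_vs_digital": "analog",
  "parametric": false,
  "file_list": [
    {"dry_file": "dry/guitar1.wav", "wet_file": "wet/guitar1.wav",
     "source": "electric_guitar_01", "split": "train"},
    {"dry_file": "dry/bass1.wav", "wet_file": "wet/bass1.wav",
     "source": "electric_bass_02", "split": "val"}
  ]
}
"""


def pairs_by_split(metadata: dict) -> dict:
    """Map each split name to its list of (dry_file, wet_file) tuples."""
    groups = defaultdict(list)
    for entry in metadata["file_list"]:
        groups[entry["split"]].append((entry["dry_file"], entry["wet_file"]))
    return dict(groups)


meta = json.loads(EXAMPLE_METADATA)
splits = pairs_by_split(meta)
```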

3. Licensing Model and Community Contribution Workflow

ToneTwist AFx is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0), which permits use, redistribution, and derivative works, provided that proper attribution is given.

The community contribution workflow is designed for extensibility and validation:

  1. Contributors fork the repository and add their WAV dry/wet pairs under the appropriate folder.
  2. New or updated metadata.json files must conform to the required schema.
  3. All added data must match the impulse alignment and sampling specifications.
  4. Pull-request submissions trigger automated continuous integration, which validates JSON schema compliance and file naming conventions.
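The kind of schema check that step 4 refers to might look like the sketch below. The repository's actual CI is not described here, so everything in this block is hypothetical: the function name `validate_metadata`, the rule that a parametric device must carry a `parameter_values` dict, and the set of allowed split names (taken from the train/val/test values mentioned in the schema description).

```python
# Field names follow the schema listed in Section 2; the types are assumptions.
REQUIRED_FIELDS = {
    "device_name": str,
    "effect_type": str,
    "analog_vs_digital": str,
    "parametric": bool,
    "file_list": list,
}
FILE_ENTRY_KEYS = ("dry_file", "wet_file", "source", "split")
ALLOWED_SPLITS = {"train", "val", "test"}


def validate_metadata(meta: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in meta:
            errors.append(f"missing field: {key}")
        elif not isinstance(meta[key], expected):
            errors.append(f"{key} should be of type {expected.__name__}")
    # Assumption: parametric devices must declare their knob settings.
    if meta.get("parametric") is True and not isinstance(
            meta.get("parameter_values"), dict):
        errors.append("parametric device needs a parameter_values dict")
    file_list = meta.get("file_list")
    entries = file_list if isinstance(file_list, list) else []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            errors.append(f"file_list[{i}] should be an object")
            continue
        for key in FILE_ENTRY_KEYS:
            if key not in entry:
                errors.append(f"file_list[{i}] missing key: {key}")
        if "split" in entry and entry["split"] not in ALLOWED_SPLITS:
            errors.append(f"file_list[{i}] has unknown split: {entry['split']}")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a contributor see every issue in one run.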

A plausible implication is that, as the community grows, the dataset may expand beyond the initial 40 devices, further supporting open benchmarking and cross-device generalization.

4. Access Methods, Data Loading, and Usage

Primary distribution channels include:

  • GitHub: hosts metadata files, scripts, and contribution guidelines (link)
  • Zenodo: long-term storage for audio files, with separate DOIs per category

Standard access and loading in Python utilizes the librosa library. Example workflow for retrieving device data is:

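import json
from pathlib import Path

import librosa

# Folder of a single device inside the dataset root.
DATA_ROOT = Path("toneTwist-afx-dataset/analog/Flamma_AnalogComp")
meta = json.loads((DATA_ROOT / "metadata.json").read_text())

for entry in meta["file_list"]:
    # Keep only the training split; use "val" or "test" for the others.
    if entry["split"] != "train":
        continue
    # sr=48000 matches the dataset's stated rate, so no resampling is needed.
    x, sr = librosa.load(str(DATA_ROOT / entry["dry_file"]), sr=48000)
    y, _ = librosa.load(str(DATA_ROOT / entry["wet_file"]), sr=48000)
    # normalize, segment, augment, etc.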

The recommended split is 90% training and 10% validation. For non-parametric devices, the training and validation splits draw on the same sources, while the test split uses held-out sources. For parametric devices, the test split uses held-out knob settings (e.g., Gain=5).
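A 90/10 partition of the training material could be sketched as below. The source does not say whether the split is random or contiguous, so the seeded shuffle is an assumption, and `train_val_split` is a hypothetical helper rather than part of the dataset's scripts.

```python
import random


def train_val_split(items, val_fraction=0.1, seed=0):
    """Partition items into (train, val) lists.

    Assumption: items are shuffled with a fixed seed before splitting,
    which makes the partition reproducible across runs.
    """
    items = list(items)
    rng = random.Random(seed)
    rng.shuffle(items)
    n_val = int(round(len(items) * val_fraction))
    return items[n_val:], items[:n_val]


# Example: split 100 block indices into 90 training and 10 validation blocks.
train_blocks, val_blocks = train_val_split(range(100))
```

Held-out test sources or knob settings would be removed before calling this helper, so that they never enter either partition.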

5. Benchmarked Modeling Approaches and Evaluation Criteria

ToneTwist AFx supports the systematic benchmarking of both black-box and gray-box modeling architectures:

  • Black-box families: LSTM (1 layer, 32 or 96 hidden), TCN (stacked causal convolutions), GCN (WaveNet-style), S4 (structured state-space)
  • Gray-box DSP-inspired: GB-COMP (EQ/dynamic gain/EQ/static gain), GB-DIST, GB-FUZZ (pre/post parametric EQ, offset, memoryless or MLP nonlinearity)
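To make the gray-box signal chain more concrete, the sketch below wires a pre-EQ, an offset, a memoryless nonlinearity, and a post-EQ in series. It is not the NablAFx implementation: the parameters are fixed numbers instead of learned values, `tanh` is only a stand-in for the nonlinearity, a single peaking band stands in for each parametric EQ, and the names `peaking_eq` and `gb_dist_sketch` as well as all default values are hypothetical. The filter coefficients follow the widely used RBJ audio EQ cookbook peaking filter.

```python
import math


def peaking_eq(x, fs, f0, gain_db, q=0.707):
    """Second-order peaking EQ (RBJ cookbook coefficients), direct form I."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b1, b2 = 1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin
    a0, a1, a2 = 1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin
    y = []
    x1 = x2 = y1 = y2 = 0.0  # filter state: previous inputs and outputs
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def gb_dist_sketch(x, fs=48_000, pre_gain_db=6.0, post_gain_db=-3.0,
                   offset=0.1):
    """pre-EQ -> offset -> memoryless nonlinearity -> post-EQ.

    Assumption: tanh replaces the learned memoryless or MLP nonlinearity.
    """
    pre = peaking_eq(x, fs, f0=800.0, gain_db=pre_gain_db)
    shaped = [math.tanh(s + offset) for s in pre]
    return peaking_eq(shaped, fs, f0=3000.0, gain_db=post_gain_db)
```

With both EQ gains set to 0 dB the filters pass the signal unchanged, so the chain reduces to `tanh(x + offset)`, which is a convenient sanity check.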

Key objective metrics are provided for quantitative evaluation:

  • $\mathrm{L1}(x,y) = \frac{1}{N}\sum_{n=1}^{N} |x_n - y_n|$
  • Multi-Resolution STFT loss (MR-STFT): sum of spectral magnitude differences at multiple FFT sizes
  • $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} (x_n - y_n)^2}$
  • $\mathrm{ESR} = \frac{\sum_{n=1}^{N} (x_n - y_n)^2}{\sum_{n=1}^{N} x_n^2}$
  • $\mathrm{LSD} = \sqrt{\frac{1}{L}\sum_{\ell=1}^{L} \left[ 20\log_{10} \frac{|X(\ell)|}{|Y(\ell)|} \right]^2}$
  • Fréchet Audio Distance (FAD): distance between embedded output distributions (using VGGish, PANN, CLAP, AFx-Rep feature spaces)
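The closed-form metrics in this list translate directly into code. The sketch below uses only the standard library and follows the formulas as written, with `x` taken to be the reference signal in the ESR denominator (an assumption about argument order). The `lsd` helper expects precomputed, strictly positive magnitude spectra; how those spectra are framed and averaged is not specified here. MR-STFT and FAD are omitted because they depend on STFT settings and embedding models not given in this section.

```python
import math


def l1(x, y):
    """Mean absolute error between two equal-length signals."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def rmse(x, y):
    """Root-mean-square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def esr(x, y):
    """Error-to-signal ratio; x is assumed to be the reference signal."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / sum(a * a for a in x)


def lsd(mag_x, mag_y):
    """Log-spectral distance in dB over L magnitude bins (all > 0)."""
    terms = [(20.0 * math.log10(a / b)) ** 2 for a, b in zip(mag_x, mag_y)]
    return math.sqrt(sum(terms) / len(terms))


# Toy example: the prediction is the reference scaled by one half.
reference = [1.0, -1.0, 1.0, -1.0]
prediction = [0.5, -0.5, 0.5, -0.5]
scores = {
    "L1": l1(reference, prediction),
    "RMSE": rmse(reference, prediction),
    "ESR": esr(reference, prediction),
}
```

The same expressions also accept NumPy arrays, although vectorized NumPy operations would be much faster on full-length recordings.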

Reported results (non-parametric devices):

| Model | L1 + MR-STFT Loss | ESR (avg. across devices) |
| --- | --- | --- |
| S4-TF-L-16 | 0.43 ± 0.21 | ~0.31 |
| TCN-TF-L-16 | 0.51 ± 0.20 | ~0.33 |
| LSTM-96 | 0.89 ± 0.73 | ~0.73 |
| GB-DIST-MLP | 1.16 ± 0.41 | ~0.42 |

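This suggests that the state-space (S4) and TCN architectures achieve the lowest combined L1 + MR-STFT loss and the lowest ESR on the evaluated non-parametric devices. The gray-box GB-DIST-MLP has the highest combined loss (1.16), although its average ESR (~0.42) is lower than that of LSTM-96 (~0.73), so its ranking depends on the chosen metric.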

6. Supplementary Resources and Experimental Frameworks

Ancillary tools and resources are integral to the ToneTwist AFx ecosystem:

  • NablAFx: Benchmark codebase for black-box and gray-box training/evaluation (link)
  • Modeling Supplement: Additional scripts and examples (link)
  • Listening Test Framework: webMUSHRA implementation (link)
  • Zenodo Archives: Search “tonetwist-afx” for long-term data storage and downloads

A plausible implication is that the combined open-source code and standardized data organization facilitate direct reproducibility and scalability of model benchmarking, subjective evaluation, and community-driven expansion of both devices and effect categories.

7. Context and Significance in Audio Effects Modeling

ToneTwist AFx directly addresses the historical limitation of prior datasets—namely, the focus on a single effect type or a narrow selection of devices—which hinders generalization and systematic comparison across diverse model architectures. By standardizing input signals, providing dry/wet-aligned blocks, supporting parameterized and non-parametric devices, and enabling objective as well as subjective evaluation, ToneTwist AFx serves as a foundational resource for research in data-driven audio effects modeling. The open CC BY 4.0 license and explicit schema enable both academic and industrial adoption, while the validated contribution protocol encourages the creation of a continuously expanding, high-quality benchmark suitable for future advances in differentiable audio modeling (Comunità et al., 20 Feb 2025).
