ToneTwist AFx Dataset for Nonlinear Audio Modeling
- ToneTwist AFx is an open corpus comprising 40 unique audio effect devices recorded with standardized protocols for reproducible machine learning research.
- The dataset features both analog and digital effects, offering dry/wet audio pairs in 48 kHz/24-bit WAV format across compressors, overdrive, distortion, and more.
- Its structured metadata schema and community contribution workflow enable consistent benchmarking of black-box, gray-box, and DSP-inspired modeling approaches.
The ToneTwist AFx dataset is a comprehensive open corpus for data-driven modeling of nonlinear audio effects, with a particular focus on differentiable approaches spanning black-box and gray-box paradigms. Introduced in “Differentiable Black-box and Gray-box Modeling of Nonlinear Audio Effects” (Comunità et al., 20 Feb 2025), it encompasses a broad range of analog and digital effect devices and is designed for rigorous, problem-driven machine learning research that targets guitar amplifiers, overdrive, distortion, fuzz, compressors, and related categories. ToneTwist AFx is the first dataset in this domain to support systematic community contributions, with a detailed metadata schema, standardized recording protocols, and a workflow suitable for reproducible training and benchmarking.
1. Dataset Scope and Composition
ToneTwist AFx consists of audio input-output pairs sampled from 40 unique effect devices, encompassing:
- Compressors & Limiter: 5 compressor units (e.g., Ampeg OptoComp, Flamma AnalogComp) and 1 hardware limiter (UA 6176 1176LN)
- Overdrive: 6 devices, including Klon Centaur, TS9, Fulltone FD2
- Distortion: 8 devices (Big Muff variants, HB DropKick, Rodent)
- Fuzz: 5 devices, such as Custom Dynamic Fuzz and HB Fuzzy Logic
- Guitar Amplifiers: 12 models (Blackstar, Fender Blues Jr, Mesa Boogie, etc.)
- Other: Pre-amp (UA 610B), Chorus (Landlord Brewers Droop), Tremolo (Mooer Trelicopter)
Each device is probed with seven dry sources: three electric guitars, two bass guitars, a chirp sweep, and white noise. Device outputs are continuously recorded per source and segmented into 3-second blocks, yielding approximately 10,000 blocks per device and an aggregate duration of ~8 hours. All files are mono 48 kHz/24-bit uncompressed WAV. The exact file-counts and durations are provided in the dataset’s Zenodo repository.
| Effect Category | # Devices | Example Models |
|---|---|---|
| Compressor | 5 | Ampeg OptoComp |
| Limiter | 1 | UA 6176 1176LN |
| Overdrive | 6 | Klon Centaur, TS9 |
| Distortion | 8 | Big Muff, Rodent |
| Fuzz | 5 | HB Fuzzy Logic |
| Guitar Amps | 12 | Fender Blues Jr, Blackstar |
| Pre-amp | 1 | UA 610B |
| Chorus | 1 | Brewers Droop |
| Tremolo | 1 | Mooer Trelicopter |
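The 3-second segmentation described above can be sketched in a few lines. This is a minimal illustration, not the dataset's own tooling: the function name `segment_blocks` and the choice to drop a trailing partial block are assumptions.

```python
import numpy as np

SAMPLE_RATE = 48_000   # Hz, as stated for all dataset files
BLOCK_SECONDS = 3      # block length described in the text


def segment_blocks(signal: np.ndarray, sr: int = SAMPLE_RATE,
                   seconds: float = BLOCK_SECONDS) -> np.ndarray:
    """Split a mono signal into non-overlapping fixed-length blocks.

    Trailing samples that do not fill a whole block are dropped
    (an assumption; the dataset may treat the remainder differently).
    """
    block_len = int(sr * seconds)
    n_blocks = len(signal) // block_len
    return signal[: n_blocks * block_len].reshape(n_blocks, block_len)


# Ten seconds of placeholder audio yields three full 3-second blocks.
demo = np.zeros(10 * SAMPLE_RATE, dtype=np.float32)
blocks = segment_blocks(demo)
print(blocks.shape)  # (3, 144000)
```

Applying the same function to a dry recording and its aligned wet recording keeps the block indices paired, provided both files have the same length.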
2. Data Organization and Metadata Schema
Dataset organization mirrors the analog/digital distinction and parametric capability, reflected in the directory structure:
```
toneTwist-afx-dataset/
├── analog/
│   ├── device_name/
│   │   ├── dry/           (input .wav files)
│   │   ├── wet/           (aligned output .wav files)
│   │   └── metadata.json
├── analog_parametric/
├── digital/
└── digital_parametric/
```
Each device folder contains a metadata.json file, specifying:
- device_name (string)
- effect_type (enum: compressor, overdrive, etc.)
- analog_vs_digital (string)
- parametric (bool)
- parameter_values (if applicable: knob/value dict)
- file_list (array): entries mapping dry_file, wet_file, source, and split (train/val/test)
Sample abridged metadata.json structure:
```json
{
  "device_name": "Flamma_AnalogComp",
  "effect_type": "compressor",
  "analog_vs_digital": "analog",
  "parametric": false,
  "file_list": [
    {"dry_file": "dry/guitar1.wav", "wet_file": "wet/guitar1.wav", "source": "electric_guitar_01", "split": "train"},
    {"dry_file": "dry/bass1.wav", "wet_file": "wet/bass1.wav", "source": "electric_bass_02", "split": "val"}
    // ...more entries...
  ]
}
```
This schema supports automated retrieval, alignment (via impulse markers), and split-specific partitioning for efficient experimentation.
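As a rough illustration of split-specific retrieval, the sketch below groups the `file_list` entries of a metadata dictionary by their `split` field. The helper name `group_by_split` is hypothetical, and the inline JSON simply repeats the two entries from the abridged sample.

```python
import json
from collections import defaultdict


def group_by_split(meta: dict) -> dict:
    """Map each split name to its list of (dry_file, wet_file) pairs."""
    groups = defaultdict(list)
    for entry in meta["file_list"]:
        groups[entry["split"]].append((entry["dry_file"], entry["wet_file"]))
    return dict(groups)


# Inline copy of the abridged sample (elided entries omitted).
example = json.loads("""
{
  "device_name": "Flamma_AnalogComp",
  "effect_type": "compressor",
  "analog_vs_digital": "analog",
  "parametric": false,
  "file_list": [
    {"dry_file": "dry/guitar1.wav", "wet_file": "wet/guitar1.wav",
     "source": "electric_guitar_01", "split": "train"},
    {"dry_file": "dry/bass1.wav", "wet_file": "wet/bass1.wav",
     "source": "electric_bass_02", "split": "val"}
  ]
}
""")

pairs = group_by_split(example)
print(pairs["train"])  # [('dry/guitar1.wav', 'wet/guitar1.wav')]
```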
3. Licensing Model and Community Contribution Workflow
ToneTwist AFx is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0), enabling unrestricted use, redistribution, and derivative works with proper attribution.
The community contribution workflow is designed for extensibility and validation:
- Contributors fork the repository and add their WAV dry/wet pairs under the appropriate folder.
- New or updated metadata.json files must conform to the required schema.
- All added data must match the impulse alignment and sampling specifications.
- Pull-request submissions trigger automated continuous integration, which validates JSON schema compliance and file naming conventions.
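The kind of checks a contributor might run locally before opening a pull request can be sketched with the standard library alone. This is not the repository's actual CI: the function names, the required-key sets (taken from the schema fields listed earlier), and the assumption that 24-bit audio is stored as plain PCM readable by Python's `wave` module are all illustrative.

```python
import os
import tempfile
import wave

# Field names taken from the schema description; the CI's real rules may differ.
REQUIRED_KEYS = {"device_name", "effect_type", "analog_vs_digital",
                 "parametric", "file_list"}
ENTRY_KEYS = {"dry_file", "wet_file", "source", "split"}
VALID_SPLITS = {"train", "val", "test"}


def metadata_errors(meta: dict) -> list:
    """Return human-readable problems found in a metadata dictionary."""
    errors = []
    missing = REQUIRED_KEYS - set(meta)
    if missing:
        errors.append(f"missing top-level keys: {sorted(missing)}")
    for i, entry in enumerate(meta.get("file_list", [])):
        gap = ENTRY_KEYS - set(entry)
        if gap:
            errors.append(f"file_list[{i}] missing keys: {sorted(gap)}")
        elif entry["split"] not in VALID_SPLITS:
            errors.append(f"file_list[{i}] has unknown split {entry['split']!r}")
    return errors


def wav_errors(path: str) -> list:
    """Check a WAV file against the mono 48 kHz / 24-bit specification."""
    errors = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != 48_000:
            errors.append(f"sample rate is {wf.getframerate()}, expected 48000")
        if wf.getsampwidth() != 3:  # 3 bytes per sample = 24-bit
            errors.append(f"sample width is {wf.getsampwidth()} bytes, expected 3")
        if wf.getnchannels() != 1:
            errors.append(f"{wf.getnchannels()} channels, expected mono")
    return errors


# Demo: write one second of 24-bit mono silence, then validate it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(3)
    wf.setframerate(48_000)
    wf.writeframes(b"\x00" * 3 * 48_000)

print(wav_errors(demo_path))  # []
print(metadata_errors({"device_name": "x"}))
```

Impulse-alignment checks are not covered here, since the text does not specify how the markers are encoded.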
A plausible implication is that, as the community grows, the dataset may expand beyond the initial 40 devices, further supporting open benchmarking and cross-device generalization.
4. Access Methods, Data Loading, and Usage
Primary distribution channels include:
- GitHub: hosts metadata files, scripts, and contribution guidelines
- Zenodo: long-term storage for audio files, with separate DOIs per category
In Python, the audio can be loaded with the librosa library. An example workflow for retrieving one device's training pairs:

```python
import json
import librosa
from pathlib import Path

DATA_ROOT = Path("toneTwist-afx-dataset/analog/Flamma_AnalogComp")
meta = json.loads((DATA_ROOT / "metadata.json").read_text())

for entry in meta["file_list"]:
    if entry["split"] != "train":
        continue
    x, sr = librosa.load(str(DATA_ROOT / entry["dry_file"]), sr=48000)
    y, _ = librosa.load(str(DATA_ROOT / entry["wet_file"]), sr=48000)
    # normalize, segment, augment, etc.
```
Recommended splits are 0.9 for train and 0.1 for validation. For non-parametric devices, all sources appear in train and validation, with held-out sources for test. Parametric devices use held-out knob settings (e.g., Gain=5) in the test split.
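A 0.9/0.1 train/validation partition can be sketched as follows. The text does not say whether the split is random or contiguous, so the seeded shuffle at block level is an assumption, as is the helper name `train_val_split`.

```python
import random


def train_val_split(items, train_fraction=0.9, seed=0):
    """Shuffle items reproducibly and cut them into train and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical block indices for one device.
block_ids = list(range(1000))
train_ids, val_ids = train_val_split(block_ids)
print(len(train_ids), len(val_ids))  # 900 100
```

Fixing the seed keeps the partition stable across runs, which matters when comparing models trained at different times.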
5. Benchmarked Modeling Approaches and Evaluation Criteria
ToneTwist AFx supports the systematic benchmarking of both black-box and gray-box modeling architectures:
- Black-box families: LSTM (1 layer, 32 or 96 hidden), TCN (stacked causal convolutions), GCN (WaveNet-style), S4 (structured state-space)
- Gray-box DSP-inspired: GB-COMP (EQ/dynamic gain/EQ/static gain), GB-DIST, GB-FUZZ (pre/post parametric EQ, offset, memoryless or MLP nonlinearity)
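To make the gray-box signal chain concrete, here is a toy forward pass in the spirit of the GB-DIST description: a pre-filter, a gain and offset, a memoryless nonlinearity, and a post-filter. It is only an illustration under several assumptions: a one-pole low-pass stands in for the parametric EQ stages, `tanh` stands in for the nonlinearity, all parameter values are arbitrary, and this NumPy version is not differentiable as the trained models would be.

```python
import numpy as np


def one_pole_lowpass(x: np.ndarray, coeff: float) -> np.ndarray:
    """Simple recursive smoother used here as a stand-in for an EQ stage."""
    y = np.empty_like(x)
    state = 0.0
    for n, sample in enumerate(x):
        state = (1.0 - coeff) * sample + coeff * state
        y[n] = state
    return y


def gray_box_distortion(x, pre_gain=8.0, offset=0.1,
                        pre_coeff=0.2, post_coeff=0.5):
    """Pre-filter -> gain and offset -> tanh nonlinearity -> post-filter."""
    shaped = one_pole_lowpass(x, pre_coeff)
    driven = np.tanh(pre_gain * shaped + offset)  # memoryless nonlinearity
    return one_pole_lowpass(driven, post_coeff)


# 0.1 s of a 220 Hz sine at 48 kHz as a placeholder dry signal.
t = np.arange(4800) / 48_000
dry = 0.5 * np.sin(2 * np.pi * 220.0 * t)
wet = gray_box_distortion(dry)
print(wet.shape, float(np.max(np.abs(wet))) <= 1.0)
```

The appeal of this structure is that each stage has only a handful of interpretable parameters, in contrast to the black-box networks listed above.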
Key objective metrics are provided for quantitative evaluation:
- Multi-Resolution STFT loss (MR-STFT): sum of spectral magnitude differences at multiple FFT sizes
- Fréchet Audio Distance (FAD): distance between embedded output distributions (using VGGish, PANN, CLAP, AFx-Rep feature spaces)
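The sketch below shows a simplified multi-resolution STFT distance, following the one-line description above (magnitude differences summed over several FFT sizes), together with the error-to-signal ratio (ESR), which the results report as $\mathrm{ESR} = \sum_n (y_n - \hat{y}_n)^2 / \sum_n y_n^2$. The FFT sizes, hop length, Hann window, and mean-absolute reduction are assumptions; published MR-STFT losses typically add spectral-convergence and log-magnitude terms, and the benchmark's exact formulation may differ.

```python
import numpy as np


def esr(target: np.ndarray, pred: np.ndarray, eps: float = 1e-12) -> float:
    """Error-to-signal ratio: error energy divided by target energy."""
    return float(np.sum((target - pred) ** 2) / (np.sum(target ** 2) + eps))


def stft_magnitude(x: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Magnitude spectrogram from Hann-windowed frames (no padding)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def mr_stft_distance(target, pred, fft_sizes=(512, 1024, 2048)) -> float:
    """Sum over FFT sizes of the mean absolute magnitude difference."""
    total = 0.0
    for n_fft in fft_sizes:
        hop = n_fft // 4
        diff = (stft_magnitude(target, n_fft, hop)
                - stft_magnitude(pred, n_fft, hop))
        total += float(np.mean(np.abs(diff)))
    return total


# Placeholder signals: a sine as target, a half-amplitude copy as prediction.
t = np.arange(12_000) / 48_000
y = np.sin(2 * np.pi * 440.0 * t)
y_hat = 0.5 * y
print(esr(y, y_hat))               # close to 0.25
print(mr_stft_distance(y, y_hat))  # positive; zero only for identical magnitudes
```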
Reported results (non-parametric devices):
| Model | L1+MR-STFT Loss | ESR (avg across devices) |
|---|---|---|
| S4-TF-L-16 | 0.43 ± 0.21 | ~0.31 |
| TCN-TF-L-16 | 0.51 ± 0.20 | ~0.33 |
| LSTM-96 | 0.89 ± 0.73 | ~0.73 |
| GB-DIST-MLP | 1.16 ± 0.41 | ~0.42 |
This suggests that state-space (S4) and TCN architectures yield the lowest objective losses over the evaluated nonlinear device set. The gray-box GB-DIST-MLP model has the highest L1+MR-STFT loss of the four, although its average ESR (~0.42) is lower than that of LSTM-96 (~0.73).
6. Supplementary Resources and Experimental Frameworks
Ancillary tools and resources are integral to the ToneTwist AFx ecosystem:
- NablAFx: Benchmark codebase for black-box and gray-box training/evaluation
- Modeling Supplement: Additional scripts and examples
- Listening Test Framework: webMUSHRA implementation
- Zenodo Archives: Search “tonetwist-afx” for long-term data storage and downloads
A plausible implication is that the combined open-source code and standardized data organization facilitate direct reproducibility and scalability of model benchmarking, subjective evaluation, and community-driven expansion of both devices and effect categories.
7. Context and Significance in Audio Effects Modeling
ToneTwist AFx directly addresses the historical limitation of prior datasets—namely, the focus on a single effect type or a narrow selection of devices—which hinders generalization and systematic comparison across diverse model architectures. By standardizing input signals, providing dry/wet-aligned blocks, supporting parameterized and non-parametric devices, and enabling objective as well as subjective evaluation, ToneTwist AFx serves as a foundational resource for research in data-driven audio effects modeling. The open CC BY 4.0 license and explicit schema enable both academic and industrial adoption, while the validated contribution protocol encourages the creation of a continuously expanding, high-quality benchmark suitable for future advances in differentiable audio modeling (Comunità et al., 20 Feb 2025).