Music Representing Corpus Virtual (MRCV)
- Music Representing Corpus Virtual (MRCV) is an open-source framework that integrates neural network training with modular pipelines for music generation, sound design, and instrument creation.
- It offers a flexible API for data ingestion from MIDI and audio, allowing customizable preprocessing, feature extraction, and rapid prototyping using various signal transformation techniques.
- The system supports multiple neural architectures for tasks such as MIDI generation and real-time audio effects, validated by quantitative metrics and user studies for performance and creativity.
Music Representing Corpus Virtual (MRCV) is an open-source software suite designed to facilitate explorative music generation, sound design, and virtual instrument creation. It provides a modular, user-configurable environment for training and deploying neural networks on symbolic (MIDI) and audio datasets, focusing on rapid prototyping and flexibility for both seasoned researchers and practitioners in the field of AI-driven music technology (Clarke, 2023).
1. System Architecture and Workflow
MRCV’s architecture is organized as a sequence of five loosely coupled modules, each accessible via a unified Python “Creator” API and a top-level command-line interface (CLI):
- Data Ingestion: MIDI files are parsed using the MAESTRO reader or custom composer-based selection (e.g., get_data_for_composer). Audio files are ingested as 44.1 kHz PCM WAVs.
- Preprocessing: MIDI data is packed into model-ready tensors; audio undergoes block windowing (configurable block and hop sizes) and optional spectral feature extraction, such as STFT or MFCC. The feature-extraction pipeline can be formalized as $y = g(f(x))$, where $f$ denotes the signal transformation and $g$ encapsulates further mapping or reduction.
- Neural Network Training: Model definitions are instantiated by Creator methods (e.g., createDenseModelForNeuralNet2). Training uses default Keras/TensorFlow routines with customizable hyperparameters exposed via JSON/YAML.
- Inference & Sound Synthesis: Trained models generate symbolic sequences (MIDI) or audio (WAV). Models can be run in "generate" mode to create new output.
- Instrument Builder: Synthesizes custom software instruments by organizing generated audio into sampler zones and exporting instrument definitions (e.g., Decent Sampler XML).
Each module is engineered to permit file- or memory-based data flow, enabling users to override subprocesses, insert custom feature engineering, or bypass given stages as needed (Clarke, 2023).
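The block-windowing step in Preprocessing can be sketched with NumPy. The function and parameter names (block_windows, block_size, hop_size) are illustrative assumptions, not the framework's documented API; the example follows the $y = g(f(x))$ formalization with windowing as $f$ and a per-block reduction as $g$:

```python
import numpy as np

def block_windows(signal: np.ndarray, block_size: int, hop_size: int) -> np.ndarray:
    """Slice a 1-D audio signal into overlapping blocks (frames)."""
    n_blocks = 1 + (len(signal) - block_size) // hop_size
    idx = np.arange(block_size)[None, :] + hop_size * np.arange(n_blocks)[:, None]
    return signal[idx]

# y = g(f(x)): f = block windowing, g = a per-block reduction (here, RMS energy)
x = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)   # 1 s of A440 at 44.1 kHz
blocks = block_windows(x, block_size=1024, hop_size=512)  # f(x)
features = np.sqrt((blocks ** 2).mean(axis=1))            # g(f(x))
```

Swapping the RMS reduction for an STFT or MFCC transform yields the spectral variants described above.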
2. Neural Network Models Implemented
MRCV implements four principal neural network architectures, each serving distinct purposes in the MGSDIC (Music Generation, Sound Design, and Instrument Creation) workflow:
| Network ID | Architecture Type | Primary Application |
|---|---|---|
| NN-1 | Dense, multi-head FC | Symbolic music sequence (MIDI) generation |
| NN-2 | Deep Dense + Dropout | Explorative audio block prediction |
| NN-3 | Stacked GRU | Real-time audio-to-audio effects, plugin |
| NN-4 | Feedforward MLP | Wavetable synthesis from audio features |
- NN-1 (MixMo/MIMO): Predicts the next MIDI note event (onset, duration, pitch, velocity), using separate loss heads for joint modelling. The input is a window of preceding note-event tuples, and the loss per head is MSE: $\mathcal{L}_h = \frac{1}{N}\sum_{i=1}^{N}(y_{h,i} - \hat{y}_{h,i})^2$.
- NN-2: Models audio block prediction using deep dense layers with dropout, suited for timbral morphing or explorative synthesis. Input is a block of raw waveform samples.
- NN-3: GRU-based, low-latency architecture for samplewise prediction, enabling real-time plugin export (VST/AU). Encodes short sequence memory over a small, configurable number of past steps.
- NN-4: Feed-forward MLP for synthesizing wavetables. Trained on MFCC or STFT representations; optimizes the spectral envelope via a reconstruction loss (e.g., $L_1$ or $L_2$).
The design does not employ adversarial or RL-based objectives by default but is extensible to include additional objectives and regularizations, such as those used in VAEs (Clarke, 2023).
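As a concrete illustration of NN-1's per-head MSE, the sketch below computes an independent loss for each of the four event attributes. The head names and the (batch, 4) array layout are assumptions made for illustration, not MRCV's internal representation:

```python
import numpy as np

HEADS = ("onset", "duration", "pitch", "velocity")

def per_head_mse(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Per-head MSE over a batch of note-event tuples.

    y_true, y_pred: arrays of shape (batch, 4), one column per head.
    """
    sq_err = (y_true - y_pred) ** 2
    return {head: float(sq_err[:, i].mean()) for i, head in enumerate(HEADS)}

y_true = np.array([[0.00, 0.50, 60.0, 0.8],
                   [0.25, 0.25, 64.0, 0.6]])
y_pred = np.array([[0.10, 0.50, 61.0, 0.8],
                   [0.25, 0.35, 64.0, 0.5]])
losses = per_head_mse(y_true, y_pred)
```

Keeping the heads separate lets each attribute contribute its own gradient scale, which is the point of the multi-head design described above.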
3. Dataset Handling, Preprocessing, and Customization
MRCV’s data layer supports both standard MIDI and PCM/WAV files (44.1 kHz, mono/stereo). Feature extraction is user-configurable: FFT length, MFCC order, Mel band count, and hop sizes can be specified. Dataset ingestion is performed via direct path referencing or by composer filtering, as in:
```python
midi_data = get_data_for_composer(main_data, ["Bach", "Debussy"])
audio_files = get_audio_dataset("/path/to/my_samples/*.wav")
```
Augmentation capabilities include corpus concatenation, time-stretch, pitch-shift, and frame-level dropout. Data pipelines are defined in YAML and may be customized at any stage with Python callbacks (Clarke, 2023).
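The composer-filtering entry point can be imagined as a simple metadata filter. The following is a hypothetical reimplementation for illustration, assuming a corpus of dicts with "composer" and "midi" keys; it is not the framework's actual code:

```python
def get_data_for_composer(main_data, composers):
    """Filter a corpus down to the requested composers, case-insensitively.

    main_data: list of dicts with 'composer' and 'midi' keys (assumed layout).
    """
    wanted = {name.lower() for name in composers}
    return [item for item in main_data if item["composer"].lower() in wanted]

corpus = [
    {"composer": "Bach", "midi": "bwv846.mid"},
    {"composer": "Chopin", "midi": "op28.mid"},
    {"composer": "Debussy", "midi": "arabesque.mid"},
]
subset = get_data_for_composer(corpus, ["Bach", "Debussy"])
```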
4. Sound Synthesis, Instrument Creation, and Algorithms
Sound synthesis proceeds via model-specific pathways. NN-2 outputs are concatenated into WAV blocks and mapped to multi-sample instrument zones, while NN-3 can be deployed as a real-time audio plugin. NN-4 enables generation of single-cycle wavetables for oscillator-based synthesis.
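Concatenating NN-2's predicted blocks into a continuous waveform can be done with a windowed overlap-add. The Hann-window cross-fade below is an assumption about how block seams might be smoothed, not a documented MRCV behavior:

```python
import numpy as np

def overlap_add(blocks: np.ndarray, hop_size: int) -> np.ndarray:
    """Reassemble overlapping blocks of shape (n_blocks, block_size) into one signal."""
    n_blocks, block_size = blocks.shape
    out = np.zeros(hop_size * (n_blocks - 1) + block_size)
    norm = np.zeros_like(out)
    window = np.hanning(block_size)  # taper each block to smooth the seams
    for i, block in enumerate(blocks):
        start = i * hop_size
        out[start:start + block_size] += block * window
        norm[start:start + block_size] += window
    return out / np.maximum(norm, 1e-8)  # undo the window gain where defined

blocks = np.ones((4, 1024))              # stand-in for model output blocks
signal = overlap_add(blocks, hop_size=512)
```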
Instrument definition is automated: generated waves are organized into round-robin or velocity zones (for sampled instruments) and exported as Decent Sampler XML with ADSR (Attack, Decay, Sustain, Release) parametrization. Users may manually edit the descriptors for fine-grained control of loop points, velocity cross-fades, or key mapping (Clarke, 2023).
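Exporting sampler zones could look like the following sketch, which emits a minimal Decent Sampler preset. The attribute set shown (path, loNote, hiNote, rootNote, and ADSR attributes on the groups element) is a simplified subset of the format, and the helper itself is hypothetical:

```python
import xml.etree.ElementTree as ET

def export_decent_sampler(zones, adsr=(0.01, 0.1, 0.8, 0.3)):
    """Build a minimal Decent Sampler preset from (wav_path, lo, hi, root) zones.

    Attribute names follow common Decent Sampler conventions but are a
    simplified subset; consult the format documentation before relying on them.
    """
    attack, decay, sustain, release = adsr
    root = ET.Element("DecentSampler")
    groups = ET.SubElement(root, "groups",
                           attack=str(attack), decay=str(decay),
                           sustain=str(sustain), release=str(release))
    group = ET.SubElement(groups, "group")
    for path, lo, hi, note in zones:
        ET.SubElement(group, "sample", path=path,
                      loNote=str(lo), hiNote=str(hi), rootNote=str(note))
    return ET.tostring(root, encoding="unicode")

xml_text = export_decent_sampler([("samples/gen_c3.wav", 48, 59, 48),
                                  ("samples/gen_c4.wav", 60, 71, 60)])
```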
Latent code interpolation and corpus blending allow creation of hybrid timbres, enabling traversal through spaces between corpora (e.g., “morphing” between flute and saxophone timbres).
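Latent interpolation between two corpora reduces to a convex combination of latent codes. The sketch below is generic, not MRCV-specific, and the latent vectors are illustrative stand-ins:

```python
import numpy as np

def interpolate_latents(z_a: np.ndarray, z_b: np.ndarray, steps: int) -> np.ndarray:
    """Return `steps` latent codes linearly morphing from z_a to z_b."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - alphas) * z_a + alphas * z_b

z_flute = np.array([0.0, 1.0, -1.0])  # illustrative latent codes
z_sax = np.array([1.0, 0.0, 1.0])
path = interpolate_latents(z_flute, z_sax, steps=5)
```

Decoding each intermediate code yields the timbral "morph" between the two source corpora.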
5. Evaluation Metrics, Examples, and Empirical Results
Evaluation is primarily quantitative:
- Convergence curves: MSE is reported for all models; NN-1’s piano-roll MSE converges within 50 epochs on the MAESTRO MIDI/audio dataset, while NN-2’s audio-block MSE is reported on saxophone multiphonics.
- Audio metrics: Signal-to-Noise Ratio (SNR, in dB) is reported for NN-2 on held-out sets; log-spectral distortion is evaluated offline.
- Human evaluation: A preliminary user study yields high mean ratings for ease of instrument creation (4.3/5) and timbral novelty (4.1/5).
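The SNR metric can be computed as the power ratio between the reference signal and the reconstruction error; the definition below is the standard one and is assumed to match what is reported here:

```python
import math

def snr_db(reference, estimate):
    """SNR in dB: 10 * log10(signal power / noise power)."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / noise_power)

ref = [1.0, -1.0, 1.0, -1.0]
est = [0.9, -0.9, 0.9, -0.9]          # 10% amplitude error
print(round(snr_db(ref, est), 1))     # → 20.0
```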
Representative outputs include Handel-style MIDI sequence generation, hybrid electronic percussion timbres, real-time neural audio effects, and custom neural wavetables imported into performance samplers (Clarke, 2023).
6. Extensibility, API Design, and Community Model
Extension is achieved via a plugin/API architecture: new model types are registered by subclassing a BaseCreator with an explicit network configuration and definition. Contribution guidelines follow standard open-source workflows (fork/PR model, continuous integration, automatic code-style checks), and documentation contributions are expected in Markdown.
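Registering a new model type by subclassing a BaseCreator might look like the following. BaseCreator's real interface is not documented here, so the register decorator, the build method, and the returned spec are all assumptions for illustration:

```python
REGISTRY = {}

def register(name):
    """Hypothetical plugin registry, standing in for MRCV's actual mechanism."""
    def wrap(cls):
        REGISTRY[name] = cls
        return cls
    return wrap

class BaseCreator:
    def build(self, config: dict):
        raise NotImplementedError

@register("gru_fx")
class GRUEffectCreator(BaseCreator):
    """Sketch of a custom creator for an NN-3-style audio effect model."""
    def build(self, config: dict):
        # In the real framework this would return a compiled Keras model;
        # here we just echo the resolved configuration.
        return {"layers": config.get("layers", 2),
                "units": config.get("units", 64)}

model_spec = REGISTRY["gru_fx"]().build({"layers": 3})
```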
The suite is MIT-licensed. Semantic versioning and comprehensive test automation facilitate robust, ongoing development (Clarke, 2023).
7. Related Approaches and Theoretical Context
MRCV’s focus on neural music representation, flexible data pipelines, and instrument builder scripts situates it alongside research frameworks targeting data-driven music analysis and compositionality. In comparison, approaches emphasizing learned discrete or hierarchical codes, such as Vector Quantized Contrastive Predictive Coding (VQ-CPC) (Hadjeres et al., 2020) and graph-based centroid corpus summarization (Shapiro et al., 2025), provide complementary perspectives on corpus abstraction.
This suggests that MRCV can serve both as a production platform and as a laboratory for testing higher-level music representation schemes, provided the relevant architectures are implemented in accordance with the Creator API formalism. While MRCV does not inherently provide structural or hierarchical abstractions analogous to STGs or VQ clusters, its pipeline can in principle ingest such features as input or output, contingent on user-side customization.