
Music Representing Corpus Virtual (MRCV)

Updated 6 February 2026
  • Music Representing Corpus Virtual (MRCV) is an open-source framework that integrates neural network training with modular pipelines for music generation, sound design, and instrument creation.
  • It offers a flexible API for data ingestion from MIDI and audio, allowing customizable preprocessing, feature extraction, and rapid prototyping using various signal transformation techniques.
  • The system supports multiple neural architectures for tasks such as MIDI generation and real-time audio effects, validated by quantitative metrics and user studies for performance and creativity.

Music Representing Corpus Virtual (MRCV) is an open-source software suite designed to facilitate explorative music generation, sound design, and virtual instrument creation. It provides a modular, user-configurable environment for training and deploying neural networks on symbolic (MIDI) and audio datasets, focusing on rapid prototyping and flexibility for both seasoned researchers and practitioners in the field of AI-driven music technology (Clarke, 2023).

1. System Architecture and Workflow

MRCV’s architecture is organized as a sequence of five loosely coupled modules, each accessible via a unified Python “Creator” API and a top-level command-line interface (CLI):

  • Data Ingestion: MIDI files are parsed using the MAESTRO reader or custom composer-based selection (e.g., get_data_for_composer). Audio files are ingested as 44.1 kHz PCM WAVs.
  • Preprocessing: MIDI data is packed into model-ready tensors; audio undergoes block windowing (configurable block and hop sizes) and optional spectral feature extraction, such as STFT or MFCC. The feature-extraction pipeline can be formalized as $\hat{y} = \mathbb{M}(\mathcal{F}(x))$, where $\mathcal{F}$ denotes signal transformation and $\mathbb{M}$ encapsulates further mapping or reduction.
  • Neural Network Training: Model definitions are instantiated by Creator methods (e.g., createDenseModelForNeuralNet2). Training uses default Keras/TensorFlow routines with customizable hyperparameters exposed via JSON/YAML.
  • Inference & Sound Synthesis: Trained models generate symbolic sequences (MIDI) or audio (WAV). Models can be run in "generate" mode for new output creation.
  • Instrument Builder: Synthesizes custom software instruments by organizing generated audio into sampler zones and exporting instrument definitions (e.g., Decent Sampler XML).

Each module is engineered to permit file- or memory-based data flow, enabling users to override subprocesses, insert custom feature engineering, or bypass given stages as needed (Clarke, 2023).
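The preprocessing formalism $\hat{y} = \mathbb{M}(\mathcal{F}(x))$ can be illustrated with a minimal NumPy sketch. Here $\mathcal{F}$ is block windowing followed by a magnitude spectrum and $\mathbb{M}$ is a per-block spectral-centroid reduction; the Hann window and centroid are illustrative choices for this sketch, not MRCV's documented defaults:

```python
import numpy as np

def feature_transform(x, block_size=1024, hop=512):
    """F: slice a mono signal into Hann-windowed blocks, take magnitude spectra."""
    window = np.hanning(block_size)
    blocks = [x[i:i + block_size] * window
              for i in range(0, len(x) - block_size + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(blocks), axis=1))

def mapping(features):
    """M: a stand-in reduction -- per-block spectral centroid, in FFT bins."""
    bins = np.arange(features.shape[1])
    return (features * bins).sum(axis=1) / (features.sum(axis=1) + 1e-12)

sr = 44100
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # one second of A4
y_hat = mapping(feature_transform(x))              # y_hat = M(F(x))
```

Because each stage is a plain function of arrays, a user can swap `feature_transform` for an MFCC front end or replace `mapping` with a trained model without touching the rest of the pipeline, which is the kind of override the modular design permits.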

2. Neural Network Models Implemented

MRCV implements four principal neural network architectures, each serving distinct purposes in the MGSDIC (Music Generation, Sound Design, and Instrument Creation) workflow:

| Network ID | Architecture Type | Primary Application |
| --- | --- | --- |
| NN-1 | Dense, multi-head FC | Symbolic music sequence (MIDI) generation |
| NN-2 | Deep Dense + Dropout | Explorative audio block prediction |
| NN-3 | Stacked GRU | Real-time audio-to-audio effects, plugin |
| NN-4 | Feedforward MLP | Wavetable synthesis from audio features |
  • NN-1 (MixMo/MIMO): Predicts next MIDI note events (onset, duration, pitch, velocity), using separate loss heads for joint modelling. Input is $x_t = \{\text{onset}_t, \text{duration}_t, \text{pitch}_t, \text{velocity}_t\}$, and the loss per head is MSE: $\mathcal{L}_{\mathrm{MSE}} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$.
  • NN-2: Models audio block prediction using deep dense layers with dropout, suited for timbral morphing or explorative synthesis. Input is a block of raw waveform samples.
  • NN-3: GRU-based, low-latency architecture for samplewise prediction, enabling real-time plugin export (VST/AU). Encodes short sequence memory (NN steps).
  • NN-4: Feed-forward MLP for synthesizing wavetables. Trained on MFCC or STFT representations; optimizes the spectral envelope via $\mathcal{L}_{\mathrm{wavetable}} = \|\mathrm{STFT}(\hat{y}) - \mathrm{STFT}(x)\|_2^2$ or $\|\mathrm{MFCC}(\hat{y}) - \mathrm{MFCC}(x)\|_2^2$.
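NN-1's multi-head design, where a shared trunk feeds one regression head per note attribute with an MSE loss each, can be sketched in plain NumPy. This is an illustrative forward pass under assumed layer sizes, not the Keras model MRCV actually instantiates:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_dense(x, w, b):
    """One fully connected layer with ReLU activation."""
    return np.maximum(x @ w + b, 0.0)

# Shared trunk plus four output heads: onset, duration, pitch, velocity.
d_in, d_hid = 4, 16
W1, b1 = rng.normal(size=(d_in, d_hid)) * 0.1, np.zeros(d_hid)
heads = {name: (rng.normal(size=(d_hid, 1)) * 0.1, np.zeros(1))
         for name in ("onset", "duration", "pitch", "velocity")}

def forward(x_t):
    h = relu_dense(x_t, W1, b1)
    return {name: h @ w + b for name, (w, b) in heads.items()}

def mse(y, y_hat):
    """Per-head MSE, matching L_MSE = (1/n) * sum (y_i - y_hat_i)^2."""
    return float(np.mean((y - y_hat) ** 2))

x_t = rng.normal(size=(8, d_in))  # a batch of 8 note-event vectors
targets = {name: rng.normal(size=(8, 1)) for name in heads}
preds = forward(x_t)
losses = {name: mse(targets[name], preds[name]) for name in heads}
```

The joint objective is simply the sum of the four head losses; separate heads let onset timing and velocity be weighted independently during training.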

The design does not employ adversarial or RL-based objectives by default but is extensible to include $\mathcal{L}_{\mathrm{adv}}$ and regularizations such as those in VAEs (Clarke, 2023).

3. Dataset Handling, Preprocessing, and Customization

MRCV’s data layer supports both standard MIDI and PCM/WAV files (44.1 kHz, mono/stereo). Feature extraction is user-configurable: FFT length, MFCC order, Mel band count, and hop sizes can be specified. Dataset ingestion is performed via direct path referencing or by composer filtering, as in:

```python
midi_data = get_data_for_composer(main_data, ["Bach", "Debussy"])
audio_files = get_audio_dataset("/path/to/my_samples/*.wav")
```

Augmentation capabilities include corpus concatenation, time-stretch, pitch-shift, and frame-level dropout. Data pipelines are defined in YAML and may be customized at any stage with Python callbacks (Clarke, 2023).
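Of the augmentations listed, frame-level dropout is the simplest to show concretely: randomly zeroed feature frames force the model not to rely on any single frame. The function below is a minimal sketch of the idea, not MRCV's own augmentation code:

```python
import numpy as np

def frame_dropout(frames, p=0.1, rng=None):
    """Zero out a random fraction p of feature frames (rows).

    `frames` is a (num_frames, num_features) array, e.g. MFCC frames.
    """
    rng = rng or np.random.default_rng(0)
    keep = rng.random(len(frames)) >= p   # True for frames that survive
    return frames * keep[:, None]

frames = np.ones((100, 13))               # e.g. 100 MFCC frames of order 13
augmented = frame_dropout(frames, p=0.2)  # roughly 20% of rows zeroed
```

A Python callback of this shape slots naturally into a YAML-defined pipeline stage, since it maps an array to an array of the same shape.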

4. Sound Synthesis, Instrument Creation, and Algorithms

Sound synthesis proceeds via model-specific pathways. NN-2 outputs are concatenated into WAV blocks and mapped to multi-sample instrument zones, while NN-3 can be deployed as a real-time audio plugin. NN-4 enables generation of single-cycle wavetables for oscillator-based synthesis.

Instrument definition is automated: generated waves are organized into round-robin or velocity zones (for sampled instruments), and exported via Decent Sampler XML with ADSR (Attack, Decay, Sustain, Release) parametrization. Users may manually edit descriptors for fine-grained control of loop points, velocity cross-fades, or mapping (Clarke, 2023).
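The exported instrument definition can be pictured as a small XML document built from sample zones. The sketch below writes a minimal preset using element and attribute names from the public Decent Sampler `.dspreset` format (`groups`, `group`, `sample`, `loNote`/`hiNote`/`rootNote`); it is an illustration of the export step, not MRCV's actual exporter:

```python
import xml.etree.ElementTree as ET

def export_dspreset(samples, attack=0.01, release=0.3):
    """Build a minimal Decent Sampler preset from (path, lo, hi, root) zones."""
    root = ET.Element("DecentSampler")
    groups = ET.SubElement(root, "groups",
                           attack=str(attack), release=str(release))
    group = ET.SubElement(groups, "group")
    for path, lo_note, hi_note, root_note in samples:
        ET.SubElement(group, "sample", path=path,
                      loNote=str(lo_note), hiNote=str(hi_note),
                      rootNote=str(root_note))
    return ET.tostring(root, encoding="unicode")

xml_text = export_dspreset([("Samples/c4.wav", 58, 62, 60),
                            ("Samples/e4.wav", 63, 66, 64)])
```

Because the output is plain XML, the manual edits the text mentions (loop points, velocity cross-fades, mapping) amount to adding or changing attributes on these `sample` elements.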

Latent code interpolation and corpus blending allow creation of hybrid timbres, enabling traversal through spaces between corpora (e.g., “morphing” between flute and saxophone timbres).
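Latent interpolation between two corpora reduces to blending two code vectors. A minimal sketch with assumed toy vectors follows; linear interpolation is the baseline, and spherical interpolation (slerp) is a common alternative for latent morphing, though the source does not specify which scheme MRCV uses:

```python
import numpy as np

def lerp(z_a, z_b, alpha):
    """Straight-line blend between two latent codes (or wavetables)."""
    return (1.0 - alpha) * z_a + alpha * z_b

def slerp(z_a, z_b, alpha):
    """Spherical interpolation along the arc between the two codes."""
    a = z_a / np.linalg.norm(z_a)
    b = z_b / np.linalg.norm(z_b)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if omega < 1e-8:                      # nearly parallel: fall back to lerp
        return lerp(z_a, z_b, alpha)
    return (np.sin((1.0 - alpha) * omega) * z_a
            + np.sin(alpha * omega) * z_b) / np.sin(omega)

# Hypothetical latent codes for two timbres
z_flute = np.array([1.0, 0.0, 0.5])
z_sax   = np.array([0.0, 1.0, 0.5])
z_mid   = lerp(z_flute, z_sax, 0.5)       # halfway "flute-sax" hybrid
```

Sweeping `alpha` from 0 to 1 and decoding each blended code yields the timbral traversal described above.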

5. Evaluation Metrics, Examples, and Empirical Results

Evaluation is primarily quantitative:

  • Convergence curves: MSE ($\mathcal{L}_{\mathrm{MSE}}$) is reported for all models; NN-1 achieves piano-roll MSE $< 0.01$ after 50 epochs on the MAESTRO MIDI/audio dataset, while NN-2 achieves audio-block MSE $\approx 1 \times 10^{-4}$ on saxophone multiphonics.
  • Audio metrics: Signal-to-Noise Ratio (SNR) $> 20$ dB for NN-2 on held-out sets; log-spectral distortion is evaluated offline.
  • Human evaluation: A preliminary user study ($n = 12$) yields mean ratings for ease of instrument creation ($4.3/5$) and timbral novelty ($4.1/5$).
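The two quantitative metrics above have standard definitions, sketched here in NumPy on a synthetic signal (the test signal and noise level are illustrative, not the paper's evaluation data):

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error between target and prediction."""
    return float(np.mean((y - y_hat) ** 2))

def snr_db(clean, estimate):
    """Signal-to-noise ratio in dB, treating (clean - estimate) as noise."""
    noise = clean - estimate
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2)))

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * np.arange(4096) / 64)     # clean reference block
noisy = clean + 0.01 * rng.normal(size=clean.shape)  # simulated model output
score = snr_db(clean, noisy)                          # roughly 37 dB here
```

An SNR above 20 dB, as reported for NN-2, means the residual error carries less than 1% of the reference signal's energy.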

Representative outputs include Handel-style MIDI sequence generation, hybrid electronic percussion timbres, real-time neural audio effects, and custom neural wavetables imported into performance samplers (Clarke, 2023).

6. Extensibility, API Design, and Community Model

Extension is achieved via a plugin/API architecture: new model types are registered by subclassing a BaseCreator with an explicit network configuration and definition. Contribution guidelines follow standard open-source workflows (fork/PR model, continuous integration, automatic code-style checks), and documentation contributions are written in Markdown.
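The subclass-and-register pattern can be sketched as follows. `BaseCreator` is named in the source, but the `register` decorator and `build` method here are hypothetical illustrations of the pattern, not MRCV's documented API:

```python
class BaseCreator:
    """Extension point: subclasses declare a network configuration."""
    registry = {}

    @classmethod
    def register(cls, name):
        """Class decorator that records a subclass under a model-type name."""
        def wrap(subclass):
            cls.registry[name] = subclass
            return subclass
        return wrap

    def build(self, config):
        raise NotImplementedError

@BaseCreator.register("my_gru_effect")
class MyGRUEffectCreator(BaseCreator):
    def build(self, config):
        # Return an explicit network definition from user configuration.
        return {"type": "gru", "units": config.get("units", 64)}

creator = BaseCreator.registry["my_gru_effect"]()
definition = creator.build({"units": 128})
```

With a registry like this, a JSON/YAML config needs only a model-type name to select the user's custom architecture.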

The suite is MIT-licensed. Semantic versioning and comprehensive test automation facilitate robust, ongoing development (Clarke, 2023).

MRCV’s focus on neural music representation, flexible data pipelines, and instrument builder scripts situates it alongside research frameworks targeting data-driven music analysis and compositionality. In comparison, approaches emphasizing learned discrete or hierarchical codes, such as Vector Quantized Contrastive Predictive Coding (VQ-CPC) (Hadjeres et al., 2020) and graph-based centroid corpus summarization (Shapiro et al., 2025), provide complementary perspectives on corpus abstraction.

This suggests that MRCV can serve both as a production platform and as a laboratory for testing higher-level music representation schemes, provided the relevant architectures are implemented in accordance with the Creator API formalism. While MRCV does not inherently provide structural or hierarchical abstractions analogous to STGs or VQ clusters, its pipeline can in principle ingest such features as input or output, contingent on user-side customization.
