
Subtractive Training for Music Stem Insertion using Latent Diffusion Models (2406.19328v1)

Published 27 Jun 2024 in cs.SD, cs.LG, and eess.AS

Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusion model to generate the missing instrument stem, guided by both the existing stems and the text instruction. Our results demonstrate Subtractive Training's efficacy in creating authentic drum stems that seamlessly blend with the existing tracks. We also show that we can use the text instruction to control the generation of the inserted stem in terms of rhythm, dynamics, and genre, allowing us to modify the style of a single instrument in a full song while keeping the remaining instruments the same. Lastly, we extend this technique to MIDI formats, successfully generating compatible bass, drum, and guitar parts for incomplete arrangements.
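The abstract describes a self-supervised recipe: sum the stems of a complete mix, subtract one stem to form the conditioning context, and fine-tune a text-to-audio diffusion model to restore the full mix from the context plus a text instruction. Below is a minimal sketch of that pair construction and one training step. It is illustrative only: the stem format, the hard-coded instruction string (an LLM-generated caption in the paper), and the `model.diffusion_loss` call are all assumptions, not the authors' implementation.

```python
# Minimal sketch of Subtractive Training pair construction, assuming stems
# are time-aligned mono waveforms of equal length. Not the paper's code.
import torch

def build_training_pair(stems: dict[str, torch.Tensor], target: str):
    """Return (context_mix, full_mix, instruction) for one training example.

    stems: instrument name -> waveform tensor of shape (num_samples,)
    target: the stem to remove, i.e. the one the model must reinsert
    """
    full_mix = torch.stack(list(stems.values())).sum(dim=0)
    context_mix = full_mix - stems[target]  # the "subtractive" variant
    # In the paper this caption is LLM-generated; a fixed stand-in here.
    instruction = f"Add a {target} part that fits the groove of the mix."
    return context_mix, full_mix, instruction

def training_step(model, optimizer, stems: dict, target: str) -> float:
    """One fine-tuning step on a pretrained text-to-audio diffusion model.

    `model.diffusion_loss` is a hypothetical interface: a denoising loss on
    the full mix, conditioned on the stem-removed audio and the instruction.
    """
    context_mix, full_mix, instruction = build_training_pair(stems, target)
    loss = model.diffusion_loss(
        target_audio=full_mix,      # model learns to generate the full mix...
        context_audio=context_mix,  # ...given the mix minus the target stem
        text=instruction,
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference time the same conditioning interface would take an incomplete mix and an instruction, and the generated output minus the input context yields the inserted stem.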

Authors (7)
  1. Ivan Villa-Renteria (3 papers)
  2. Mason L. Wang (1 paper)
  3. Zachary Shah (3 papers)
  4. Zhe Li (211 papers)
  5. Soohyun Kim (10 papers)
  6. Neelesh Ramachandran (2 papers)
  7. Mert Pilanci (102 papers)
