
Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement (2410.16785v2)

Published 22 Oct 2024 in cs.SD, eess.AS, and cs.LG

Abstract: Recent MIDI-to-audio synthesis methods using deep neural networks have successfully generated high-quality, expressive instrumental tracks. However, these methods require MIDI annotations for supervised training, limiting the diversity of instrument timbres and expression styles in the output. We propose CoSaRef, a MIDI-to-audio synthesis method that does not require MIDI-audio paired datasets. CoSaRef first generates a synthetic audio track using concatenative synthesis based on MIDI input, then refines it with a diffusion-based deep generative model trained on datasets without MIDI annotations. This approach improves the diversity of timbres and expression styles. Additionally, it allows detailed control over timbres and expression through audio sample selection and extra MIDI design, similar to traditional functions in digital audio workstations. Experiments showed that CoSaRef could generate realistic tracks while preserving fine-grained timbre control via one-shot samples. Moreover, despite not being supervised on MIDI annotation, CoSaRef outperformed the state-of-the-art timbre-controllable method based on MIDI supervision in both objective and subjective evaluation.
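The abstract's first stage, concatenative synthesis, amounts to placing (pitch-shifted) one-shot instrument samples at the onsets given by the MIDI input. The following is a minimal sketch of that stage only, assuming a synthetic decaying-sine one-shot and naive resampling-based pitch shifting; all function names and parameters here are illustrative, not the paper's implementation, and the diffusion-based refinement stage is deliberately omitted.

```python
import numpy as np

SR = 22050  # sample rate (assumed for this sketch)

def one_shot(freq, dur, sr=SR):
    """Generate a decaying sine as a stand-in one-shot instrument sample."""
    t = np.arange(int(dur * sr)) / sr
    return np.sin(2 * np.pi * freq * t) * np.exp(-3 * t)

def pitch_shift(sample, semitones):
    """Crude pitch shift by resampling (also changes duration,
    as in naive sample-based playback)."""
    ratio = 2 ** (semitones / 12)
    idx = np.arange(0, len(sample), ratio)
    return np.interp(idx, np.arange(len(sample)), sample)

def concatenative_render(notes, base_pitch=60, sr=SR):
    """Place pitch-shifted copies of a one-shot sample at MIDI note onsets.
    `notes` is a list of (onset_seconds, midi_pitch) pairs."""
    base = one_shot(440.0, 0.5, sr)
    end = max(onset for onset, _ in notes) + 1.0
    out = np.zeros(int(end * sr))
    for onset, pitch in notes:
        seg = pitch_shift(base, pitch - base_pitch)
        start = int(onset * sr)
        out[start:start + len(seg)] += seg[:len(out) - start]
    return out

# Example: render a three-note phrase (onset seconds, MIDI pitch).
track = concatenative_render([(0.0, 60), (0.5, 64), (1.0, 67)])
# In CoSaRef, a diffusion model trained on audio without MIDI annotations
# would then refine `track` into a realistic recording; that stage is
# not implemented here.
```

Swapping `one_shot` for a recorded instrument sample is what gives the fine-grained timbre control the abstract describes: the generative refinement stage only has to make the concatenated result sound natural, not choose the timbre.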


Authors (2)