
Deep Performer: Score-to-Audio Music Performance Synthesis (2202.06034v2)

Published 12 Feb 2022 in cs.SD, cs.LG, cs.MM, eess.AS, and eess.SP

Abstract: Music performance synthesis aims to synthesize a musical score into a natural performance. In this paper, we borrow recent advances in text-to-speech synthesis and present the Deep Performer, a novel system for score-to-audio music performance synthesis. Unlike speech, music often contains polyphony and long notes. Hence, we propose two new techniques for handling polyphonic inputs and providing fine-grained conditioning in a transformer encoder-decoder model. To train our proposed system, we present a new violin dataset consisting of paired recordings and scores along with estimated alignments between them. We show that our proposed model can synthesize music with clear polyphony and harmonic structures. In a listening test, we achieve competitive quality against the baseline model, a conditional generative audio model, in terms of pitch accuracy, timbre, and noise level. Moreover, our proposed model significantly outperforms the baseline on an existing piano dataset in overall quality.

Authors (4)
  1. Hao-Wen Dong (31 papers)
  2. Cong Zhou (39 papers)
  3. Taylor Berg-Kirkpatrick (106 papers)
  4. Julian McAuley (238 papers)
Citations (14)