Enable segment-level editing and continuation in DiffRhythm
Investigate and implement segment-level editing in DiffRhythm, including song inpainting and outpainting. This calls for training and inference mechanisms, such as random masking of variational autoencoder (VAE) latent representations, that allow specific regions of a generated composition to be edited while the rest is left unchanged.
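To make the random-masking idea concrete, here is a minimal sketch in plain Python. It is not the DiffRhythm authors' method, which the paper leaves unexplored. It only illustrates one common way such masks could be built over a sequence of latent frames. All names (`sample_segment_mask`, `outpaint_mask`, `blend_latents`) and the default span fractions are hypothetical, and latent frames are represented as generic list items rather than real VAE tensors.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_segment_mask(
    num_frames: int,
    min_frac: float = 0.1,
    max_frac: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Sample a random contiguous mask over latent frames (inpainting-style).

    1 marks a frame to be regenerated, 0 marks a frame kept as context.
    The span fractions are illustrative assumptions, not values from the paper.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if rng is None:
        rng = random.Random()
    span = int(num_frames * rng.uniform(min_frac, max_frac))
    span = max(1, min(span, num_frames))  # at least one frame, at most all
    start = rng.randint(0, num_frames - span)  # inclusive on both ends
    return [1 if start <= t < start + span else 0 for t in range(num_frames)]


def outpaint_mask(num_frames: int, context_frames: int) -> List[int]:
    """Build an outpainting mask: keep a prefix, regenerate everything after it."""
    if not 0 <= context_frames <= num_frames:
        raise ValueError("context_frames must lie in [0, num_frames]")
    return [0] * context_frames + [1] * (num_frames - context_frames)


def blend_latents(known: Sequence[T], generated: Sequence[T], mask: Sequence[int]) -> List[T]:
    """Keep known frames where mask is 0 and take generated frames where mask is 1."""
    if not (len(known) == len(generated) == len(mask)):
        raise ValueError("known, generated and mask must have the same length")
    return [g if m else k for k, g, m in zip(known, generated, mask)]
```

Under these assumptions, a training loop would draw a fresh `sample_segment_mask` per example and compute the loss only on masked frames, while inference would call `blend_latents` to re-impose the known frames on the model output. Whether this scheme works well for DiffRhythm is exactly the open question.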
References
While DiffRhythm demonstrates good capability to generate high-quality full-length songs, two important aspects remain unexplored in our current framework. First, the functionality for editing specific segments within generated compositions has not been investigated.
— DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
(2503.01183 - Ning et al., 3 Mar 2025) in Section: Limitations