HyperGANStrument: Instrument Sound Synthesis and Editing with Pitch-Invariant Hypernetworks (2401.04558v1)
Abstract: GANStrument, which combines a GAN with a pitch-invariant feature extractor and an instance conditioning technique, has shown remarkable capabilities in synthesizing realistic instrument sounds. To further improve its reconstruction ability and pitch accuracy, and thereby the editability of user-provided sounds, we propose HyperGANStrument, which introduces a pitch-invariant hypernetwork that modulates the weights of a pre-trained GANStrument generator given a one-shot sound as input. The hypernetwork modulation provides feedback to the generator during reconstruction of the input sound. In addition, we apply an adversarial fine-tuning scheme to the hypernetwork to improve the reconstruction fidelity and generation diversity of the generator. Experimental results show that the proposed model not only enhances the generation capability of GANStrument but also significantly improves the editability of synthesized sounds. Audio examples are available at the online demo page.
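The idea of a hypernetwork modulating the weights of a frozen, pre-trained generator can be illustrated with a toy sketch. The abstract does not state the exact form of the modulation, so the multiplicative update `W * (1 + delta)` below is an assumption borrowed from HyperStyle-style inversion, and every name, dimension, and the single dense layer standing in for the generator are hypothetical rather than the authors' implementation.

```python
import random


def linear(weights, x):
    """Apply a dense layer; weights is a list of rows, x a list of floats."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def modulate(weights, offsets):
    """Scale each frozen weight by (1 + offset). Assumed form, not from the paper."""
    return [[w * (1.0 + d) for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weights, offsets)]


def hypernetwork(hyper_weights, feature, out_dim, in_dim):
    """Hypothetical hypernetwork: one predicted offset per generator weight."""
    flat = linear(hyper_weights, feature)
    return [flat[r * in_dim:(r + 1) * in_dim] for r in range(out_dim)]


rng = random.Random(0)
in_dim, out_dim, feat_dim = 4, 3, 5

# Frozen "generator" layer (stand-in for a pre-trained generator).
generator_weights = [[rng.gauss(0, 1) for _ in range(in_dim)]
                     for _ in range(out_dim)]
# Trainable hypernetwork parameters, initialised small so offsets start near 0.
hyper_weights = [[rng.gauss(0, 0.01) for _ in range(feat_dim)]
                 for _ in range(out_dim * in_dim)]

# Stand-ins for the feature of the one-shot input sound and a latent input.
feature = [rng.gauss(0, 1) for _ in range(feat_dim)]
latent = [rng.gauss(0, 1) for _ in range(in_dim)]

offsets = hypernetwork(hyper_weights, feature, out_dim, in_dim)
modulated_weights = modulate(generator_weights, offsets)

baseline_out = linear(generator_weights, latent)
modulated_out = linear(modulated_weights, latent)
```

In this sketch only `hyper_weights` would be trained; the generator weights stay fixed, and all-zero offsets recover the original generator exactly.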
Authors: Zhe Zhang, Taketo Akama