LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search (2404.14063v1)
Abstract: Evolutionary Algorithms and Generative Deep Learning are two of the most powerful tools for sound generation tasks. However, both have limitations: Evolutionary Algorithms require complicated designs, which makes them hard to control and makes realistic sound hard to achieve, while Generative Deep Learning models often copy from the dataset and lack creativity. In this paper, we propose LVNS-RAVE, a method that combines Evolutionary Algorithms and Generative Deep Learning to produce realistic and novel sounds. We use the RAVE model as the sound generator and the VGGish model as the novelty evaluator in the Latent Vector Novelty Search (LVNS) algorithm. The reported experiments show that the method can generate diversified, novel audio samples under different mutation setups and with different pre-trained RAVE models. The characteristics of the generation process can be easily controlled through the mutation parameters. The proposed algorithm can serve as a creative tool for sound artists and musicians.
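The loop the abstract describes (mutate latent vectors, turn them into sound, score each sound by how novel it is) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `embed` function is a hypothetical stand-in (a fixed random projection) for "decode with RAVE, then embed with VGGish". The dimensions, population size, mutation rate, and sigma are invented for the example. The novelty score is the mean distance to the k nearest neighbours, as in classic novelty search (Lehman and Stanley); that the paper uses exactly this form is an assumption.

```python
import math
import random

LATENT_DIM = 8   # hypothetical latent size, not taken from the paper
EMBED_DIM = 4    # hypothetical embedding size, not taken from the paper


def make_projection(rows, cols, rng):
    """Fixed random matrix used by the stand-in embedding below."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def embed(latent, projection):
    """Stand-in for 'decode the latent with RAVE, then embed the audio with VGGish'."""
    return [math.tanh(sum(w * z for w, z in zip(row, latent))) for row in projection]


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def novelty(candidate, others, k=3):
    """Mean distance from candidate to its k nearest neighbours in others."""
    nearest = sorted(distance(candidate, o) for o in others)[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def mutate(latent, rate, sigma, rng):
    """Add Gaussian noise to each latent dimension with probability rate."""
    return [z + rng.gauss(0.0, sigma) if rng.random() < rate else z for z in latent]


def novelty_search(generations=20, pop_size=10, rate=0.5, sigma=0.3, seed=0):
    rng = random.Random(seed)
    projection = make_projection(EMBED_DIM, LATENT_DIM, rng)
    population = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
                  for _ in range(pop_size)]
    archive = []  # embeddings of the most novel individual of each generation
    for _ in range(generations):
        children = [mutate(p, rate, sigma, rng) for p in population]
        pool = population + children
        embeddings = [embed(z, projection) for z in pool]
        scores = []
        for i, e in enumerate(embeddings):
            # Compare against the rest of the pool plus the archive.
            others = embeddings[:i] + embeddings[i + 1:] + archive
            scores.append(novelty(e, others))
        ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
        population = [pool[i] for i in ranked[:pop_size]]
        archive.append(embeddings[ranked[0]])
    return population, archive
```

In this sketch, `rate` and `sigma` play the role of the mutation parameters mentioned in the abstract: larger values move offspring further from their parents in latent space. Swapping `embed` for a real decoder and audio embedding model would leave the search loop itself unchanged.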
Authors: Jinyue Guo, Anna-Maria Christodoulou, Balint Laczko, Kyrre Glette