
LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search (2404.14063v1)

Published 22 Apr 2024 in cs.SD, cs.LG, cs.NE, and eess.AS

Abstract: Evolutionary Algorithms and Generative Deep Learning have been two of the most powerful tools for sound generation tasks. However, they have limitations: Evolutionary Algorithms require complicated designs, posing challenges in control and achieving realistic sound generation. Generative Deep Learning models often copy from the dataset and lack creativity. In this paper, we propose LVNS-RAVE, a method to combine Evolutionary Algorithms and Generative Deep Learning to produce realistic and novel sounds. We use the RAVE model as the sound generator and the VGGish model as a novelty evaluator in the Latent Vector Novelty Search (LVNS) algorithm. The reported experiments show that the method can successfully generate diversified, novel audio samples under different mutation setups using different pre-trained RAVE models. The characteristics of the generation process can be easily controlled with the mutation parameters. The proposed algorithm can be a creative tool for sound artists and musicians.
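
The abstract describes a loop in which latent vectors are mutated, decoded to audio by a generator, and scored for novelty in an embedding space. The sketch below illustrates that general idea only. It is not the authors' implementation: `decode` and `embed` are toy stand-ins for a RAVE decoder and a VGGish embedding, and the k-nearest-neighbour novelty score, the archive policy, the Gaussian mutation, and all parameter names and defaults are assumptions borrowed from generic novelty search, since the abstract does not specify them.

```python
import numpy as np


def decode(z):
    """Toy stand-in for a RAVE decoder (assumption): latent vector -> 256 'audio' samples."""
    t = np.linspace(0.0, 1.0, 256)
    freqs = np.arange(1, z.size + 1)
    # Each latent dimension weights one sinusoid.
    return np.sin(2 * np.pi * np.outer(freqs, t)).T @ z


def embed(audio):
    """Toy stand-in for a VGGish embedding (assumption): first 32 magnitude-spectrum bins."""
    return np.abs(np.fft.rfft(audio))[:32]


def novelty_scores(emb, archive, k):
    """Mean distance from each embedding to its k nearest neighbours
    among the other candidates and the archive."""
    ref = np.vstack([emb, np.array(archive)]) if len(archive) else emb
    d = np.linalg.norm(emb[:, None, :] - ref[None, :, :], axis=2)
    idx = np.arange(len(emb))
    d[idx, idx] = np.inf  # a candidate is not its own neighbour
    return np.sort(d, axis=1)[:, :k].mean(axis=1)


def latent_novelty_search(latent_dim=8, pop_size=16, generations=10,
                          sigma=0.3, k=3, seed=0):
    """Generic novelty search over latent vectors (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    pop = rng.standard_normal((pop_size, latent_dim))
    archive = []
    for _ in range(generations):
        # Gaussian mutation; sigma controls how far offspring move.
        children = pop + sigma * rng.standard_normal(pop.shape)
        cand = np.vstack([pop, children])
        emb = np.array([embed(decode(z)) for z in cand])
        scores = novelty_scores(emb, archive, k)
        order = np.argsort(scores)[::-1]  # most novel first
        pop = cand[order[:pop_size]]
        archive.append(emb[order[0]])  # remember the most novel behaviour
    return pop, np.array(archive)


if __name__ == "__main__":
    pop, archive = latent_novelty_search()
    print(pop.shape, archive.shape)
```

In this sketch the mutation scale `sigma` plays the role of a mutation parameter: larger values move offspring further from their parents in latent space. Swapping in real models would mean replacing `decode` and `embed` with calls to a pre-trained generator and audio embedding network.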

Authors (4)
  1. Jinyue Guo
  2. Anna-Maria Christodoulou
  3. Balint Laczko
  4. Kyrre Glette
