- The paper presents a raw audio synthesis method using a modified 2-tier SampleRNN model to generate genre-specific sonic textures.
- The study highlights differences between LSTM and GRU architectures, with LSTMs better capturing the intricate timbral qualities of black metal and math rock.
- The findings extend AI music generation beyond traditional genres, suggesting future exploration of hybrid models that merge symbolic and raw audio techniques.
An Examination of "Generating Black Metal and Math Rock: Beyond Bach, Beethoven, and Beatles"
The paper "Generating Black Metal and Math Rock: Beyond Bach, Beethoven, and Beatles" introduces a music generation method that synthesizes raw audio with a modified SampleRNN, applied to genres such as black metal and math rock. This departs from the prevalent symbolic-domain approaches, which generate scores or note-event sequences, because much of what defines these modern genres (timbre, texture, production) exists only in the audio itself.
Methodological Approach
The authors employ a tailored SampleRNN architecture that operates directly in the time domain, generating audio one sample at a time. Working with raw waveforms is essential here because features such as timbre and spatial dynamics, which are central to the genres under study, are absent from symbolic representations. The paper uses a 2-tier SampleRNN at a 16 kHz sample rate, with configurations that vary the number of layers, the embedding size, and the recurrent cell type (LSTM or GRU). For training, the audio is split into 3,200 FLAC chunks, and weight normalization is applied so that the deeper configurations, which the long sequences of these intricate styles call for, remain trainable.
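To make the 2-tier structure concrete, the following is a minimal NumPy sketch of sampling from an untrained 2-tier SampleRNN-style model. A frame-level recurrent tier steps once per frame and is upsampled into one conditioning vector per sample; a sample-level MLP then predicts each quantized sample from the previous frame's worth of samples plus that conditioning. Linear layers use weight normalization (each weight row is `g * v / ||v||`). All sizes, the 256-level linear quantization, the GRU frame tier, and the names `wn_linear`, `GRUCell`, and `generate` are illustrative assumptions, not the paper's configuration or code. At 16 kHz, each second of output requires 16,000 sequential predictions, which is why the frame tier carries context at a coarser time scale.

```python
import numpy as np

# Hypothetical sizes chosen small for illustration (not the paper's values).
SAMPLE_RATE = 16_000  # Hz, as stated in the paper
FRAME_SIZE = 16       # samples per frame-tier step (assumption)
Q_LEVELS = 256        # 8-bit linear quantization levels (assumption)
HIDDEN = 32           # frame-tier RNN and MLP width (assumption)
EMBED = 8             # per-sample embedding size (assumption)

rng = np.random.default_rng(0)


def wn_linear(in_dim, out_dim):
    """Weight-normalized linear layer: each row of W is g * v / ||v||."""
    v = rng.normal(0.0, 0.1, (out_dim, in_dim))
    g = np.ones(out_dim)
    b = np.zeros(out_dim)

    def apply(x):
        w = g[:, None] * v / np.linalg.norm(v, axis=1, keepdims=True)
        return w @ x + b

    return apply


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Single GRU cell; the paper also tests LSTM cells in this slot."""

    def __init__(self, in_dim, hid):
        self.z = wn_linear(in_dim + hid, hid)  # update gate
        self.r = wn_linear(in_dim + hid, hid)  # reset gate
        self.c = wn_linear(in_dim + hid, hid)  # candidate state

    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.z(xh))
        r = sigmoid(self.r(xh))
        cand = np.tanh(self.c(np.concatenate([x, r * h])))
        return (1.0 - z) * h + z * cand


# Tier 2 (frame level): one RNN step per FRAME_SIZE samples.
frame_rnn = GRUCell(FRAME_SIZE, HIDDEN)
upsample = wn_linear(HIDDEN, HIDDEN * FRAME_SIZE)  # one vector per sample
# Tier 1 (sample level): MLP over embedded previous samples + conditioning.
embedding = rng.normal(0.0, 0.1, (Q_LEVELS, EMBED))
sample_in = wn_linear(FRAME_SIZE * EMBED, HIDDEN)
sample_out = wn_linear(HIDDEN, Q_LEVELS)


def generate(n_frames, seed=0):
    """Autoregressively sample n_frames * FRAME_SIZE quantized samples."""
    sampler = np.random.default_rng(seed)
    h = np.zeros(HIDDEN)
    samples = [Q_LEVELS // 2] * FRAME_SIZE  # seed frame of digital silence
    for _ in range(n_frames):
        # Dequantize the previous frame to [-1, 1) for the frame tier.
        prev = np.array(samples[-FRAME_SIZE:]) / (Q_LEVELS / 2) - 1.0
        h = frame_rnn(prev, h)
        cond = upsample(h).reshape(FRAME_SIZE, HIDDEN)
        for t in range(FRAME_SIZE):
            # Context includes samples already generated in this frame.
            context = embedding[samples[-FRAME_SIZE:]].reshape(-1)
            hidden = np.maximum(0.0, sample_in(context) + cond[t])
            logits = sample_out(hidden)
            p = np.exp(logits - logits.max())  # softmax over 256 levels
            p /= p.sum()
            samples.append(int(sampler.choice(Q_LEVELS, p=p)))
    return samples[FRAME_SIZE:]
```

For example, `generate(4)` returns 64 integer samples in `[0, 255]`, about 4 ms of audio at 16 kHz; with untrained weights this is noise. Swapping in an LSTM cell, which the paper found more effective, would change only `frame_rnn`.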
Results and Observations
The generated compositions exhibit several genre-specific qualities. For the math rock band Room For A Ghost, the output retained the band's guitar and vocal timbres while reproducing abrupt sectional transitions and irregular time signatures, properties characteristic of math rock. For the black metal band Krallice, the model recreated the band's atmospheric textures and persistent rhythmic elements. GRU models, however, captured these sonic characteristics less effectively than LSTMs, suggesting that LSTM cells are better suited to this neural synthesis setting.
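The paper reports the LSTM/GRU difference empirically, and the arithmetic below does not explain it. It only illustrates one structural difference between the two cells: an LSTM has four gated weight blocks (input, forget, and output gates plus the cell candidate) and a separate cell state, while a GRU has three (update gate, reset gate, candidate state). The sketch uses the textbook single-bias formulation; the hidden size of 1024 is a hypothetical value, not taken from the paper.

```python
def rnn_params(input_size: int, hidden_size: int, n_gates: int) -> int:
    """Weights plus one bias vector per gate block (textbook formulation)."""
    per_gate = hidden_size * (input_size + hidden_size) + hidden_size
    return n_gates * per_gate


def lstm_params(input_size: int, hidden_size: int) -> int:
    # input, forget, output gates + cell candidate = 4 blocks
    return rnn_params(input_size, hidden_size, 4)


def gru_params(input_size: int, hidden_size: int) -> int:
    # update gate, reset gate + candidate state = 3 blocks
    return rnn_params(input_size, hidden_size, 3)


print(lstm_params(1024, 1024))  # 8392704
print(gru_params(1024, 1024))   # 6294528
```

At equal width, a single LSTM layer therefore has 4/3 as many recurrent parameters as a GRU layer. Frameworks such as PyTorch add a second bias vector per gate, so their reported counts are slightly higher.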
The paper also highlights the aesthetic merits of neural synthesis, treating its artifacts and unintended variations as new artistic dimensions rather than defects. This perspective aligns with avant-garde musicians who deliberately incorporate the idiosyncrasies of vintage technology into their work.
Implications and Future Directions
The implications of this research are twofold. Practically, it extends neural synthesis for music generation beyond traditional genres, offering new tools for artists exploring novel auditory landscapes. Theoretically, it points toward architectures that merge symbolic and raw audio synthesis for richer, more controllable generation; for example, hybrid models could use symbolic representations to improve the structural coherence of generated music while retaining raw audio's timbral detail.
In summary, the paper shows that raw audio neural synthesis can reproduce characteristic textures of modern genres such as black metal and math rock, setting the stage for expanded applications in audio generation. Further advances in combining raw and symbolic approaches could significantly enhance the expressive potential of AI-generated music.