
Generating Black Metal and Math Rock: Beyond Bach, Beethoven, and Beatles (1811.06639v1)

Published 16 Nov 2018 in cs.SD and eess.AS

Abstract: We use a modified SampleRNN architecture to generate music in modern genres such as black metal and math rock. Unlike MIDI and symbolic models, SampleRNN generates raw audio in the time domain. This requirement becomes increasingly important in modern music styles where timbre and space are used compositionally. Long developmental compositions with rapid transitions between sections are possible by increasing the depth of the network beyond the number used for speech datasets. We are delighted by the unique characteristic artifacts of neural synthesis.

Citations (15)

Summary

  • The paper presents a raw audio synthesis method using a modified 2-tier SampleRNN model to generate genre-specific sonic textures.
  • The study highlights differences between LSTM and GRU architectures, with LSTMs better capturing the intricate timbral qualities of black metal and math rock.
  • The findings extend AI music generation beyond traditional genres, suggesting future exploration of hybrid models that merge symbolic and raw audio techniques.

An Examination of "Generating Black Metal and Math Rock: Beyond Bach, Beethoven, and Beatles"

The paper "Generating Black Metal and Math Rock: Beyond Bach, Beethoven, and Beatles" introduces a method for generating music as raw audio, using a modified SampleRNN to explore genres such as black metal and math rock. The work departs from the prevalent symbolic-domain (for example, MIDI) generation approaches, arguing that raw audio generation matters in modern genres where timbre and space are used compositionally.

Methodological Approach

The authors employ a tailored SampleRNN architecture, which operates in the time domain, to generate audio sequences. This approach matters when working with audio features such as timbre and spatial dynamics, which are central to the genres under study. To this end, the paper uses a 2-tier SampleRNN model with varying numbers of layers, embedding sizes, and recurrent cell types (LSTM or GRU), operating at a 16 kHz sample rate. For training, the source audio is split into 3,200 FLAC chunks, and weight normalization is applied to facilitate the deeper network configurations suited to the long-sequence generation these intricate music styles demand.
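
The chunking step described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names, the chunk length, and the choice to space chunks evenly (overlapping when necessary) are assumptions. Only the 16 kHz sample rate and the figure of 3,200 chunks come from the summary.

```python
SAMPLE_RATE = 16_000  # 16 kHz, as stated in the summary


def chunk_starts(total_samples, chunk_samples, num_chunks):
    """Return num_chunks evenly spaced start offsets covering the signal.

    Chunks overlap whenever num_chunks * chunk_samples exceeds total_samples.
    (Even spacing is an assumption made for this sketch.)
    """
    if chunk_samples <= 0 or num_chunks <= 0:
        raise ValueError("chunk_samples and num_chunks must be positive")
    if chunk_samples > total_samples:
        raise ValueError("chunk is longer than the signal")
    if num_chunks == 1:
        return [0]
    last_start = total_samples - chunk_samples
    return [round(i * last_start / (num_chunks - 1)) for i in range(num_chunks)]


def split_into_chunks(samples, chunk_samples, num_chunks):
    """Slice a sequence of audio samples into num_chunks fixed-length chunks."""
    starts = chunk_starts(len(samples), chunk_samples, num_chunks)
    return [samples[s:s + chunk_samples] for s in starts]
```

With a decoded recording held in `samples`, a call such as `split_into_chunks(samples, chunk_samples=8 * SAMPLE_RATE, num_chunks=3200)` would yield 3,200 equal-length training examples. The 8-second length here is a hypothetical value, and writing each chunk out as FLAC would need an audio library that this sketch leaves out.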

Results and Observations

The generated compositions demonstrate several genre-specific qualities: for Room For A Ghost, the output retained the band's guitar and vocal timbres while achieving abrupt sectional transitions and irregular time signatures, properties intrinsic to math rock. Similarly, for the black metal band Krallice, the model successfully recreated the atmospheric textures and persistent rhythmic elements. However, GRU models were noted to perform less effectively than LSTMs in capturing the desired sonic characteristics, emphasizing the suitability of LSTM architectures in this neural synthesis context.
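
For context on the two cell types being compared, the sketch below counts the trainable parameters of a single LSTM layer and a single GRU layer under the standard textbook formulation (four gate blocks versus three). The function names and the layer sizes in the usage note are illustrative assumptions, not values taken from the paper, and the count alone does not explain why the LSTM outputs were preferred.

```python
def gate_block_params(input_size, hidden_size):
    """Parameters in one gate block: input weights, recurrent weights, one bias.

    Some frameworks keep two bias vectors per block, which adds
    hidden_size further parameters to each block.
    """
    return hidden_size * (input_size + hidden_size) + hidden_size


def lstm_params(input_size, hidden_size):
    """LSTM: input, forget, and output gates plus the candidate cell state."""
    return 4 * gate_block_params(input_size, hidden_size)


def gru_params(input_size, hidden_size):
    """GRU: update and reset gates plus the candidate hidden state."""
    return 3 * gate_block_params(input_size, hidden_size)
```

For any pair of sizes, for example a hypothetical `input_size=256` and `hidden_size=1024`, the GRU layer has exactly three quarters as many parameters as the LSTM layer of the same width.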

The paper also highlights the aesthetic merits of neural synthesis, embracing the artifacts and unintended variances as new artistic dimensions. This perspective aligns with certain musical avant-gardists who incorporate the idiosyncrasies of vintage technology into their work.

Implications and Future Directions

The implications of this research are twofold. Practically, it extends the utility of neural synthesis in music generation beyond traditional genres, offering new tools for artists exploring novel auditory landscapes. Theoretically, the findings invite further exploration of neural architectures that merge symbolic and raw audio synthesis for richer, more controlled generation. Future research could investigate hybrid models that use symbolic representations alongside raw audio to improve structural coherence in the generated music.

In summary, this paper provides valuable insights into utilizing neural synthesis for modern music genres, setting the stage for expanded applications in audio generation. Further advancements in combining raw and symbolic approaches could significantly enhance the expressive potential of AI-generated music.
