Structure-informed Positional Encoding for Music Generation (2402.13301v2)
Abstract: Music generated by deep learning methods often suffers from a lack of coherence and long-term organization. Yet multi-scale hierarchical structure is a distinctive feature of music signals. To leverage this information, we propose a structure-informed positional encoding framework for music generation with Transformers. We design three variants, encoding absolute, relative, and non-stationary positional information, and comprehensively test them on two symbolic music generation tasks: next-timestep prediction and accompaniment generation. Compared against multiple baselines from the literature, our methods show clear merits on several musically motivated evaluation metrics; in particular, they improve the melodic and structural consistency of the generated pieces.
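The paper's exact formulation is not reproduced here, but the core idea of the absolute variant can be illustrated with a minimal sketch: augment a Transformer's standard sinusoidal positional encoding with a learned embedding of each token's structural label (e.g., bar, phrase, or section index). The class name `StructureInformedPE` and the one-label-per-token interface below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a structure-informed absolute positional encoding.
# Hypothetical names and interface; the paper's formulation may differ.
import math
import torch
import torch.nn as nn

class StructureInformedPE(nn.Module):
    """Sums a standard sinusoidal positional encoding with a learned
    embedding of per-token structural labels (e.g., bar/phrase/section)."""

    def __init__(self, d_model: int, num_structure_labels: int, max_len: int = 4096):
        super().__init__()
        # Standard sinusoidal table (Vaswani et al., 2017).
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float)
            * (-math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)
        # Learned embedding of structural labels
        # (assumption: one structural label per token).
        self.structure_emb = nn.Embedding(num_structure_labels, d_model)

    def forward(self, x: torch.Tensor, structure_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); structure_ids: (batch, seq_len), long.
        seq_len = x.size(1)
        return x + self.pe[:seq_len] + self.structure_emb(structure_ids)

# Usage: tokens at the same hierarchical level (e.g., within one phrase)
# share a structural embedding, so position is informed by musical structure,
# not just raw token index.
pe = StructureInformedPE(d_model=256, num_structure_labels=8)
x = torch.randn(2, 128, 256)                      # token embeddings
structure_ids = torch.randint(0, 8, (2, 128))     # per-token structure labels
out = pe(x, structure_ids)
```

The relative and non-stationary variants described in the abstract would instead modify the attention mechanism itself (biasing attention scores by structural distance between positions), rather than adding a term to the input embeddings as sketched here.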