N-Gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding (2312.08931v2)

Published 13 Dec 2023 in cs.SD, cs.AI, cs.MM, and eess.AS

Abstract: The first step in applying deep learning techniques to symbolic music understanding is to transform musical pieces (mainly in MIDI format) into sequences of predefined tokens such as note pitch, note velocity, and chords. The sequences are then fed into a neural sequence model to accomplish specific tasks. Music sequences exhibit strong correlations between adjacent elements, making them prime candidates for N-gram techniques from NLP. Consider classical piano music: specific melodies may recur throughout a piece, with subtle variations each time. In this paper, we propose NG-Midiformer, a novel method for understanding symbolic music sequences that leverages the N-gram approach. Our method first processes music pieces into word-like sequences with our proposed unsupervised compoundation, then applies our N-gram Transformer encoder, which effectively incorporates N-gram information to enhance the primary encoder for a better understanding of music sequences. Pre-training on large-scale music datasets enables the model to thoroughly learn the N-gram information contained in music sequences and to apply it for inference during the fine-tuning stage. Experiments on various datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance on a series of downstream music understanding tasks. The code and model weights will be released at https://github.com/CinqueOrigin/NG-Midiformer.
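To make the pipeline described in the abstract concrete, below is a minimal, self-contained Python sketch of the two ideas it names: unsupervised compoundation (shown here as a single BPE-style merge step that fuses the most frequent adjacent token pair into a compound "word") and N-gram counting over the resulting sequence. This is an illustration under stated assumptions, not the authors' implementation: the function names, the frequency-based merge criterion, and the REMI-like token vocabulary are all hypothetical; the actual method is in the linked repository.

```python
from collections import Counter

def extract_ngrams(tokens, n_max=3):
    """Count every N-gram (2 <= n <= n_max) in a token sequence."""
    counts = Counter()
    for n in range(2, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def merge_most_frequent_pair(tokens):
    """One BPE-style merge step: fuse the most frequent adjacent
    token pair into a single compound 'word'."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens
    (a, b), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
            merged.append(a + "+" + b)  # fused compound "word"
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Toy MIDI-derived event stream in a REMI-like vocabulary (illustrative only).
seq = ["Bar", "Pos_1", "Pitch_60", "Dur_4",
       "Pos_5", "Pitch_64", "Dur_4",
       "Bar", "Pos_1", "Pitch_60", "Dur_4"]

compounded = merge_most_frequent_pair(seq)
print(compounded)                                    # sequence with fused pair
print(extract_ngrams(compounded, n_max=2).most_common(3))  # top bigrams
```

Repeating the merge step until no pair clears a frequency threshold yields progressively longer compound "words", mirroring subword segmentation in NLP; the resulting N-gram statistics are the kind of signal an N-gram-aware encoder could inject alongside token embeddings.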
