ByteComposer: a Human-like Melody Composition Method based on Language Model Agent (2402.17785v2)

Published 24 Feb 2024 in cs.SD, cs.AI, and eess.AS

Abstract: Large language models (LLMs) have shown encouraging progress in multimodal understanding and generation tasks. However, how to design a human-aligned and interpretable melody composition system remains under-explored. To address this problem, we propose ByteComposer, an agent framework that emulates a human's creative pipeline in four separate steps: "Conception Analysis - Draft Composition - Self-Evaluation and Modification - Aesthetic Selection". This framework seamlessly blends the interactive and knowledge-understanding capabilities of LLMs with existing symbolic music generation models, thereby achieving a melody composition agent comparable to human creators. We conduct extensive experiments on GPT-4 and several open-source LLMs, which substantiate our framework's effectiveness. Furthermore, professional music composers were engaged in multi-dimensional evaluations; the final results demonstrate that, across various facets of music composition, the ByteComposer agent attains the level of a novice melody composer.
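To make the four-step pipeline concrete, below is a minimal sketch of how such an agent loop could be wired together. This is an illustrative assumption, not the paper's implementation: every name (call_llm, Draft, compose) and every prompt is hypothetical, and the actual system pairs the LLM with dedicated symbolic music generation models rather than prompting an LLM directly for notation.

```python
# Hypothetical sketch of a ByteComposer-style four-stage agent loop.
# All function names, prompts, and data shapes are assumptions for
# illustration; they are not the paper's actual API.

from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (e.g. GPT-4 or an open LLM)."""
    raise NotImplementedError("wire up your LLM client here")


@dataclass
class Draft:
    melody: str          # e.g. ABC notation or a symbolic token sequence
    critique: str = ""
    score: float = 0.0


def conception_analysis(user_request: str) -> str:
    # Step 1: the LLM turns a free-form request into musical attributes
    # (key, tempo, mood, structure) that a generator can condition on.
    return call_llm(f"Extract musical attributes from this request:\n{user_request}")


def draft_composition(attributes: str, n_drafts: int = 4) -> list[Draft]:
    # Step 2: produce candidate melodies conditioned on the attributes.
    # In the paper this is a symbolic music model; stubbed via the LLM here.
    return [
        Draft(melody=call_llm(f"Compose a melody in ABC notation for: {attributes}"))
        for _ in range(n_drafts)
    ]


def self_evaluate_and_modify(draft: Draft, attributes: str, rounds: int = 2) -> Draft:
    # Step 3: a Self-Refine-style loop in which the LLM critiques its own
    # draft against the brief, then revises it for a fixed number of rounds.
    for _ in range(rounds):
        draft.critique = call_llm(
            f"Critique this melody against the brief {attributes}:\n{draft.melody}"
        )
        draft.melody = call_llm(
            f"Revise the melody to address this critique:\n{draft.critique}\n{draft.melody}"
        )
    return draft


def aesthetic_selection(drafts: list[Draft]) -> Draft:
    # Step 4: score each refined candidate and keep the best one.
    for d in drafts:
        d.score = float(call_llm(f"Rate this melody from 0 to 10:\n{d.melody}"))
    return max(drafts, key=lambda d: d.score)


def compose(user_request: str) -> str:
    attributes = conception_analysis(user_request)
    drafts = [
        self_evaluate_and_modify(d, attributes)
        for d in draft_composition(attributes)
    ]
    return aesthetic_selection(drafts).melody
```

The key design point the abstract emphasizes is the division of labor: the LLM handles interaction, knowledge, and self-critique, while specialized generation models supply the musical material, which is what makes each intermediate step inspectable and the overall behavior interpretable.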
