Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing (2310.12404v2)
Abstract: Creating music is an iterative process that requires varied methods at each stage. However, existing AI music systems fall short in orchestrating multiple subsystems for diverse needs. To address this gap, we introduce Loop Copilot, a novel system that enables users to generate and iteratively refine music through an interactive, multi-round dialogue interface. The system uses a large language model (LLM) to interpret user intentions and select appropriate AI models for task execution. Each backend model is specialized for a specific task, and their outputs are aggregated to meet the user's requirements. To ensure musical coherence, essential attributes are maintained in a centralized table. We evaluate the effectiveness of the proposed system through semi-structured interviews and questionnaires, highlighting not only its utility in facilitating music creation but also its potential for broader applications.
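The orchestration pattern the abstract describes can be sketched in a few lines: an intent-recognition step routes each dialogue turn to a specialized backend, and a centralized attribute table carries musical state (key, tempo, last action) across rounds to keep outputs coherent. The sketch below is a minimal illustration under stated assumptions; all class and method names are hypothetical, the keyword-based router stands in for the LLM, and the string-returning handlers stand in for the specialized audio models.

```python
# Hypothetical sketch of Loop Copilot's orchestration loop.
# The keyword router stands in for LLM intent recognition; the
# handlers stand in for specialized backend music models.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class GlobalAttributeTable:
    """Centralized table of musical attributes shared across dialogue rounds."""
    attrs: Dict[str, str] = field(default_factory=dict)

    def update(self, **kwargs: str) -> None:
        self.attrs.update(kwargs)


class LoopCopilotSketch:
    def __init__(self) -> None:
        self.table = GlobalAttributeTable()
        # Task name -> backend handler (stand-ins for specialized AI models).
        self.backends: Dict[str, Callable[[str], str]] = {
            "generate": self._generate,
            "edit": self._edit,
        }

    def _route(self, request: str) -> str:
        # Stand-in for LLM intent recognition: a crude keyword heuristic.
        edit_words = ("change", "add", "remove", "replace")
        return "edit" if any(w in request.lower() for w in edit_words) else "generate"

    def _generate(self, request: str) -> str:
        self.table.update(last_action="generate")
        return f"[generated loop for {request!r} | state: {self.table.attrs}]"

    def _edit(self, request: str) -> str:
        self.table.update(last_action="edit")
        return f"[edited loop per {request!r} | state: {self.table.attrs}]"

    def handle(self, request: str) -> str:
        # One dialogue round: route the request, run the chosen backend,
        # and let the shared table preserve coherence for later rounds.
        return self.backends[self._route(request)](request)


bot = LoopCopilotSketch()
bot.table.update(key="C minor", bpm="90")
print(bot.handle("a mellow lo-fi drum loop"))
print(bot.handle("change the drums to a bossa nova pattern"))
```

The shared table plays the role of a blackboard (cf. the Nii reference below): every backend reads and writes the same state, so an edit in round two still respects the key and tempo established in round one.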
- PG Music Inc. [n. d.]. https://www.pgmusic.com/
- MusicLM: Generating music from text. arXiv preprint arXiv:2301.11325 (2023).
- Calliope: A Co-creative Interface for Multi-Track Music Generation. In Proceedings of the 14th Conference on Creativity and Cognition. 608–611.
- John Brooke. 1996. SUS: A "quick and dirty" usability scale. Usability evaluation in industry 189, 3 (1996), 189–194.
- CoCon: A Self-Supervised Approach for Controlled Text Generation. In International Conference on Learning Representations.
- MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies. arXiv preprint arXiv:2308.01546 (2023).
- Groove2groove: One-shot music style transfer with supervision from synthetic data. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020), 2638–2650.
- Simple and Controllable Music Generation. arXiv preprint arXiv:2306.05284 (2023).
- Advanced Mixed Methods Research Designs. 209–240.
- Controllable deep melody generation via hierarchical music structure representation. arXiv preprint arXiv:2109.00663 (2021).
- Fred D Davis. 1989. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS quarterly (1989), 319–340.
- Jukebox: A generative model for music. arXiv preprint arXiv:2005.00341 (2020).
- LP-MusicCaps: LLM-Based Pseudo Music Captioning. arXiv preprint arXiv:2307.16372 (2023).
- Multitrack Music Transformer. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1–5.
- VampNet: Music Generation via Masked Acoustic Token Modeling. arXiv preprint arXiv:2307.04686 (2023).
- InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models. arXiv preprint arXiv:2308.14360 (2023).
- Counterpoint by convolution. arXiv preprint arXiv:1903.07227 (2019).
- AI song contest: Human-AI co-creation in songwriting. arXiv preprint arXiv:2010.05388 (2020).
- AudioGPT: Understanding and generating speech, music, sound, and talking head. arXiv preprint arXiv:2304.12995 (2023).
- Yu-Siang Huang and Yi-Hsuan Yang. 2020. Pop music transformer: Beat-based modeling and generation of expressive pop piano compositions. In Proceedings of the 28th ACM international conference on multimedia. 1180–1188.
- Musical composition style transfer via disentangled timbre representations. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. 4697–4703.
- A comprehensive survey on deep music generation: Multi-level representations, algorithms, evaluations, and future directions. arXiv preprint arXiv:2011.06801 (2020).
- AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining. arXiv preprint arXiv:2308.05734 (2023).
- Novice-AI music co-creation via AI-steering tools for deep generative models. In Proceedings of the 2020 CHI conference on human factors in computing systems. 1–13.
- Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls. arXiv preprint arXiv:2307.10304 (2023).
- Symbolic music generation with diffusion models. arXiv preprint arXiv:2103.16091 (2021).
- H Penny Nii. 1986. The blackboard model of problem solving and the evolution of blackboard architectures. AI magazine 7, 2 (1986), 38–38.
- Visualization for AI-Assisted Composing. In Proceedings of the 23rd International Society for Music Information Retrieval Conference.
- Magenta studio: Augmenting creativity with deep learning in ableton live. (2019).
- Hybrid Transformers for Music Source Separation. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE.
- Ezra Sandzer-Bell. 2023. ChatGPT music: How to write prompts for chords and melodies. https://www.audiocipher.com/post/chatgpt-music
- HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face. arXiv preprint arXiv:2303.17580 (2023).
- Songmass: Automatic song writing with pre-training and alignment constraint. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 13798–13805.
- MySong: automatic accompaniment generation for vocal melodies. In Proceedings of the SIGCHI conference on human factors in computing systems. 725–734.
- Peter Sobot. 2021. Pedalboard. https://doi.org/10.5281/zenodo.7817838
- Hao Hao Tan and Dorien Herremans. 2020. Music fadernets: Controllable music generation based on high-level features via low-level feature modelling. arXiv preprint arXiv:2007.15474 (2020).
- AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models. arXiv preprint arXiv:2304.00830 (2023).
- Learning interpretable representation for controllable polyphonic music generation. arXiv preprint arXiv:2008.07122 (2020).
- Music phrase inpainting using long-term representation and contrastive loss. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 186–190.
- Visual ChatGPT: Talking, drawing and editing with visual foundation models. arXiv preprint arXiv:2303.04671 (2023).
- Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1–5.
- MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling. In International Conference on Learning Representations.
- Deep music analogy via latent representation disentanglement. arXiv preprint arXiv:1906.03626 (2019).
- TorchAudio: Building Blocks for Audio and Speech Processing. arXiv preprint arXiv:2110.15018 (2021).
- Accomontage2: A complete harmonization and accompaniment arrangement system. arXiv preprint arXiv:2209.00353 (2022).
- BUTTER: A Representation Learning Framework for Bi-directional Music-Sentence Retrieval and Generation. NLP4MusA 2020 (2020), 54.
- COSMIC: A Conversational Interface for Human-AI Music Co-Creation. In NIME 2021. PubPub.
- Jingwei Zhao and Gus Xia. 2021. AccoMontage: Accompaniment arrangement via phrase selection and style transfer. In Proceedings of the 22nd International Society for Music Information Retrieval Conference.
- A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).