The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines?
Abstract: In this work, we explore the use and reliability of LLMs in musicology. Through a discussion with experts and students, we assess the current acceptance of, and concerns about, this now-ubiquitous technology. We then go one step further, proposing a semi-automatic method to create an initial benchmark using retrieval-augmented generation models and multiple-choice question generation, validated by human experts. Our evaluation on 400 human-validated questions shows that current vanilla LLMs are less reliable than retrieval-augmented generation from music dictionaries. This paper suggests that realizing the potential of LLMs in musicology requires musicology-driven research that can specialize LLMs by including accurate and reliable domain knowledge.