Evaluating Responses to Complex Open-Ended Music Reasoning Questions
Develop rigorous methodologies and benchmarks to evaluate the quality of responses produced by multimodal audio–text language models to complex, open-ended musical reasoning questions.
References
Evaluating the quality of a model's responses to complex, open-ended questions is an open and unresolved research challenge.
— LLark: A Multimodal Instruction-Following Language Model for Music
(2310.07160 - Gardner et al., 2023) in Section 6.4 (Reasoning Tasks)