Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance (2401.02979v1)

Published 31 Dec 2023 in cs.CL, cs.AI, and cs.IR

Abstract: Semantic embeddings play a crucial role in natural language-based information retrieval. Embedding models represent words and contexts as vectors whose spatial configuration is derived from the distribution of words in large text corpora. While such representations are generally very powerful, they might fail to account for fine-grained domain-specific nuances. In this article, we investigate this limitation for the domain of characterizations of expressive piano performance. Using a music research dataset of free-text performance characterizations and a follow-up study sorting the annotations into clusters, we derive a ground truth for a domain-specific semantic similarity structure. We test five embedding models and their similarity structure for correspondence with the ground truth. We further assess the effects of contextualizing prompts, hubness reduction, cross-modal similarity, and k-means clustering. The quality of embedding models shows great variability with respect to this task; more general models perform better than domain-adapted ones, and the best model configurations reach human-level agreement.
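The core evaluation the abstract describes — checking whether a model's pairwise similarity structure matches a human-derived ground truth — can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the embeddings and the ground-truth matrix below are random placeholders standing in for real descriptor embeddings and the cluster-derived similarity structure, and `cosine_sim_matrix` and `spearman` are hypothetical helpers written for this sketch.

```python
import numpy as np

def cosine_sim_matrix(X):
    # Normalize rows; the Gram matrix then holds pairwise cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T

def spearman(a, b):
    # Spearman rank correlation = Pearson correlation of the rank vectors.
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

rng = np.random.default_rng(0)
n_terms = 10  # hypothetical number of performance descriptors

# Placeholder stand-ins: model embeddings for the descriptors, and a
# human-derived ground-truth similarity matrix over the same terms.
emb = rng.normal(size=(n_terms, 64))
ground_truth = cosine_sim_matrix(rng.normal(size=(n_terms, 8)))

model_sim = cosine_sim_matrix(emb)

# Compare only the upper triangles: each descriptor pair once, no diagonal.
iu = np.triu_indices(n_terms, k=1)
rho = spearman(model_sim[iu], ground_truth[iu])
print(f"Rank correlation with ground truth: {rho:.3f}")
```

With real data, a higher rank correlation would indicate that the embedding space reproduces the domain-specific similarity structure; the paper's further steps (contextualizing prompts, hubness reduction, clustering) would modify `emb` or `model_sim` before this comparison.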
