Phonetic and Lexical Discovery of a Canine Language using HuBERT (2402.15985v1)
Abstract: This paper explores potential communication patterns in dog vocalizations and moves beyond traditional linguistic analysis, which relies heavily on human prior knowledge on limited datasets to find sound units in dog vocalizations. We present a self-supervised approach based on HuBERT that enables accurate classification of phoneme labels and identifies vocal patterns suggesting a rudimentary vocabulary within dog vocalizations. Our findings indicate significant acoustic consistency in the identified canine vocabulary, which covers the entirety of the observed dog vocalization sequences. We further develop a web-based dog vocalization labeling system that highlights phoneme n-grams from the vocabulary in dog audio uploaded by users.
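As a rough illustration of the n-gram highlighting idea mentioned in the abstract, the sketch below scans a sequence of phoneme labels and reports every span that matches an entry in a vocabulary of n-grams. It is not the paper's implementation: the function name `find_vocab_ngrams`, the use of integer label IDs, and the example vocabulary are all assumptions made for illustration.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple

Span = Tuple[int, int, Tuple[Hashable, ...]]


def find_vocab_ngrams(
    labels: Sequence[Hashable],
    vocab: Iterable[Sequence[Hashable]],
) -> List[Span]:
    """Return (start, end, ngram) for every vocabulary n-gram found in labels.

    `labels` is a hypothetical sequence of per-segment phoneme labels
    (for example, cluster IDs); `vocab` is a hypothetical set of n-grams.
    `end` is exclusive, so labels[start:end] equals the matched n-gram.
    """
    vocab_set = {tuple(gram) for gram in vocab}
    lengths = sorted({len(gram) for gram in vocab_set})
    spans: List[Span] = []
    for start in range(len(labels)):
        for n in lengths:
            window = tuple(labels[start:start + n])
            # Skip truncated windows at the end of the sequence.
            if len(window) == n and window in vocab_set:
                spans.append((start, start + n, window))
    return spans


# Made-up label sequence and vocabulary, for demonstration only.
example_labels = [3, 7, 7, 2, 3, 7]
example_vocab = [(3, 7), (7, 2, 3)]
example_spans = find_vocab_ngrams(example_labels, example_vocab)
```

Overlapping matches are all reported here; a labeling interface would need its own policy for which overlapping spans to display, and that choice is not specified in the abstract.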