Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 71 tok/s

Gemini 2.5 Pro 48 tok/s Pro

GPT-5 Medium 12 tok/s Pro

GPT-5 High 21 tok/s Pro

GPT-4o 81 tok/s Pro

Kimi K2 231 tok/s Pro

GPT OSS 120B 435 tok/s Pro

Claude Sonnet 4 33 tok/s Pro

2000 character limit reached

What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure (2101.00387v2)

Published 2 Jan 2021 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: In recent times, BERT based transformer models have become an inseparable part of the 'tech stack' of text processing models. Similar progress is being observed in the speech domain with a multitude of models observing state-of-the-art results by using audio transformer models to encode speech. This begs the question of what are these audio transformer models learning. Moreover, although the standard methodology is to choose the last layer embedding for any downstream task, but is it the optimal choice? We try to answer these questions for the two recent audio transformer models, Mockingjay and wave2vec2.0. We compare them on a comprehensive set of language delivery and structure features including audio, fluency and pronunciation features. Additionally, we probe the audio models' understanding of textual surface, syntax, and semantic features and compare them to BERT. We do this over exhaustive settings for native, non-native, synthetic, read and spontaneous speech datasets

Citations (76)

View on Semantic Scholar