
Speaker-Independent Acoustic-to-Articulatory Speech Inversion (2302.06774v2)

Published 14 Feb 2023 in eess.AS and cs.SD

Abstract: To build speech processing methods that can handle speech as naturally as humans, researchers have explored multiple ways of building an invertible mapping from speech to an interpretable space. The articulatory space is a promising inversion target, since this space captures the mechanics of speech production. To this end, we build an acoustic-to-articulatory inversion (AAI) model that leverages self-supervision to generalize to unseen speakers. Our approach obtains 0.784 correlation on an electromagnetic articulography (EMA) dataset, improving the state-of-the-art by 12.5%. Additionally, we show the interpretability of these representations by directly comparing the behavior of estimated representations with speech production behavior. Finally, we propose a resynthesis-based AAI evaluation metric that does not rely on articulatory labels, demonstrating its efficacy with an 18-speaker dataset.
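
As a rough illustration of the correlation figure quoted in the abstract, the sketch below scores predicted articulator trajectories against measured EMA trajectories. It assumes the metric is a Pearson correlation computed per articulatory channel and then averaged; the abstract does not state the exact protocol, and the function names `pearson` and `mean_channel_correlation` are hypothetical, not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return cov / math.sqrt(var_x * var_y)


def mean_channel_correlation(predicted, reference):
    """Average the per-channel Pearson correlation over all channels.

    Both arguments are lists of frames; each frame is a list with one value
    per articulatory channel (for example, the x/y position of an EMA sensor).
    Averaging over channels is an assumption made for this sketch.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of frames")
    n_channels = len(reference[0])
    scores = []
    for c in range(n_channels):
        pred_c = [frame[c] for frame in predicted]
        ref_c = [frame[c] for frame in reference]
        scores.append(pearson(pred_c, ref_c))
    return sum(scores) / n_channels
```

Under these assumptions, a score near 1 means the predicted trajectories move in step with the measured ones, and a score near 0 means they are unrelated. Because the measure is invariant to scale and offset, it rewards matching the shape of the movement rather than its absolute position.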

Authors (7)
  1. Peter Wu (32 papers)
  2. Li-Wei Chen (30 papers)
  3. Cheol Jun Cho (12 papers)
  4. Shinji Watanabe (416 papers)
  5. Louis Goldstein (9 papers)
  6. Alan W Black (83 papers)
  7. Gopala K. Anumanchipalli (16 papers)
Citations (23)
