Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding (2111.02735v3)

Published 4 Nov 2021 in cs.CL, cs.NE, cs.SD, and eess.AS

Abstract: Speech self-supervised models such as wav2vec 2.0 and HuBERT are making revolutionary progress in Automatic Speech Recognition (ASR). However, they have not been totally proven to produce better performance on tasks other than ASR. In this work, we explored partial fine-tuning and entire fine-tuning on wav2vec 2.0 and HuBERT pre-trained models for three non-ASR speech tasks: Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding. With simple proposed downstream frameworks, the best scores reached 79.58% weighted accuracy on speaker-dependent setting and 73.01% weighted accuracy on speaker-independent setting for Speech Emotion Recognition on IEMOCAP, 2.36% equal error rate for Speaker Verification on VoxCeleb1, 89.38% accuracy for Intent Classification and 78.92% F1 for Slot Filling on SLURP, showing the strength of fine-tuned wav2vec 2.0 and HuBERT on learning prosodic, voice-print and semantic representations.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Yingzhi Wang (7 papers)
  2. Abdelmoumene Boumadane (1 paper)
  3. Abdelwahab Heba (4 papers)
Citations (130)

Summary

We haven't generated a summary for this paper yet.