
Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance (1908.08717v1)

Published 23 Aug 2019 in cs.CL, cs.SD, and eess.AS

Abstract: This paper analyzes gender representation in four major corpora of French broadcast. Because these corpora are widely used within the speech processing community, they are primary material for training automatic speech recognition (ASR) systems. As gender bias has been highlighted in numerous NLP applications, we study the impact of the gender imbalance in TV and radio broadcast on the performance of an ASR system. This analysis shows that women are under-represented in our data in terms of speakers and speech turns. We introduce the notion of speaker role to refine our analysis and find that women are even fewer within the Anchor category, which corresponds to prominent speakers. The disparity in available data between the two genders causes performance to decrease on women. However, this global trend can be counterbalanced for speakers who are used to speaking in the media when a sufficient amount of data is available.

Authors (3)
  1. Mahault Garnerin (3 papers)
  2. Solange Rossato (5 papers)
  3. Laurent Besacier (76 papers)
Citations (48)
