EdAcc: Edinburgh Accents Corpus
- EdAcc is a spontaneous, dyadic conversational English speech dataset offering diverse accents and robust metadata for advanced ASR research.
- It comprises nearly 40 hours of conversations from 61 sessions with 122 unique speakers, covering over 40 English accents and a wide range of L1 backgrounds.
- The dataset supports evaluation of speaker anonymization and accent-fair ASR systems, and in privacy evaluations it shows far less content-based speaker leakage than LibriSpeech.
The Edinburgh International Accents of English Corpus (EdAcc) is a spontaneous, dyadic conversational speech dataset designed to address critical gaps in English speech corpora for automatic speech recognition (ASR), speaker anonymization, and accent-robust language technology development. Distinguished by its sociolinguistic diversity, conversational speech style, and meticulous speaker metadata, EdAcc is structured to minimize lexical content-based speaker leakage—a pronounced defect in benchmark corpora such as LibriSpeech—while supplying a robust resource for the analysis and evaluation of ASR and speaker-privacy algorithms across a globally representative spectrum of English varieties (Franzreb et al., 19 Jan 2026, Sanabria et al., 2023).
1. Origin and Goals
EdAcc was created to remedy limitations in prevalent datasets for ASR and speaker-anonymization evaluation—chiefly, the overrepresentation of US/UK native-read speech and the risk of content-induced speaker disclosure. In LibriSpeech, for example, speakers read unique books, making vocabulary distributions speaker-specific and enabling content-driven identity leakage even through “perfect” acoustic anonymization systems. EdAcc’s primary objectives are:
- To provide spontaneous, dyadic, conversational English speech as opposed to read speech.
- To encompass a broad array of English accents and first-language (L1) backgrounds, with representation from L1 and L2 speakers varying in age, ethnicity, and region.
- To mitigate vocabulary-based speaker identification and encourage evaluation of anonymization and ASR systems where acoustic and paralinguistic cues, rather than lexical content, drive model performance (Franzreb et al., 19 Jan 2026).
2. Dataset Composition and Speaker Diversity
EdAcc comprises nearly 40 hours of wide-band English conversational audio, drawn from 61 Zoom video calls (each 20–60 minutes) between friend/acquaintance pairs. With two speakers per call, this gives approximately 122 unique participants, and no speaker appears in both the official development (31 conversations, ∼20 h) and test (30 conversations, ∼19 h) sets (Sanabria et al., 2023). The released subsets for privacy experiments total about 22 hours (dev ∼11 h, test ∼11 h).
Speaker and linguistic diversity is central to the corpus:
- Over 40 reported English accents (L1 varieties), with at least 51 distinct L1 backgrounds among L2 speakers.
- Accented Englishes include, for example, American, Irish, Jamaican, Nigerian, and Spanish-accented English.
- Socio-demographic and biographical metadata per speaker: age bracket, gender, ethnicity, first/second languages, age/year of English acquisition (for L2), countries lived in, and self-ascribed accent.
Dyadic sessions elicit spontaneous discourse using open prompts, yielding natural prosody and interactional structure. Each participant also records the canonical “Stella” passage, facilitating accent-controlled phonetic analyses (Sanabria et al., 2023).
3. Data Structure, Annotation, and Access
Dataset releases comprise:
- Mono-channel, wide-band WAV audio (16 kHz), packaged by conversation ID.
- Orthographic, human-verified transcripts aligned at the utterance/turn level, annotated for disfluencies, laughter, and background sounds.
- Speaker metadata CSV files and data statements.
- Standard Kaldi-style directory structure (wav.scp, text, utt2spk, spk2utt) to maximize compatibility with ASR pipelines.
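The Kaldi-style files listed above are plain text tables with one `<key> <value>` entry per line, so they can be read without Kaldi itself. The following is a minimal sketch, assuming a directory that contains `wav.scp`, `text`, and `utt2spk`; the function names `read_kaldi_table` and `load_kaldi_dir` are hypothetical helpers, and the exact key scheme in the EdAcc release (utterance IDs versus recording IDs in `wav.scp`) is not confirmed here.

```python
from collections import defaultdict
from pathlib import Path


def read_kaldi_table(path):
    """Read a Kaldi-style table where each line is '<key> <value...>'."""
    table = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        key = parts[0]
        table[key] = parts[1] if len(parts) > 1 else ""
    return table


def load_kaldi_dir(data_dir):
    """Load wav.scp, text and utt2spk, and derive spk2utt from utt2spk."""
    data_dir = Path(data_dir)
    wav = read_kaldi_table(data_dir / "wav.scp")
    text = read_kaldi_table(data_dir / "text")
    utt2spk = read_kaldi_table(data_dir / "utt2spk")

    # spk2utt is the inverse mapping: speaker -> list of utterance IDs.
    spk2utt = defaultdict(list)
    for utt, spk in utt2spk.items():
        spk2utt[spk].append(utt)
    return wav, text, utt2spk, dict(spk2utt)
```

Deriving `spk2utt` from `utt2spk` rather than parsing it separately keeps the two mappings consistent by construction.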
Annotation protocol includes manual segmentation, professional transcription with privacy compliance (source deletions within 10 days), and post-processing for lexical normalization and alignment with ASR lexicons (Sanabria et al., 2023). All data are distributed under the Creative Commons Attribution-ShareAlike (CC-BY-SA) license and are downloadable from the project website (https://groups.inf.ed.ac.uk/edacc/).
4. Speaker Anonymization and Content Leakage Analysis
A primary application for EdAcc is the privacy evaluation of speaker-anonymization systems. The evaluation protocol follows VoicePrivacy 2024 using the SpAnE framework, ECAPA-TDNN speaker recognition (SpeechBrain), and the Equal Error Rate (EER) metric:
- Enrollment: 20 anonymized utterances/speaker → extract mean embedding.
- Trial: 20 anonymized utterances/speaker → embeddings compared to enrollment via cosine similarity.
- EER: rate at which False Accept = False Reject; 0% = full identification, 50% = random guessing (maximum privacy).
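The enrollment, trial, and EER steps above can be sketched in a few lines of NumPy. This is an illustrative approximation, not the SpAnE or SpeechBrain implementation: the embeddings are assumed to be already extracted, the function names are hypothetical, and the EER is approximated by sweeping thresholds over the observed scores and taking the point where false-accept and false-reject rates are closest.

```python
import numpy as np


def mean_embedding(utterance_embeddings):
    """Average a (n_utterances, dim) array into one enrollment vector."""
    return np.asarray(utterance_embeddings, dtype=float).mean(axis=0)


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def score_trials(enrollment, trials):
    """Compare each trial embedding with every enrolled speaker.

    enrollment: dict speaker -> enrollment vector
    trials: iterable of (true_speaker, trial_embedding)
    Returns (target_scores, nontarget_scores).
    """
    target, nontarget = [], []
    for true_spk, emb in trials:
        for spk, ref in enrollment.items():
            score = cosine_similarity(emb, ref)
            (target if spk == true_spk else nontarget).append(score)
    return target, nontarget


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER as a fraction in [0, 1]."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for th in np.sort(np.concatenate([target, nontarget])):
        far = np.mean(nontarget >= th)  # impostor trials accepted
        frr = np.mean(target < th)      # genuine trials rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return float(eer)
```

With well-separated target and non-target scores the function returns a value near 0 (speakers fully identified); with identical score distributions it returns about 0.5, the chance level that corresponds to maximum privacy.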
A “perfect” anonymizer is implemented as a speech-to-text-to-speech (STT-TTS) pipeline (Whisper-small → FastPitch+HiFi-GAN), eliminating all speaker-specific acoustic information so that only content remains as an identifier. Additionally, "phone-only" and "phones+durations" feature attacks isolate vocabulary- and temporal-pattern-based leakage:

$$X_{v,t} = \delta_{v,\,m(p_i)} \quad \text{for } \sum_{j<i} d_j \le t < \sum_{j \le i} d_j,$$

where $X$ is the input matrix (vocabulary index $v$, frame $t$), $d_i$ is the duration in frames of the $i$-th phone $p_i$, $m$ maps phone $p_i$ to its vocabulary index $v$, and $\delta$ is the Kronecker delta (Franzreb et al., 19 Jan 2026).
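The phone-based attack inputs can be illustrated as a one-hot matrix with one row per vocabulary entry and one column per frame. The sketch below is an assumption-laden reading of the description above, not the authors' code: `phone_feature_matrix` is a hypothetical name, durations are taken to be integer frame counts, and the phone-only variant is assumed to give each phone exactly one frame.

```python
import numpy as np


def phone_feature_matrix(phones, durations, vocab, use_durations=True):
    """Build a one-hot (len(vocab), n_frames) matrix from a phone sequence.

    phones: list of phone symbols
    durations: list of integer frame counts, one per phone
    vocab: ordered list of all phone symbols (defines the row index)
    use_durations: if False, every phone occupies a single frame
                   (assumed behaviour of the phone-only variant)
    """
    index = {phone: i for i, phone in enumerate(vocab)}
    if not use_durations:
        durations = [1] * len(phones)
    n_frames = int(sum(durations))

    X = np.zeros((len(vocab), n_frames), dtype=np.float32)
    t = 0
    for phone, dur in zip(phones, durations):
        # Kronecker delta: 1 in the row of this phone, 0 elsewhere.
        X[index[phone], t:t + dur] = 1.0
        t += dur
    return X
```

Because every column contains exactly one nonzero entry, the matrix carries no acoustic detail: a speaker recognizer trained on it can only exploit which phones occur and, optionally, how long they last.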
5. Comparative Privacy Benchmarking: EdAcc vs. LibriSpeech
EdAcc is contrasted with LibriSpeech in terms of vulnerability to content-driven speaker recognition attacks. Key results, reported as EER (higher means more privacy):
| Features | LibriSpeech Original | LibriSpeech STT-TTS | EdAcc Original | EdAcc STT-TTS |
|---|---|---|---|---|
| Mel-spectrogram | 0.4 % | 34.8 % | 6.5 % | 45.9 % |
| Phones + durations | 23.7 % | 34.5 % | 39.0 % | 45.0 % |
| Phones only | 30.4 % | 32.3 % | 42.1 % | 48.5 % |
On original speech with full acoustic features, LibriSpeech speakers are identified almost perfectly (0.4 % EER). Even after STT-TTS anonymization, LibriSpeech EERs stay at roughly 32–35 % across the three feature sets, well short of 50 %, indicating considerable content-based leakage (vocabulary, phone durations). In contrast, EdAcc’s STT-TTS anonymized data reaches EERs of 45.0–48.5 %, approaching the 50 % “ideal” for perfect anonymization and indicating minimal content-driven identification (Franzreb et al., 19 Jan 2026).
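As a worked reading of the STT-TTS columns, the distance of each EER from the 50 % chance level gives a rough sense of how much identifying content remains. The snippet below only re-expresses the table values; the "gap to chance" quantity and the helper name `gap_to_chance` are illustrative conveniences, not metrics defined in the cited papers.

```python
# EER values (%) copied from the STT-TTS columns of the table above.
STT_TTS_EER = {
    "LibriSpeech": {"mel": 34.8, "phones+durations": 34.5, "phones": 32.3},
    "EdAcc": {"mel": 45.9, "phones+durations": 45.0, "phones": 48.5},
}

CHANCE_EER = 50.0  # EER of an attacker who is guessing at random


def gap_to_chance(eers):
    """Percentage points separating each EER from the 50 % chance level."""
    return {feat: round(CHANCE_EER - eer, 1) for feat, eer in eers.items()}


if __name__ == "__main__":
    for corpus, eers in STT_TTS_EER.items():
        print(corpus, gap_to_chance(eers))
```

On these figures the LibriSpeech gaps are 15.2 to 17.7 points, while the EdAcc gaps are 1.5 to 5.0 points.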
Population-segment privacy analysis reveals that intra- and inter-accent EERs in EdAcc remain high: for example, American and Jamaican speakers yield intra- and inter-EERs ≈47–48 % and ≈42 % respectively, indicating substantial privacy within and across accent groups.
6. Robustness, Fairness, and ASR Benchmarking
EdAcc poses pronounced challenges to current ASR models. Baseline word error rates (WER) include:
| Model | EdAcc dev | EdAcc test | Libri-test-clean | Libri-test-other |
|---|---|---|---|---|
| Wav2vec2.0 (fine-tuned) | 33.4 % | 36.1 % | 2.9 % | 5.6 % |
| Commercial engine | 17.9 % | 18.7 % | 3.8 % | 7.4 % |
| Whisper (large) | 16.4 % | 19.7 % | 2.7 % | 5.6 % |
Whisper (large) achieves ∼19.7 % WER on EdAcc test, against 2.7 % on LibriSpeech test-clean (US English read speech), a gap of about 17 points. Performance degradation is especially severe for Indian, Jamaican, and Nigerian English speakers (e.g., WER above roughly 21–22 % for these groups) (Sanabria et al., 2023). This demonstrates the corpus’s utility for accent-fairness research and model adaptation.
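For reference, WER is the word-level edit distance between reference and hypothesis (substitutions, deletions, insertions) divided by the number of reference words. The following is a minimal sketch that assumes whitespace tokenization and no text normalization; published EdAcc scores depend on the normalization used by the authors, so this function will not reproduce them exactly.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate as a fraction: edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words, a WER of about 33 %.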
7. Use Cases, Impact, and Future Directions
EdAcc facilitates diverse research avenues:
- Evaluation of speaker-anonymization systems with little content leakage, so that attackers must rely on acoustic rather than lexical cues.
- Benchmarking, adaptation, and debiasing of ASR and TTS systems for inclusive performance across L1/L2 and minoritized accents.
- Analyses of accent classification, sociophonetic variation, and population-segment privacy (via intra-/inter-EER).
- Development and audit of voice conversion/cross-accent transfer models with spontaneous speech (Franzreb et al., 19 Jan 2026).
Potential extensions include expanding corpus hours and L2 speaker backgrounds, introducing more realistic/noisy recording settings, and releasing detailed phonetic alignments.
EdAcc is referenced as: Sanabria, R. et al., “The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR,” ICASSP 2023 (Sanabria et al., 2023), with corpus access via https://groups.inf.ed.ac.uk/edacc/.
References
- (Franzreb et al., 19 Jan 2026) Content Leakage in LibriSpeech and Its Impact on the Privacy Evaluation of Speaker Anonymization
- (Sanabria et al., 2023) The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR