Phonetic Segmentation of the UCLA Phonetics Lab Archive

Published 28 Mar 2024 in cs.CL, cs.SD, and eess.AS | (2403.19509v1)

Abstract: Research in speech technologies and comparative linguistics depends on access to diverse and accessible speech data. The UCLA Phonetics Lab Archive is one of the earliest multilingual speech corpora, with long-form audio recordings and phonetic transcriptions for 314 languages (Ladefoged et al., 2009). Recently, 95 of these languages were time-aligned with word-level phonetic transcriptions (Li et al., 2021). Here we present VoxAngeles, a corpus of audited phonetic transcriptions and phone-level alignments of the UCLA Phonetics Lab Archive, which uses the 95-language CMU re-release as our starting point. VoxAngeles also includes word- and phone-level segmentations from the original UCLA corpus, as well as phonetic measurements of word and phone durations, vowel formants, and vowel f0. This corpus enhances the usability of the original data, particularly for quantitative phonetic typology, as demonstrated through a case study of vowel intrinsic f0. We also discuss the utility of the VoxAngeles corpus for general research and pedagogy in crosslinguistic phonetics, as well as for low-resource and multilingual speech technologies. VoxAngeles is free to download and use under a CC-BY-NC 4.0 license.



Summary

The paper introduces VoxAngeles, an enriched phonetic corpus with manually audited transcriptions, phone- and word-level segmentations, and additional phonetic measurements.
It combines automatic alignment with the Montreal Forced Aligner and manual auditing by trained phonetic annotators across 95 languages.
The study illustrates the corpus's value for phonetic typology and speech technology; a case study of vowel intrinsic f0 confirms that high vowels generally exhibit higher f0 than low vowels.

Phonetic Segmentation of the UCLA Phonetics Lab Archive: An Analysis of VoxAngeles

The paper "Phonetic Segmentation of the UCLA Phonetics Lab Archive," authored by Eleanor Chodroff, Blaž Pažon, Annie Baker, and Steven Moran, presents VoxAngeles, an enriched version of the UCLA Phonetics Lab Archive. The corpus adds audited phonetic transcriptions, phone- and word-level segmentations, and phonetic measurements. VoxAngeles takes the 95-language CMU re-release (Li et al., 2021) as its starting point and continues earlier efforts to make the archive more accessible and more useful for phonetic and computational linguistics research.

Contributions and Methods

The work builds on earlier efforts to organize and time-align the archive's large multilingual collection. Covering 95 languages, it provides a time-aligned, quality-audited phonetic corpus suited to detailed phonetic analysis and to speech technology applications. Alignments produced with the Montreal Forced Aligner were audited by trained phonetic annotators, so that the phone-level boundaries can support measurements such as segment durations, vowel formants, and f0 across languages.
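The paper pairs forced alignment with manual auditing. As a rough sketch of the kind of automatic sanity check that could triage alignments before human review (an assumption, not the authors' documented procedure), the Python below checks that phone intervals nest inside word intervals, tile each word without gaps or overlaps, and are not implausibly short. The `Interval` type and the 20 ms `min_phone_dur` threshold are hypothetical; real input would come from the aligner's TextGrid output, read with a parsing library of your choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A labelled time span in seconds (hypothetical stand-in for a TextGrid interval)."""
    start: float
    end: float
    label: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def _within(inner: Interval, outer: Interval, tol: float) -> bool:
    """True if inner lies inside outer, allowing tol seconds of rounding slack."""
    return outer.start - tol <= inner.start and inner.end <= outer.end + tol


def audit_alignment(words: list[Interval], phones: list[Interval],
                    min_phone_dur: float = 0.02, tol: float = 1e-6) -> list[str]:
    """Return messages describing intervals an annotator should inspect."""
    problems = []
    for p in phones:
        # Very short phones often signal a collapsed or misplaced boundary.
        # The 20 ms default is a hypothetical threshold, not taken from the paper.
        if p.duration < min_phone_dur:
            problems.append(f"short phone {p.label!r} at {p.start:.3f}s "
                            f"({p.duration * 1000:.0f} ms)")
        # Each phone should belong to exactly one word.
        if sum(_within(p, w, tol) for w in words) != 1:
            problems.append(f"phone {p.label!r} at {p.start:.3f}s "
                            "is not nested in exactly one word")
    for w in words:
        inside = sorted((p for p in phones if _within(p, w, tol)),
                        key=lambda p: p.start)
        if not inside:
            problems.append(f"word {w.label!r} has no phones")
            continue
        # Phones should tile the word: edges match and there are no internal
        # gaps or overlaps between consecutive phones.
        edges_ok = (abs(inside[0].start - w.start) <= tol
                    and abs(inside[-1].end - w.end) <= tol)
        broken = any(abs(a.end - b.start) > tol for a, b in zip(inside, inside[1:]))
        if not edges_ok or broken:
            problems.append(f"word {w.label!r} is not tiled by its phones")
    return problems
```

Each message names the offending interval and its onset time, so an annotator can go straight to that point in the recording; checks like these only narrow down where to look and do not replace listening.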

The authors also address several challenges: inconsistent representation of suprasegmental features, obsolete or non-standard symbols, and mismatches between transcriptions and audio. They replaced outdated symbols with current IPA equivalents and, where necessary, consulted the original field notes to ensure phonetic accuracy and consistency.
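To illustrate the kind of symbol standardization described above, here is a minimal Python sketch that composes Unicode to NFC and then replaces legacy characters with current IPA. The replacement table `LEGACY_TO_IPA` is hypothetical and deliberately small: it covers a few well-known cases (the withdrawn IPA letters ɩ and ɷ, Americanist š, ž, č, and ASCII g and colon) and is not the mapping used for VoxAngeles.

```python
import unicodedata

# Hypothetical replacement table: legacy or Americanist symbols mapped to
# current IPA. Illustrative only; not the VoxAngeles project's mapping.
LEGACY_TO_IPA = {
    "\u0269": "\u026A",   # ɩ (withdrawn iota) -> ɪ
    "\u0277": "\u028A",   # ɷ (withdrawn closed omega) -> ʊ
    "\u0161": "\u0283",   # š (Americanist) -> ʃ
    "\u017E": "\u0292",   # ž (Americanist) -> ʒ
    "\u010D": "t\u0283",  # č (Americanist) -> tʃ
    "g": "\u0261",        # ASCII g -> IPA script ɡ
    ":": "\u02D0",        # ASCII colon -> IPA length mark ː
}


def normalize_transcription(text: str) -> str:
    """Map legacy symbols in a transcription to current IPA, character by character."""
    # Compose first, so that s + combining caron (U+030C) becomes the single
    # precomposed code point š that the table expects.
    composed = unicodedata.normalize("NFC", text)
    return "".join(LEGACY_TO_IPA.get(ch, ch) for ch in composed)
```

Normalizing to NFC first matters because the same visible symbol can be encoded either as one precomposed code point or as a base letter plus a combining diacritic, and a character-level lookup only matches the former. Context-dependent cases, such as Americanist y for IPA j, cannot be handled by a blind character map and still need a human decision.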

Results and Implications

The corpus spans 95 languages from 21 language families with diverse phonetic inventories. It supports more inclusive phonetic studies, as well as phonological analysis and speech recognition research for low-resource languages. As a case study, the authors examined intrinsic f0, the effect of vowel height on fundamental frequency. Their linear mixed-effects models confirm an intrinsic f0 effect despite cross-linguistic variability: high vowels generally exhibit higher f0 than low vowels.
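The authors fit linear mixed-effects models; as a much simpler, hedged illustration of what the effect measures, the sketch below computes for each language the mean f0 of high vowels minus that of low vowels, in semitones, and counts how many languages show the expected positive difference. The input format of `(language, height, f0_hz)` tuples and the three-way height coding are assumptions, and this per-language difference is a descriptive stand-in, not the authors' model.

```python
import math
from collections import defaultdict
from statistics import mean


def intrinsic_f0_effects(tokens):
    """Per-language intrinsic f0 effect: mean f0 of high minus low vowels, in semitones.

    tokens: iterable of (language, height, f0_hz) tuples, with height in
    {"high", "mid", "low"} (an assumed coding). Languages lacking either
    high or low vowel tokens are skipped.
    """
    by_lang = defaultdict(list)
    for lang, height, f0_hz in tokens:
        by_lang[lang].append((height, f0_hz))

    effects = {}
    for lang, rows in by_lang.items():
        # Express each token in semitones relative to the language's mean f0.
        # The reference cancels in the high-minus-low difference, but centred
        # values are what one would pool across languages in a larger model.
        ref = mean(f0 for _, f0 in rows)
        st = [(h, 12.0 * math.log2(f0 / ref)) for h, f0 in rows]
        high = [s for h, s in st if h == "high"]
        low = [s for h, s in st if h == "low"]
        if high and low:
            effects[lang] = mean(high) - mean(low)
    return effects


def count_positive(effects):
    """Return (languages where high vowels have higher f0, languages tested)."""
    return sum(e > 0 for e in effects.values()), len(effects)
```

Semitones express the effect as a frequency ratio, which keeps languages recorded with speakers of different pitch ranges comparable. To approximate the published analysis more closely, the same token table could instead be passed to a mixed model with by-language random intercepts and slopes, for example through the `mixedlm` formula interface in statsmodels.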

The enhanced corpus is relevant to phonetic typology, automatic speech recognition, and cross-linguistic phonetic analysis. Because it provides standardized transcriptions together with phonetic measurements, researchers can test phonetic universals more robustly and potentially develop better speech recognition models for low-resource languages.

Future Directions

The main limitation of this release is that each language is represented by a single speaker, which constrains broader analyses. Ongoing work aims to extract longer spoken passages and add speakers, which would strengthen the dataset's generalizability and its value for phonetic modeling.

In conclusion, VoxAngeles is a significant resource for phonetic and linguistic research and contributes to the preservation and understanding of linguistic diversity. The paper highlights the corpus's potential to support low-resource language documentation, multilingual alignment, and the search for phonetic universals, laying the groundwork for future studies of cross-linguistic phonetic characteristics.
