FastABX: Enhancing ABX Discriminability Tasks
The paper introduces fastabx, a high-performance Python library that streamlines the computation of ABX discrimination tasks. Fastabx fills a clear gap in self-supervised learning (SSL) tooling: efficiently evaluating phonetic discriminability. The library matters for researchers in unsupervised speech processing because it enables rapid task creation and evaluation without supervised probes, relying only on the information already present in the learned representations.
The ABX discriminability metric is derived from match-to-sample tasks common in human psychophysics and measures how well a learned representation separates two categories. It gained prominence through the ZeroSpeech challenges on SSL models: it supports phoneme-level evaluation in acoustic unit discovery and correlates with the downstream coherence of speech generated by spoken language models.
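To make the metric concrete, here is a minimal pure-Python sketch of the ABX error (not fastabx's actual API): given tokens A and X from one category and B from another, a triple counts as an error when X lies closer to B than to A under some distance, with ties counted as half an error.

```python
def abx_error(a_items, b_items, x_items, dist):
    """ABX error rate over all (A, B, X) triples.

    A and X are drawn from the same category, B from the other;
    an error is scored when dist(A, X) > dist(B, X), a half-error
    on ties. Lower is better (0.5 is chance level).
    """
    errors, total = 0.0, 0
    for a in a_items:
        for b in b_items:
            for x in x_items:
                d_ax, d_bx = dist(a, x), dist(b, x)
                errors += float(d_ax > d_bx) + 0.5 * float(d_ax == d_bx)
                total += 1
    return errors / total
```

In practice the score is also symmetrized by swapping the roles of the two categories and averaging, and X must be a token distinct from A; both details are omitted here for brevity.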
Fastabx improves upon previous implementations such as ABXpy and the one in Libri-Light, focusing on efficiency and modularity. ABXpy, while flexible, suffered from performance limitations, taking approximately 2 hours for the LibriSpeech ABX discrimination tasks; fastabx completes the same tasks in about 2 minutes. It achieves this with a streamlined implementation, including a PyTorch C++/CUDA extension that parallelizes Dynamic Time Warping (DTW) computations on GPUs.
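DTW is the distance used to compare variable-length frame sequences in these tasks. The following is a plain-Python reference version of the standard DTW recurrence, shown only to illustrate what the library's C++/CUDA extension accelerates; the Euclidean frame distance and the path-length normalization by n + m are illustrative choices, not a description of fastabx's internals.

```python
import numpy as np

def dtw_distance(x, y):
    """DTW distance between two sequences of frame vectors.

    cost[i, j] holds the best accumulated cost of aligning the
    first i frames of x with the first j frames of y; each cell
    extends the cheapest of the three neighboring alignments.
    """
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m] / (n + m)  # normalize by sequence lengths
```

The dynamic program is embarrassingly parallel across the many (A, X) and (B, X) pairs of a task, which is why batching it on a GPU pays off.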
The practical implications of fastabx are substantial. Because the framework accepts any specification of ON, BY, and ACROSS conditions, researchers can adapt it to evaluation settings beyond speech, enriching the broader representation learning field. Fastabx supports detailed analysis of individual phonetic contrasts, probing what phonetic information SSL representations encode while avoiding the biases and noise inherent in supervised probes.
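To illustrate what an ON/BY specification means, here is a hypothetical sketch of triple enumeration over labeled items (again, not fastabx's API): A and X share the ON attribute, B differs on it, and all three agree on every BY attribute. An ACROSS condition would add a further constraint, typically that X differs from A and B on that attribute; it is omitted here.

```python
from itertools import product

def make_triples(items, on, by):
    """Enumerate (A, B, X) index triples from labeled items.

    `items` is a list of attribute dicts, e.g.
    {"phone": "a", "speaker": "s1"}. A and X share the ON value,
    B differs on it, and all three share every BY value.
    """
    triples = []
    for ia, ib, ix in product(range(len(items)), repeat=3):
        a, b, x = items[ia], items[ib], items[ix]
        if ia == ix:
            continue  # X must be a token distinct from A
        if a[on] != x[on] or a[on] == b[on]:
            continue  # ON: A and X match, B differs
        if any(a[k] != b[k] or a[k] != x[k] for k in by):
            continue  # BY: held constant across all three
        triples.append((ia, ib, ix))
    return triples
```

For phonetic ABX, ON would be the phone label and BY might include the speaker and surrounding context; the same machinery applies unchanged to non-speech attributes.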
As for future developments, the CUDA backend for DTW could be optimized further, and new subsampling methods could broaden the library's applicability. Its flexibility positions it well for self-supervised evaluation across modalities beyond speech, suggesting extensions of ABX tasks to visual or multimodal representations.
Fastabx exemplifies the evolution and specialization of tools for representation learning, underscoring the importance of efficient, flexible, and domain-independent evaluation metrics in contemporary research.