GCC-PHAT Data Augmentation for SSL
- GDA is a feature-level augmentation strategy that generates synthetic SSL samples by shifting and scaling the dominant GCC-PHAT peaks to rebalance underrepresented classes.
- The method extracts key peak statistics—positions and amplitudes—from cross-correlation signals to maintain realistic DoA characteristics in augmented data.
- Empirical results show that GDA, especially when combined with ADIR, enhances SSL accuracy (up to 89%) and reduces mean absolute error, validating its effectiveness in incremental learning.
GCC-PHAT-based Data Augmentation (GDA) is a technique designed to address intra-task class imbalance in sound source localization (SSL) by generating synthetic training features derived from the peak statistics of the Generalized Cross-Correlation with Phase Transform (GCC-PHAT) input domain. GDA operates at the feature level, utilizing the statistical properties of the dominant GCC-PHAT peaks to synthesize new examples for underrepresented (tail) classes, thereby ameliorating the long-tailed direction-of-arrival (DoA) distribution that commonly impairs SSL performance in real-world, incremental learning scenarios (Fan et al., 26 Jan 2026).
1. Mathematical Formulation of GCC-PHAT in SSL
GCC-PHAT transforms the time-domain microphone signals $x_i(t)$ and $x_j(t)$ into frequency-domain representations $X_i(f)$ and $X_j(f)$. The core feature, the GCC-PHAT function,

$$R_{ij}(\tau) = \int \frac{X_i(f)\,X_j^{*}(f)}{\left|X_i(f)\,X_j^{*}(f)\right|}\, e^{j 2\pi f \tau}\, df,$$

computes a phase-weighted cross-correlation that accentuates the time lag corresponding to the DoA between paired microphones. For $P$ microphone pairs and $L$ delay bins per pair, feature extraction over a frame yields the vector

$$\mathbf{g} = \left[\,g_1(\tau_1),\ldots,g_1(\tau_L),\;\ldots,\;g_P(\tau_1),\ldots,g_P(\tau_L)\,\right] \in \mathbb{R}^{PL}.$$
These features are the canonical input for downstream incremental SSL models in the described learning pipeline.
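As a concrete illustration, the GCC-PHAT curve for a single microphone pair can be computed with an FFT. This is a generic sketch, not the paper's implementation; the function name, the `n_lags` windowing, and the toy signals are all illustrative:

```python
import numpy as np

def gcc_phat(x_i, x_j, n_lags=51):
    """GCC-PHAT curve for one microphone pair, restricted to the central
    n_lags delay bins (illustrative; real pipelines frame/window first)."""
    n = len(x_i) + len(x_j)                        # zero-pad against circular wrap
    X_i = np.fft.rfft(x_i, n=n)
    X_j = np.fft.rfft(x_j, n=n)
    cross = X_i * np.conj(X_j)
    cross /= np.abs(cross) + 1e-12                 # PHAT weighting: keep phase only
    r = np.fft.irfft(cross, n=n)
    half = n_lags // 2
    return np.concatenate((r[-half:], r[:half + 1]))  # reorder so lag 0 is centered

# Toy check: a pure inter-channel delay concentrates the peak at that lag.
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
delay = 5
r = gcc_phat(s[delay:], s[:-delay], n_lags=51)
```

The PHAT weighting discards magnitude and keeps only phase, which is what makes the peak sharp even for broadband, reverberant signals.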
2. Peak Statistic Extraction and Characterization
Empirical analysis demonstrates that, for each microphone pair, a single dominant peak in the GCC-PHAT domain conveys the DoA information. For each input sample and microphone pair $p \in \{1,\ldots,P\}$, GDA extracts:
- Peak position: $\tau^{*}_{p} = \arg\max_{\tau}\, g_p(\tau)$,
- Peak amplitude: $a_{p} = g_p(\tau^{*}_{p})$.
Aggregate statistics over all samples of class $c$ are computed: the per-pair mean peak position $\mu^{\tau}_{c,p}$ and mean peak amplitude $\mu^{a}_{c,p}$, together with empirical variances. No clustering beyond this dominant peak is necessary, as only the highest peak per segment is relocated during augmentation.
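The extraction above reduces to a per-pair argmax/max followed by class-wise moments. A minimal sketch, assuming features for one class are already arranged as a `(num_samples, P, L)` array (`peak_stats` and the key names are hypothetical, not from the paper):

```python
import numpy as np

def peak_stats(features):
    """Dominant-peak position/amplitude statistics per microphone-pair slice.

    features: array of shape (num_samples, P, L) holding GCC-PHAT vectors
    split into P pair slices of L delay bins each.
    """
    positions = features.argmax(axis=-1)           # tau*: dominant-peak delay bin
    amplitudes = features.max(axis=-1)             # a: dominant-peak height
    return {
        "mean_pos": positions.mean(axis=0), "var_pos": positions.var(axis=0),
        "mean_amp": amplitudes.mean(axis=0), "var_amp": amplitudes.var(axis=0),
    }

# Two toy samples, one pair, five delay bins.
features = np.zeros((2, 1, 5))
features[0, 0, 2] = 1.0
features[1, 0, 4] = 3.0
stats = peak_stats(features)
```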
3. GDA Algorithmic Procedure
The augmentation process targets, within each task $t$, all "tail" classes: those with sample count $N_c < \gamma N_{\max}$, where $N_{\max} = \max_{c'} N_{c'}$ and $\gamma \in (0,1)$ is the rebalancing threshold. For each such class $c$, synthetic examples are generated according to Algorithm 1:
- Initialization: Start with an empty synthetic feature vector $\tilde{\mathbf{g}} \in \mathbb{R}^{PL}$.
- For each microphone pair $p = 1,\ldots,P$:
- Extract the $p$-th slice $g_p$ from a base abundant-class sample.
- Compute the shift $\Delta_p = \mu^{\tau}_{c,p} - \tau^{*}_{p}$ toward the target class's mean peak position.
- Cyclically shift $g_p$ by $\Delta_p$ samples along the delay axis.
- Rescale such that the shifted peak amplitude equals $\mu^{a}_{c,p}$.
- Inject i.i.d. Gaussian noise $\varepsilon \sim \mathcal{N}(0,\sigma^{2})$ with small $\sigma$.
- Place the result in the $p$-th slice of $\tilde{\mathbf{g}}$.
- Repeat for the required number of synthetic variants using independent base samples.
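The shift-rescale-noise loop above can be sketched as follows. This is a hedged illustration, not the paper's code: `gda_synthesize` and its arguments are hypothetical names, and the base sample is assumed pre-split into per-pair slices of shape `(P, L)`:

```python
import numpy as np

def gda_synthesize(base, target_mean_pos, target_mean_amp, sigma=0.01, rng=None):
    """One GDA synthetic sample from an abundant-class base feature.

    base: array of shape (P, L); target_mean_pos / target_mean_amp are the
    tail class's per-pair mean peak positions and amplitudes.
    """
    rng = np.random.default_rng() if rng is None else rng
    synth = np.empty_like(base)
    for p in range(base.shape[0]):
        slice_p = base[p]
        shift = int(round(target_mean_pos[p])) - int(slice_p.argmax())
        shifted = np.roll(slice_p, shift)              # cyclic shift along delay axis
        peak = shifted.max()
        if peak != 0:
            shifted = shifted * (target_mean_amp[p] / peak)  # match target peak height
        synth[p] = shifted + rng.normal(0.0, sigma, size=shifted.shape)
    return synth

# Toy base: one pair, nine delay bins, peak of height 2 at bin 3.
base = np.zeros((1, 9))
base[0, 3] = 2.0
synth = gda_synthesize(base, target_mean_pos=[6.0], target_mean_amp=[4.0], sigma=0.0)
```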
This strategy ensures synthetic features for tail classes inherit the dominant temporal and amplitude signature of abundant classes, adjusted to the mean DoA characteristics of the target class.
4. Hyperparameterization and Statistical Rationale
Key hyperparameters are:
- $P$: number of microphone pairs,
- $L$: delay bins per pair,
- $\gamma = 0.5$: target tail-class cardinality reaches 50% of the largest class post-augmentation,
- $\sigma$: noise standard deviation; a low-variance setting introduces moderate feature diversity.
No secondary amplitude thresholding or multi-peak selection is required. GDA targets exactly the classes beneath the thresholded occupancy, achieving a more uniform class histogram and ensuring rare DoA labels are sufficiently represented.
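Tail-class detection and the per-class synthesis budget implied by the threshold can be sketched as follows (`tail_classes` is an illustrative helper, not from the paper):

```python
import numpy as np

def tail_classes(labels, gamma=0.5):
    """Flag classes whose count falls below gamma * (largest class count).

    Returns (class_ids, deficits), where each deficit is how many synthetic
    samples GDA must add to lift that class to the gamma * N_max target.
    """
    classes, counts = np.unique(labels, return_counts=True)
    target = int(gamma * counts.max())
    mask = counts < target
    return classes[mask], target - counts[mask]

# Toy histogram: class 0 has 10 samples, class 1 has 3, class 2 has 6.
labels = np.array([0] * 10 + [1] * 3 + [2] * 6)
ids, deficits = tail_classes(labels, gamma=0.5)
```

With $\gamma = 0.5$ the target is 5 samples, so only class 1 qualifies as a tail class and receives 2 synthetic samples; class 2 already exceeds the threshold.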
5. Integration into Incremental SSL Training
In each incremental learning task, prior to classifier update:
- GCC-PHAT features are extracted from the current task's samples.
- Tail classes are detected by occupancy $N_c < \gamma N_{\max}$.
- For each, synthetic features are generated following Algorithm 1 and labeled identically to real data.
- The data for task $t$ becomes $\mathcal{D}_t = \mathcal{D}^{\text{real}}_t \cup \mathcal{D}^{\text{syn}}_t$.
The model architecture consists of a frozen 3-layer MLP feature extractor (fixed after the first task) and a task-adaptive linear classifier, updated with the Analytic Dynamic Imbalance Rectifier (ADIR). The cross-entropy objective

$$\mathcal{L}_{\text{CE}} = -\frac{1}{|\mathcal{D}_t|} \sum_{(\mathbf{g},\,y) \in \mathcal{D}_t} \log p_{\theta}(y \mid \mathbf{g})$$

is evaluated on the GDA-augmented dataset. No additional regularization is applied. GDA's effect is to increase rare-class counts, directly influencing the class-weighting scheme in ADIR and reducing global Gini imbalance.
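Without the ADIR class weighting, the objective above is a standard softmax cross-entropy over the augmented dataset. A minimal, numerically stable sketch (generic code, not the paper's implementation):

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch of classifier logits.

    logits: (batch, num_classes) raw scores; labels: (batch,) integer DoA
    class indices. Uses the log-sum-exp trick for numerical stability.
    """
    z = logits - logits.max(axis=1, keepdims=True)          # stabilize exponentials
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Uniform logits over two classes give a loss of exactly log(2).
loss = cross_entropy(np.zeros((2, 2)), np.array([0, 1]))
```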
6. Empirical Impact and Ablation
Ablation studies isolate the contribution of GDA in combination and separately from ADIR. Reported results on the SSLR benchmark are:
- Baseline (no GDA/ADIR): MAE = 7.5°, ACC = 72.0%, BWT = –17.7
- GDA only: MAE = 7.4°, ACC = 75.0%, BWT = –15.8
- ADIR only: MAE = 6.1°, ACC = 82.4%, BWT = +1.4
- GDA + ADIR: MAE = 5.3°, ACC = 89.0%, BWT = +1.6
This demonstrates that GDA independently yields a 3-point increase in accuracy by mitigating intra-task skew. In synergy with ADIR, the framework delivers substantial gains—accuracy increases to 89% and mean absolute error drops to 5.3°, alongside positive backward transfer (Fan et al., 26 Jan 2026).
7. Context and Significance in Robust SSL
GDA addresses structural challenges in SSL arising from unbalanced DoA class histograms, prevalent in naturalistic, evolving sound field deployments. By operating on analytically defined feature statistics and integrating seamlessly with analytic incremental imbalance rectification (ADIR), GDA enables on-the-fly, low-complexity rebalancing of each task dataset. The method avoids the need for storage of past task exemplars and avoids introducing hand-tuned regularizers or high-variance heuristic augmentations. A plausible implication is streamlined adoption for real-world incrementally learned acoustic localization systems, particularly under non-stationary directional statistics.