CGGD-MLDR Beamformer for Speech Enhancement
- CGGD-MLDR is a robust beamforming framework that uses a complex generalized Gaussian prior to enhance multichannel speech signals in challenging acoustic environments.
- It employs alternating optimization over frame-wise scale parameters and beamformer weights to derive a weighted MPDR solution for improved covariance estimation.
- Experimental results show that using a super-Gaussian prior (p=0.5) leads to notable PESQ improvements, especially under low SINR and moderate reverberation conditions.
The Complex Generalized Gaussian Prior Maximum-Likelihood Distortionless-Response (CGGD-MLDR) beamformer is a robust multichannel speech enhancement framework built on a statistical model in which a complex generalized Gaussian distribution (CGGD) serves as a sparse prior on the target speech. CGGD-MLDR generalizes the classical minimum power distortionless response (MPDR) beamformer and is more robust in challenging acoustic environments, particularly when the target speech is modeled as super-Gaussian and the input exhibits low signal-to-interference-plus-noise ratio (SINR) or steering vector mismatch. Its formulation adapts the covariance estimate to the data through alternating optimization over frame-wise scale parameters and beamformer weights, extending classical Gaussian-based distortionless response methods (Meng et al., 2021).
1. Complex Generalized Gaussian Prior
The complex generalized Gaussian distribution (CGGD) models the short-time Fourier transform (STFT) coefficients of the target speech, $S$, as zero-mean random variables with the density

$$\rho(S) = \frac{p}{2\pi\gamma\,\Gamma(2/p)} \exp\left(-\frac{|S|^{p}}{\gamma^{p/2}}\right),$$

where $p>0$ is the shape parameter: $p=2$ yields the circular Gaussian; $0<p<2$ produces a super-Gaussian (“heavy-tailed”) density; $p>2$ produces a sub-Gaussian density. $\gamma>0$ is a scale parameter and $\Gamma(\cdot)$ is the Gamma function. The CGGD can equivalently be written as a scale mixture of circular Gaussians:

$$\rho\left(S\right) = \max_{\lambda_s>0} \left\{ \mathcal N_{\mathbb C}\left(S;0,\lambda_s\right)\,\psi(\lambda_s) \right\},$$

where $\mathcal N_{\mathbb C}(S;0,\lambda_s)$ is the zero-mean circular complex Gaussian with variance $\lambda_s$, and $\psi(\lambda_s)$ is the corresponding positive scaling function.
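As a quick sanity check on the prior, the following minimal sketch evaluates a CGGD density numerically. It assumes the common parameterization $\rho(S) = \frac{p}{2\pi\gamma\Gamma(2/p)}\exp(-|S|^p/\gamma^{p/2})$; the function names `cggd_pdf` and `circular_gaussian_pdf` are illustrative and do not come from the source. It confirms that $p=2$ matches the circular Gaussian and that a super-Gaussian choice ($p=0.5$) still integrates to one.

```python
import math
import numpy as np

def cggd_pdf(s, p, gamma):
    """Zero-mean CGGD density at complex value(s) s (assumed parameterization)."""
    norm = p / (2.0 * math.pi * gamma * math.gamma(2.0 / p))
    return norm * np.exp(-np.abs(s) ** p / gamma ** (p / 2.0))

def circular_gaussian_pdf(s, var):
    """Zero-mean circular complex Gaussian density with variance var."""
    return np.exp(-np.abs(s) ** 2 / var) / (math.pi * var)

# p = 2 should coincide with the circular Gaussian of variance gamma.
s = np.array([0.0, 0.5 + 0.5j, 2.0 - 1.0j])
gauss_match = np.allclose(cggd_pdf(s, 2.0, 1.5), circular_gaussian_pdf(s, 1.5))

# Normalization check in polar coordinates: 2*pi * int_0^inf r * rho(r) dr = 1.
# A plain Riemann sum on a fine grid is enough for this illustration.
r, dr = np.linspace(0.0, 2000.0, 2_000_001, retstep=True)
mass_super_gaussian = np.sum(2.0 * math.pi * r * cggd_pdf(r, 0.5, 1.0)) * dr

# Heavy tails: far from the origin, p = 0.5 assigns much more density than p = 2.
tail_ratio = cggd_pdf(6.0, 0.5, 1.0) / cggd_pdf(6.0, 2.0, 1.0)
```

The tail comparison is the property the beamformer exploits: a heavy-tailed prior tolerates occasional large-magnitude speech coefficients instead of treating them as evidence of high variance everywhere.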
2. Maximum-Likelihood Cost Function under CGGD
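Given multi-microphone observations $\mathbf{y}_t$ at frame $t$, the distortionless response beamforming constraint,

$$\mathbf{w}^{H}\mathbf{h} = 1,$$

enforces that the estimated source remains undistorted, where $\mathbf{w}$ is the beamformer weight vector and $\mathbf{h}$ is the acoustic transfer function (ATF) vector. The beamformer output is

$$\hat{S}_t = \mathbf{w}^{H}\mathbf{y}_t.$$

The joint likelihood under the CGGD scale mixture model, with latent scale variables $\lambda_t$, leads to the negative log-likelihood cost function (dropping constants),

$$\mathcal{L}\left(\mathbf{w}, \{\lambda_t\}\right) = \sum_{t} \left[ \frac{|\mathbf{w}^{H}\mathbf{y}_t|^{2}}{\lambda_t} + \log\lambda_t - \log\psi(\lambda_t) \right],$$

to be minimized under the constraint $\mathbf{w}^{H}\mathbf{h} = 1$. In practical scenarios, the unknown $S_t$ that determines $\lambda_t$ is replaced by the current beamformer estimate $\hat{S}_t$ in an iterative scheme.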
3. Alternating Optimization and Beamformer Estimation
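Optimization alternates between the scale variables $\lambda_t$ and the beamformer weights $\mathbf{w}$ as follows:
- $\lambda_t$-update:
For fixed $\mathbf{w}$, the optimal scale update is

$$\lambda_t = \frac{2\gamma^{p/2}}{p}\,|\mathbf{w}^{H}\mathbf{y}_t|^{2-p}.$$

Since $\mathbf{w}$ is invariant to a common scaling of all $\lambda_t$, the practical update becomes

$$\lambda_t = |\mathbf{w}^{H}\mathbf{y}_t|^{2-p}.$$

- $\mathbf{w}$-update:
For fixed $\lambda_t$, minimizing the cost subject to $\mathbf{w}^{H}\mathbf{h} = 1$, with $\mathbf{h}$ the ATF vector, produces a weighted MPDR solution with covariance matrix,

$${}_{\rm CGGD}\widehat{\mathbf R}_{yy} = \frac{1}{T}\sum_{t=1}^{T} \frac{\mathbf{y}_t\mathbf{y}_t^{H}}{\lambda_t},$$

where $T$ is the number of frames, yielding the beamformer,

$$\mathbf{w} = \frac{{}_{\rm CGGD}\widehat{\mathbf R}_{yy}^{-1}\,\mathbf{h}}{\mathbf{h}^{H}\,{}_{\rm CGGD}\widehat{\mathbf R}_{yy}^{-1}\,\mathbf{h}}.$$

The iterative process re-computes $\lambda_t$, ${}_{\rm CGGD}\widehat{\mathbf R}_{yy}$, and $\mathbf{w}$ in turn, typically converging in 2–3 iterations.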
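The alternating scheme above can be sketched for a single frequency bin as follows. This is a minimal illustration rather than the authors' implementation: it assumes the practical scale update $\lambda_t = |\mathbf{w}^{H}\mathbf{y}_t|^{2-p}$, initializes with plain MPDR, and adds a hypothetical floor `eps` on $\lambda_t$ to avoid division by zero. The function names and the synthetic test signal are likewise illustrative.

```python
import numpy as np

def weighted_cov(Y, lam):
    """(1/T) * sum_t y_t y_t^H / lam_t for frames stored as columns of Y (M x T)."""
    return (Y / lam) @ Y.conj().T / Y.shape[1]

def mpdr_weights(R, h):
    """Distortionless weights w = R^{-1} h / (h^H R^{-1} h)."""
    Rinv_h = np.linalg.solve(R, h)
    return Rinv_h / (h.conj() @ Rinv_h)

def cggd_mldr(Y, h, p=0.5, n_iter=3, eps=1e-6):
    """Alternate between the scale update and the weighted-MPDR weight update."""
    T = Y.shape[1]
    w = mpdr_weights(weighted_cov(Y, np.ones(T)), h)    # start from plain MPDR
    for _ in range(n_iter):
        s_hat = w.conj() @ Y                             # beamformer output per frame
        lam = np.maximum(np.abs(s_hat) ** (2.0 - p), eps)  # scale update (floored)
        w = mpdr_weights(weighted_cov(Y, lam), h)        # weight update
    return w

def complex_noise(rng, shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Synthetic single-bin scene: bursty target at broadside, one interferer at 45 degrees,
# half-wavelength spacing, weak sensor noise. All values are arbitrary.
rng = np.random.default_rng(0)
M, T = 4, 500
h = np.ones(M, dtype=complex)
a = np.exp(-1j * np.pi * np.arange(M) * np.sin(np.pi / 4))
speech = 3.0 * complex_noise(rng, T) * (rng.random(T) < 0.3)
Y = np.outer(h, speech) + np.outer(a, complex_noise(rng, T)) + 0.1 * complex_noise(rng, (M, T))

w_cggd = cggd_mldr(Y, h, p=0.5, n_iter=3)
w_mpdr = mpdr_weights(weighted_cov(Y, np.ones(T)), h)
```

With `p=2.0` every $\lambda_t$ equals one, so the loop leaves the MPDR initialization unchanged; any $p<2$ reweights frames by their current output magnitude before the covariance is re-estimated.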
4. Relationships to MPDR, MLDR, and MDDR
The CGGD-MLDR framework subsumes canonical beamforming approaches as special cases:
- Standard MPDR (Gaussian, $p=2$):
When $p=2$, $\lambda_t = 1$, so ${}_{\rm CGGD}\widehat{\mathbf R}_{yy}$ reduces to the conventional empirical covariance and the solution coincides with the standard MPDR beamformer.
- MLDR ($p=0$):
For $p=0$, $\lambda_t = |\mathbf{w}^{H}\mathbf{y}_t|^{2}$, recovering the weighted MPDR (wMPDR) or MLDR formulation.
- Narrowband, $\ell_p$-norm minimization:
In the narrowband case, the CGGD-MLDR cost reduces to

$$\min_{\mathbf{w}} \sum_{t} |\mathbf{w}^{H}\mathbf{y}_t|^{p} \quad \text{subject to} \quad \mathbf{w}^{H}\mathbf{h} = 1,$$

which is exactly the Minimum Dispersion Distortionless Response (MDDR) beamformer defined by $\ell_p$-norm minimization. The CGGD prior thus statistically justifies the selection of $p$.
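The special cases can be checked numerically with a short sketch. It assumes the practical scale rule $\lambda_t = |\hat{S}_t|^{2-p}$, where $\hat{S}_t$ is the current beamformer output; the helper name `scale_update` is hypothetical. Under that rule, the weighted quadratic term $\sum_t |\hat{S}_t|^2/\lambda_t$ evaluated at the current estimate equals the $\ell_p$ dispersion $\sum_t |\hat{S}_t|^p$, which is one way to see the link between the reweighting and MDDR.

```python
import numpy as np

def scale_update(s_hat, p):
    """Assumed practical scale rule: lambda_t = |s_hat_t|^(2 - p)."""
    return np.abs(s_hat) ** (2.0 - p)

rng = np.random.default_rng(1)
s_hat = rng.standard_normal(64) + 1j * rng.standard_normal(64)  # stand-in beamformer outputs

# p = 2 (MPDR): all frames receive unit scale, i.e. the plain sample covariance.
lam_gauss = scale_update(s_hat, 2.0)

# p = 0 (MLDR / wMPDR): the scale is the instantaneous output power.
lam_mldr = scale_update(s_hat, 0.0)

# General p: the reweighted quadratic term matches the l_p dispersion at the current estimate.
p = 0.5
weighted_quadratic = np.sum(np.abs(s_hat) ** 2 / scale_update(s_hat, p))
lp_dispersion = np.sum(np.abs(s_hat) ** p)
```

This identity holds only at the point where the scales were computed, so it illustrates the correspondence rather than proving convergence of the iteration.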
5. Robustness Mechanisms and Theoretical Insights
The CGGD-MLDR beamformer achieves robustness by adaptively downweighting frames that are dominated by target speech in the covariance estimate. Under stationarity and partitioning frames into speech and noise-only segments, the covariance estimate is expressible as
where (resp. ) is the number of speech (resp. noise) frames, , and is a small floor parameter. The noise-to-speech mixing ratio,
obeys for , ensuring that the contribution of noise is at least as large as in the Gaussian case. Downweighting of high-energy speech frames mitigates target cancellation and increases robustness under low SINR and steering vector mismatches (Meng et al., 2021).
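The downweighting effect can be illustrated with a toy calculation. The sketch below is not the paper's analysis: it assumes per-frame covariance weights $1/\lambda_t$ with $\lambda_t = \max(|\hat{S}_t|^{2-p}, \varepsilon)$, uses a hypothetical floor value `eps`, and draws arbitrary high-energy "speech" and low-energy "noise-only" beamformer outputs. It compares the average weight given to noise-only frames against that given to speech frames.

```python
import numpy as np

def frame_weights(s_hat, p, eps=1e-3):
    """Per-frame covariance weights 1 / lambda_t with a floor on lambda_t (assumed form)."""
    return 1.0 / np.maximum(np.abs(s_hat) ** (2.0 - p), eps)

rng = np.random.default_rng(0)
# Toy beamformer outputs: speech-dominated frames are much larger than noise-only frames.
speech_frames = 3.0 * (rng.standard_normal(200) + 1j * rng.standard_normal(200))
noise_frames = 0.1 * (rng.standard_normal(200) + 1j * rng.standard_normal(200))

def noise_to_speech_weight_ratio(p):
    """Average weight on noise-only frames relative to speech-dominated frames."""
    return frame_weights(noise_frames, p).mean() / frame_weights(speech_frames, p).mean()

ratio_gaussian = noise_to_speech_weight_ratio(2.0)        # Gaussian prior: uniform weights
ratio_super_gaussian = noise_to_speech_weight_ratio(0.5)  # super-Gaussian: favors noise frames
```

In this toy setting the Gaussian case weights all frames equally, while the super-Gaussian case shifts the covariance estimate toward noise-only frames, which is the behavior the robustness argument relies on.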
6. Experimental Evaluation and Empirical Results
Experiments employ 10 TIMIT speech signals, NOISEX-92 babble noise, six-element uniform linear arrays (4 cm spacing), and room simulations via the image method at various reverberation times ( ms), with the desired speaker at 0° and two interferences at ±45°, all located at 2 m. PESQ improvement is the primary metric.
Key findings include:
- CGGD-MLDR, using a super-Gaussian prior with $p=0.5$, consistently outperforms both standard MPDR ($p=2$) and MLDR ($p=0$) in PESQ improvement.
- Convergence is rapid (2–3 iterations suffice to approach the oracle MVDR bound).
- Robustness is notably enhanced under low input SINR (down to –5 dB) and moderate reverberation ( ms). In these settings, MPDR may degrade perceptual quality (PESQ decrease), whereas CGGD-MLDR maintains positive PESQ improvement.
7. Significance and Applications
The CGGD-MLDR framework provides a rigorously derived, general-purpose approach for multichannel speech enhancement that leverages statistical speech priors to adaptively and robustly estimate spatial covariance. By employing a super-Gaussian CGGD prior, the method aggressively downweights high-energy speech frames in covariance estimation, resulting in improved beamforming robustness against reverberation, interfering signals, and model mismatch. This framework generalizes classical MPDR and MDDR approaches and establishes empirical benchmarks in terms of speech enhancement quality for microphone array signal processing (Meng et al., 2021).