CGGD-MLDR Beamformer for Speech Enhancement

Updated 24 March 2026
  • CGGD-MLDR is a robust beamforming framework that uses a complex generalized Gaussian prior to enhance multichannel speech signals in challenging acoustic environments.
  • It employs alternating optimization over frame-wise scale parameters and beamformer weights to derive a weighted MPDR solution for improved covariance estimation.
  • Experimental results show that using a super-Gaussian prior (p=0.5) leads to notable PESQ improvements, especially under low SINR and moderate reverberation conditions.

The Complex Generalized Gaussian Prior Maximum-Likelihood Distortionless-Response (CGGD-MLDR) beamformer is a robust multichannel speech enhancement framework built on a complex generalized Gaussian distribution (CGGD) sparse prior for the target speech. CGGD-MLDR generalizes classical minimum power distortionless response (MPDR) methods and achieves improved robustness in challenging acoustic environments, particularly when the target speech is modeled as super-Gaussian and the input conditions involve low signal-to-interference-plus-noise ratio (SINR) or steering vector mismatch. Its formulation adapts the covariance estimate to the data through alternating optimization over frame-wise scale parameters and beamformer weights, fundamentally extending classical Gaussian-based distortionless response methods (Meng et al., 2021).

1. Complex Generalized Gaussian Prior

The complex generalized Gaussian distribution (CGGD) models the short-time Fourier transform (STFT) coefficients of the target speech, $S(k,l)$, as zero-mean random variables with the parameterization

$$\rho\bigl(S(k,l);\,p,\gamma\bigr) = \frac{p}{2\pi\,\gamma^2\,\Gamma(2/p)} \exp\left(-\left|S(k,l)/\gamma\right|^p\right),$$

where $p > 0$ is the shape parameter: $p = 2$ yields the circular Gaussian; $p < 2$ produces a super-Gaussian ("heavy-tailed") density; $p > 2$ produces a sub-Gaussian density. $\gamma > 0$ is a scale parameter and $\Gamma(\cdot)$ is the Gamma function. The CGGD can equivalently be written as a scale mixture of circular Gaussians: $\rho(S) = \max_{\lambda_s>0} \left\{ \mathcal N_{\mathbb C}(S;0,\lambda_s)\,\psi(\lambda_s) \right\}$, where $\mathcal N_{\mathbb C}(S;0,\lambda_s)$ is the zero-mean circular complex Gaussian with variance $\lambda_s$, and $\psi(\lambda_s)$ is the corresponding positive scaling function.
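As a concrete check of this parameterization, the following NumPy sketch (illustrative, not from the paper; the function names `cggd_density` and `circular_gaussian_density` are our own) evaluates the CGGD density and verifies that $p=2$ recovers the circular complex Gaussian with variance $\gamma^2$, while a super-Gaussian shape such as $p=0.5$ places more mass in the tails:

```python
import numpy as np
from math import gamma, pi

def cggd_density(s, p, gam):
    """CGGD density with shape p and scale gam at complex coefficient s."""
    return p / (2 * pi * gam**2 * gamma(2 / p)) * np.exp(-np.abs(s / gam)**p)

def circular_gaussian_density(s, var):
    """Zero-mean circular complex Gaussian density with variance var."""
    return np.exp(-np.abs(s)**2 / var) / (pi * var)

s = 0.3 + 0.4j
# p = 2 reduces the CGGD to the circular Gaussian with variance gam^2
assert np.isclose(cggd_density(s, p=2.0, gam=1.0),
                  circular_gaussian_density(s, var=1.0))

# p = 0.5 is heavy-tailed: more density than the Gaussian at large |s|
big = 3.0 + 0.0j
assert cggd_density(big, 0.5, 1.0) > cggd_density(big, 2.0, 1.0)
```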

2. Maximum-Likelihood Cost Function under CGGD

Given multi-microphone observations $\mathbf y(k,l)$, the distortionless response beamforming constraint,

$$\mathbf w^H(k)\,\mathbf h(k) = 1,$$

enforces that the estimated source remains undistorted, where $\mathbf h(k)$ is the acoustic transfer function (ATF) vector. The beamformer output is

$$\widehat S(k,l) = \mathbf w^H(k)\,\mathbf y(k,l).$$

The joint likelihood under the CGGD scale mixture model, marginalized over latent scale variables, leads to the negative log-likelihood cost function (dropping constants),

$$\mathcal J_k(\mathbf w,\{\lambda_s\}) = \sum_{l=1}^{\mathcal L} \left\{ \frac{|\widehat S(k,l)|^2}{\lambda_s(k,l)} + \ln\bigl(\pi\,\lambda_s(k,l)\bigr) - \ln\psi\bigl(\lambda_s(k,l)\bigr) \right\},$$

to be minimized under the constraint $\mathbf w^H\mathbf h = 1$. In practice, the unknown $S$ is replaced by the current beamformer estimate in an iterative scheme.

3. Alternating Optimization and Beamformer Estimation

Optimization alternates between the scale variables $\{\lambda_s(k,l)\}$ and the beamformer weights $\mathbf w(k)$ as follows:

  • $\lambda_s$-update:

For fixed $\mathbf w$, the optimal scale update is

$$\lambda_s(k,l) \propto \bigl|\widehat S(k,l)\bigr|^{2-p}.$$

Since $\mathbf w$ is invariant to a common scaling of $\{\lambda_s\}$, the practical update becomes

$$\widehat\lambda_s(k,l) \leftarrow \bigl|\widehat S(k,l)\bigr|^{2-p}.$$

  • $\mathbf w$-update:

For fixed $\{\lambda_s\}$, minimizing $\sum_{l} |\widehat S|^2/\lambda_s$ subject to $\mathbf w^H\mathbf h=1$ produces a weighted MPDR solution with covariance matrix

$${}_{\rm CGGD}\widehat{\mathbf R}_{yy}(k) = \sum_{l=1}^{\mathcal L} \frac{\mathbf y(k,l)\,\mathbf y^H(k,l)}{\widehat\lambda_s(k,l)},$$

yielding the beamformer

$$\widehat{\mathbf w}_{\rm CGGD}(k) = \frac{\bigl({}_{\rm CGGD}\widehat{\mathbf R}_{yy}(k)\bigr)^{-1}\mathbf h(k)}{\mathbf h^H(k)\,\bigl({}_{\rm CGGD}\widehat{\mathbf R}_{yy}(k)\bigr)^{-1}\mathbf h(k)}.$$

The iteration recomputes $\widehat S$, $\{\lambda_s\}$, and $\mathbf w$, typically converging in 2–3 iterations.
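The alternating scheme above can be sketched in a few lines of NumPy. This is a minimal single-frequency-bin illustration under stated assumptions: the ATF vector `h` is known, the scene is synthetic, and the floor `eps` on the scale estimates is our own regularization choice; it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, p, eps = 6, 200, 0.5, 1e-6   # mics, frames, CGGD shape, scale floor

# Synthetic single-bin scene: target through known steering vector h,
# plus complex Gaussian sensor noise.
h = np.exp(1j * rng.uniform(0, 2 * np.pi, M))
h /= h[0]
s = rng.standard_normal(L) + 1j * rng.standard_normal(L)   # target STFT coeffs
y = np.outer(h, s) + 0.3 * (rng.standard_normal((M, L)) +
                            1j * rng.standard_normal((M, L)))

w = h / (h.conj() @ h)                       # distortionless initialization
for _ in range(3):                           # 2-3 iterations suffice
    s_hat = w.conj() @ y                     # current beamformer output
    lam = np.maximum(np.abs(s_hat) ** (2 - p), eps)   # lambda_s-update
    R = (y / lam) @ y.conj().T / L           # CGGD-weighted covariance
    Rinv_h = np.linalg.solve(R, h)
    w = Rinv_h / (h.conj() @ Rinv_h)         # weighted MPDR (w-update)

# The distortionless constraint w^H h = 1 holds at every iterate.
assert np.isclose(w.conj() @ h, 1.0)
```

Note that frames with large output magnitude receive large $\widehat\lambda_s$ (for $p<2$) and are therefore downweighted in the covariance sum, which is precisely the robustness mechanism discussed below.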

4. Relationships to MPDR, MLDR, and MDDR

The CGGD-MLDR framework subsumes canonical beamforming approaches as special cases:

  • Standard MPDR (Gaussian, $p=2$):

When $p=2$, $\widehat\lambda_s^{1-p/2}=1$, so ${}_{\rm CGGD}\widehat{\mathbf R}_{yy}$ reduces to the conventional empirical covariance and the solution coincides with the standard MPDR beamformer.

  • MLDR ($p=0$):

For $p=0$, $\widehat\lambda_s^{1-p/2} = \widehat\lambda_s$, recovering the weighted MPDR (wMPDR), i.e., MLDR, formulation.

  • Narrowband $\ell_p$-norm minimization:

With a single narrowband snapshot, the CGGD-MLDR cost reduces to

$$\min_{\mathbf w}\sum_l\bigl|\mathbf w^H\mathbf y\bigr|^p \quad \text{subject to}\quad \mathbf w^H\mathbf h=1,$$

which is exactly the Minimum Dispersion Distortionless Response (MDDR) beamformer defined by $\ell_p$-norm minimization. The CGGD prior thus provides a statistical justification for the choice of $p$.
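The two limiting cases above can be checked numerically. In this illustrative NumPy snippet (random data, not from the paper), $p=2$ makes the frame weights identically one, so the weighted covariance collapses to the plain empirical covariance of standard MPDR, while $p=0$ yields the per-frame power weighting of MLDR:

```python
import numpy as np

rng = np.random.default_rng(1)
M, L = 4, 50
y = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
s_hat = y[0]                                   # any current output estimate

# p = 2: lambda_s = |S_hat|^(2-2) = 1, so the weighted covariance
# equals the empirical covariance used by standard MPDR.
lam_gauss = np.abs(s_hat) ** (2 - 2.0)
R_weighted = (y / lam_gauss) @ y.conj().T
R_empirical = y @ y.conj().T
assert np.allclose(R_weighted, R_empirical)

# p = 0: lambda_s = |S_hat|^2, the per-frame power weighting of MLDR/wMPDR.
lam_mldr = np.abs(s_hat) ** (2 - 0.0)
assert np.allclose(lam_mldr, np.abs(s_hat) ** 2)
```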

5. Robustness Mechanisms and Theoretical Insights

The CGGD-MLDR beamformer achieves robustness by adaptively downweighting frames dominated by target speech in the covariance estimate. Under stationarity, and partitioning the frames into speech and noise-only segments, the covariance estimate can be expressed as

$${}_{\rm CGGD}\mathbf R_{yy} = \mathcal L_2\,\lambda_s^{p/2}\,\mathbf\Upsilon_{ss} + \bigl(\mathcal L_1\,\rho + \mathcal L_2\,\lambda_s^{p/2-1}\bigr)\mathbf\Upsilon_{vv},$$

where $\mathcal L_2$ (resp. $\mathcal L_1$) is the number of speech (resp. noise-only) frames, $\rho=\lambda_v/\delta^{1-p/2}$, and $\delta$ is a small floor parameter. The noise-to-speech mixing ratio,

$$r_p = \frac{\mathcal L_1\,\rho + \mathcal L_2\,\lambda_s^{p/2-1}}{\mathcal L_2\,\lambda_s^{p/2}},$$

obeys $r_p \ge r_2$ for $\lambda_s \ge \delta$, ensuring that the contribution of noise is at least as large as in the Gaussian case. Downweighting high-energy speech frames mitigates target cancellation and increases robustness under low SINR and steering vector mismatches (Meng et al., 2021).
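The inequality $r_p \ge r_2$ can be verified numerically. The sketch below uses hypothetical parameter values ($\lambda_v$, $\delta$, and the frame counts are our own choices) and evaluates the mixing ratio over a grid of $\lambda_s \ge \delta$ for several super-Gaussian shapes:

```python
import numpy as np

def r_p(p, lam_s, lam_v=1.0, delta=1e-3, L1=100, L2=100):
    """Noise-to-speech mixing ratio of the CGGD-weighted covariance."""
    rho = lam_v / delta ** (1 - p / 2)
    return (L1 * rho + L2 * lam_s ** (p / 2 - 1)) / (L2 * lam_s ** (p / 2))

# Grid of lambda_s >= delta; r_p >= r_2 must hold on all of it,
# with equality only at lambda_s = delta.
lam_grid = np.logspace(-3, 2, 200)
for p in (0.25, 0.5, 1.0, 1.5):
    assert np.all(r_p(p, lam_grid) >= r_p(2.0, lam_grid) - 1e-12)
```

Algebraically, $r_p - r_2 \propto \lambda_s^{-p/2}\delta^{p/2-1} - \lambda_s^{-1}$, which is nonnegative exactly when $\lambda_s \ge \delta$ for $p<2$, matching the numerical check.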

6. Experimental Evaluation and Empirical Results

Experiments employ 10 TIMIT speech signals, NOISEX-92 babble noise, a six-element uniform linear array (4 cm spacing), and room simulations via the image method at reverberation times $\text{RT}_{60}\in\{0,160,320,480,640\}$ ms, with the desired speaker at 0° and two interferers at ±45°, all located 2 m from the array. PESQ improvement is the primary metric.

Key findings include:

  • CGGD-MLDR, using a super-Gaussian prior with $p=0.5$, consistently outperforms both standard MPDR ($p=2$) and MLDR ($p=0$) in PESQ improvement.
  • Convergence is rapid (2–3 iterations suffice to approach the oracle MVDR bound).
  • Robustness is notably enhanced under low input SINR (down to −5 dB) and moderate reverberation ($\text{RT}_{60}=160$ ms). In these settings, MPDR may degrade perceptual quality (a PESQ decrease), whereas CGGD-MLDR maintains positive PESQ improvement.

7. Significance and Applications

The CGGD-MLDR framework provides a rigorously derived, general-purpose approach for multichannel speech enhancement that leverages statistical speech priors to adaptively and robustly estimate spatial covariance. By employing a super-Gaussian CGGD prior, the method aggressively downweights high-energy speech frames in covariance estimation, resulting in improved beamforming robustness against reverberation, interfering signals, and model mismatch. This framework generalizes classical MPDR and MDDR approaches and establishes empirical benchmarks in terms of speech enhancement quality for microphone array signal processing (Meng et al., 2021).
