Robust Online Overdetermined Independent Vector Analysis Based on Bilinear Decomposition

Published 18 Jan 2026 in eess.AS and cs.SD | (2601.12485v1)

Abstract: Online blind source separation is essential for both speech communication and human-machine interaction. Among existing approaches, overdetermined independent vector analysis (OverIVA) delivers strong performance by exploiting the statistical independence of source signals and the orthogonality between source and noise subspaces. However, when applied to large microphone arrays, the number of parameters grows rapidly, which can degrade online estimation accuracy. To overcome this challenge, we propose decomposing each long separation filter into a bilinear form of two shorter filters, thereby reducing the number of parameters. Because the two filters are closely coupled, we design an alternating iterative projection algorithm to update them in turn. Simulation results show that, with far fewer parameters, the proposed method achieves improved performance and robustness.


Authors (9)

Summary

The paper proposes BiIVA, a method that factorizes each online overdetermined IVA separation filter via bilinear decomposition to sharply reduce the number of parameters.
The approach uses alternating iterative projection to optimize short sub-filters, achieving superior SIR and SDR improvements over AuxIVA and OverIVA.
The method enables efficient real-time blind source separation in large microphone arrays, promising deployment in IoT and robotic auditory interfaces.

Robust Online Overdetermined Independent Vector Analysis via Bilinear Decomposition

Problem Statement and Background

Online blind source separation (BSS) in overdetermined acoustic scenarios, where the number of microphones $M$ exceeds the number of sound sources $N$, is essential for speech communication and human–machine interaction. Classical independent vector analysis (IVA) is widely used for BSS because it exploits the statistical independence of the sources, but its applicability is limited by the determined mixing assumption ($M = N$). Overdetermined IVA (OverIVA) adds an orthogonality constraint between the source and noise subspaces to suppress noise and make better use of the array's spatial diversity, and it outperforms AuxIVA in multi-microphone conditions.

The parameterization in OverIVA, however, scales poorly with increasing array size, resulting in biased statistical estimates and prohibitive memory and computational costs for online, low-latency deployments. The paper introduces a method to mitigate parameter explosion by employing bilinear decomposition on the demixing filters, which reduces dimensionality while maintaining the algorithm’s robustness and separation capacity.

Bilinear Decomposition in OverIVA

The fundamental contribution is the representation of each source separation filter $\mathbf{w}_{n,j}$ as a Kronecker product of two shorter vectors:

$\mathbf{w}_{n,j} = \mathbf{w}_{n,j,1} \otimes \mathbf{w}_{n,j,2}$

where $\mathbf{w}_{n,j,1} \in \mathbb{C}^{M_1}$ and $\mathbf{w}_{n,j,2} \in \mathbb{C}^{M_2}$, with $M = M_1 M_2$. This reduces the number of parameters per filter from $M = M_1 M_2$ to $M_1 + M_2$ (with $M_1 M_2 \gg M_1 + M_2$ for large arrays), which improves robustness and sample efficiency, particularly when $M$ is large.
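The Kronecker structure can be checked numerically. The following is a minimal sketch, not the paper's code: the split $M_1 = M_2 = 6$ is an assumption chosen to match a 36-channel array, and the helper `crandn` and all variable names are hypothetical. It shows that applying the long filter to a snapshot $\mathbf{x}$ gives the same output as applying the two short filters to the snapshot reshaped into an $M_1 \times M_2$ matrix, and it compares the parameter counts.

```python
import numpy as np

rng = np.random.default_rng(0)
M1, M2 = 6, 6            # assumed factor sizes (not taken from the paper)
M = M1 * M2              # length of the long filter

def crandn(*shape):
    """Hypothetical helper: complex standard-normal samples."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

w1 = crandn(M1)          # first short filter
w2 = crandn(M2)          # second short filter
w = np.kron(w1, w2)      # long filter w = w1 (x) w2, length M

x = crandn(M)            # one multichannel snapshot in a frequency bin
X = x.reshape(M1, M2)    # row-major reshape: x[i * M2 + j] == X[i, j]

# Full-filter output w^H x (np.vdot conjugates its first argument).
y_full = np.vdot(w, x)

# Same output from the short filters only: w1^H X conj(w2).
y_bilinear = w1.conj() @ X @ w2.conj()

params_full = M              # 36 complex coefficients per filter
params_bilinear = M1 + M2    # 12 complex coefficients per filter
print(params_full, params_bilinear, np.allclose(y_full, y_bilinear))
```

Note that the factorization has a scale ambiguity, since $(c\,\mathbf{w}_{n,j,1}) \otimes (\mathbf{w}_{n,j,2}/c)$ yields the same long filter for any nonzero $c$, so some form of normalization is needed to keep the two factors well scaled.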

To overcome strong coupling between the sub-filters, an alternating iterative projection algorithm is proposed, successively optimizing $\mathbf{w}_{n,j,1}$ and $\mathbf{w}_{n,j,2}$ via modified auxiliary-function-based objectives. This is an extension of prior bilinear optimization approaches, now generalized to the multi-source, fully online OverIVA setting.
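The alternation idea can be illustrated on a toy problem. The sketch below is not the paper's update rule: it swaps in a simpler technique (a unit-norm smallest-eigenvector step) in place of the auxiliary-function-based iterative projection, and the covariance `V`, the sizes, and all names are assumptions. It relies on the identities $\mathbf{w}_1 \otimes \mathbf{w}_2 = (\mathbf{I}_{M_1} \otimes \mathbf{w}_2)\,\mathbf{w}_1 = (\mathbf{w}_1 \otimes \mathbf{I}_{M_2})\,\mathbf{w}_2$, which turn the quadratic cost $\mathbf{w}^H \mathbf{V} \mathbf{w}$ into a small quadratic in one sub-filter whenever the other is held fixed.

```python
import numpy as np

rng = np.random.default_rng(1)
M1, M2, T = 4, 3, 200    # assumed toy sizes
M = M1 * M2

# Toy Hermitian positive-definite covariance built from random snapshots.
Xs = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
V = Xs @ Xs.conj().T / T

def cost(w1, w2):
    """Quadratic cost w^H V w for the long filter w = w1 (x) w2."""
    w = np.kron(w1, w2)
    return float(np.real(np.vdot(w, V @ w)))

def smallest_eigvec(A):
    """Unit-norm eigenvector of a Hermitian matrix for its smallest eigenvalue."""
    _, vecs = np.linalg.eigh(A)   # eigenvalues are returned in ascending order
    return vecs[:, 0]

# Unit-norm initial sub-filters.
w1 = np.ones(M1, dtype=complex) / np.sqrt(M1)
w2 = np.ones(M2, dtype=complex) / np.sqrt(M2)

history = [cost(w1, w2)]
for _ in range(10):
    # Fix w2: w = A2 @ w1 with A2 = I_{M1} (x) w2, so the cost is w1^H (A2^H V A2) w1.
    A2 = np.kron(np.eye(M1), w2[:, None])      # shape (M, M1)
    w1 = smallest_eigvec(A2.conj().T @ V @ A2)
    history.append(cost(w1, w2))

    # Fix w1: w = A1 @ w2 with A1 = w1 (x) I_{M2}, so the cost is w2^H (A1^H V A1) w2.
    A1 = np.kron(w1[:, None], np.eye(M2))      # shape (M, M2)
    w2 = smallest_eigvec(A1.conj().T @ V @ A1)
    history.append(cost(w1, w2))

print(history[0], history[-1])
```

Each half-step solves an $M_1$- or $M_2$-dimensional problem instead of an $M$-dimensional one, and the cost cannot increase from one half-step to the next. In an actual IVA objective, additional terms and constraints prevent the trivial solutions that a purely norm-constrained quadratic would allow, so this sketch only conveys the structure of the alternation.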

Algorithmic Structure

The resulting method, BiIVA, follows the usual iterative structure of online frequency-domain BSS but replaces the conventional full-filter update with the derived bilinear updates: the two sub-filters are optimized in alternation, and a normalization step is applied for stability. The orthogonality constraint on the noise separation matrices is preserved and updated using the projected spatial covariance structure. Sub-filter estimation is dominated by updates of dimension $M_1$ and $M_2$, which are cheaper than the $M$-dimensional full-filter estimation in standard OverIVA.

Numerical Results

Simulations are conducted on synthetic two-source mixtures (CMU_Arctic, real RIRs) with a 6×6 microphone grid that provides strong spatial diversity. BiIVA is compared with AuxIVA and OverIVA under challenging reverberant and noisy conditions.

The results demonstrate pronounced improvements:

Convergence Performance: BiIVA attains an SIR peak exceeding 30 dB and an SDR peak near 20 dB, while OverIVA achieves around 20 dB SIR and 10 dB SDR, and AuxIVA saturates at 14 dB SIR and 8 dB SDR. The convergence rate for BiIVA is on par with OverIVA (Figure 1).

Figure 1: SIR and SDR improvement curves for AuxIVA, OverIVA, and BiIVA; BiIVA ultimately exhibits the highest separation and interference suppression performance.

Spectrotemporal Analysis: Spectrograms indicate that BiIVA yields minimal distortion of the target signal and superior suppression of interference compared to both baseline methods. Notably, OverIVA occasionally introduces distortion artifacts, and AuxIVA displays incomplete interference removal (Figure 2).

Figure 2: Comparison of the observed signal, clean target, and outputs from AuxIVA, OverIVA, and BiIVA; BiIVA offers the most effective separation and least distortion.

Theoretical and Practical Implications

Bilinear decomposition of separation filters in the online OverIVA framework introduces a new class of efficient structured BSS models. The drastic parameter reduction not only boosts robustness and generalization (by mitigating overfitting and estimation variance in high-dimensional filter spaces) but also admits substantial computational reduction, making OverIVA practical for real-time embedded and edge applications with large microphone arrays.

From a theoretical standpoint, the alternation strategy to resolve the non-convex coupled estimation of Kronecker-factorized filters generalizes to other structured BSS and array processing algorithms. Additionally, BiIVA bridges statistical independence-based separation (IVA family) with model-driven parameter reduction paradigms common in modern array signal processing, suggesting new hybrid avenues.

Future Directions

This work suggests multiple directions:

Extended Decompositions: More sophisticated decompositions (e.g., higher-order tensor, block Kronecker) can further exploit structure in large arrays or in jointly multichannel-multiframe settings.
Adaptive Rank Selection: Dynamic adjustment of $M_1$ and $M_2$ is relevant for nonstationary array conditions or changing scene geometry.
Integration with Masking and Deep Priors: Unification of BiIVA with deep speech and noise models or soft frequency-domain masks could synergistically improve separation in highly adverse environments.
Scalability and Deployment: BiIVA’s efficiency makes it suitable for deployment in microphone-dense IoT and collaborative robotic auditory interfaces.

Conclusion

The paper presents the BiIVA algorithm, which factorizes online OverIVA separation filters via bilinear Kronecker decomposition and uses alternating iterative projection for robust parameter estimation. In the reported simulations, this yields significant improvements in SIR and SDR over the AuxIVA and OverIVA baselines, especially in large microphone array regimes. The methodology offers a scalable, robust approach to real-time BSS in overdetermined environments, with prospects for extension and integration in future acoustic scene understanding systems.

