Higher-Order Non-Linear Beamforming

Updated 19 November 2025

Higher-order non-linear beamforming is a set of methods that utilize non-linear combinations of delayed sensor signals to harness higher-order statistics for improved noise suppression and signal enhancement.
These techniques extend traditional delay-and-sum methods by employing polynomial correlations and neural network architectures, leading to sharper resolution and better artifact rejection.
Efficient implementations via closed-form polynomial expansions and GPU parallelism enable real-time applications, yielding significant gains in SNR, dynamic range, and imaging contrast.

Higher-order non-linear beamforming refers to spatial signal processing techniques that incorporate products or other non-linear combinations of delayed sensor signals across orders greater than two, in order to exploit higher-order statistical dependencies for improved noise/artifact suppression and signal enhancement. These methods surpass traditional linear approaches—such as delay-and-sum—as well as basic second-order non-linear methods like delay-multiply-and-sum (DMAS), by systematically harnessing higher-order interactions among sensor channels in array signal processing tasks. This class includes both polynomial-correlation beamformers and neural architectures designed to realize, approximate, or extend such non-linear filtering.

1. Mathematical Principles and Non-Linear Extensions

Linear beamforming methods, typified by delay-and-sum (DAS), assume perfect coherence across delayed channels, summing the delayed observations $s_i(t-\tau_i)$ as $S_\mathrm{DAS}(t) = \sum_{i=1}^N s_i(t-\tau_i)$ . While computationally efficient ( $O(N)$ per beam/pixel), this approach suffers from broad main lobes, high sidelobes, and low contrast/resolution in noisy or reverberant environments due to its inability to discriminate signal from incoherent noise or clutter (Mulani et al., 2022, Jansen et al., 12 Nov 2025).

Second-order DMAS improves upon these limitations by introducing pairwise signal products, evaluating $S_\mathrm{DMAS} = \sum_{i<j} \operatorname{sign}(s_i s_j) \sqrt{|s_i s_j|}$ . This pairwise multiplication acts as a simple coherence detector, reinforcing true source directions while attenuating incoherent/noise-like signals, but incurs a quadratic $O(N^2)$ complexity.

The general $k$ -th order DMAS beamformer extends this by summing over all $k$ -tuples of delayed signals:

$S_k = \sum_{1 \le i_1 < \dots < i_k \le N} \operatorname{sign}\bigg(\prod_{\ell=1}^k s_{i_\ell}\bigg) \bigg|\prod_{\ell=1}^k s_{i_\ell}\bigg|^{1/k}$

with $k=3,4,5$ yielding progressively stronger suppression of off-axis incoherent artifacts while further distinguishing coherent signal arrivals (Mulani et al., 2022, Jansen et al., 12 Nov 2025).

In the domain of multichannel speech enhancement, higher-order non-linear beamformers also emerge from an MMSE-optimal Bayesian perspective when the noise is modeled as non-Gaussian, e.g., a Gaussian mixture, resulting in estimators that are fundamentally non-linear and jointly exploit spatial and spectral statistics (Tesch et al., 2021). Neural approximators such as TaylorBeamformer employ high-order nonlinear transformations, recursively derived from Taylor expansion terms, where each higher-order component serves as a data-driven residual canceller complementing the 0th-order spatial filter (Li et al., 2022).

2. Efficient Implementation via Closed-Form and Neural Methods

Direct computation of $k$ -th order DMAS is combinatorially expensive ( $O(N^k)$ for $N$ sensors and order $k$ ). Closed-form polynomial expansions, derived using Newton–Girard identities, enable efficient $O(N)$ computation for all practical $k$ (typically $k \leq 5$ ), reducing higher-order sums to a small number of sums and products of vector powers. For example, third-order DMAS is implemented as:

$S_3 = \tfrac{1}{6}\left[(\sum_i r_i)^3 + 2\sum_i s_i - 3(\sum_i r_i)(\sum_i r_i^2)\right]$

with $r_i = s_i^{1/3}$ (Mulani et al., 2022, Jansen et al., 12 Nov 2025).

Real-time deployment is achieved by parallelizing these vector calculations across array pixels or time-frequency bins on GPU architectures, with memory traffic and root/sign operations managed to sustain high throughput (e.g., 23 frames per second for images with $2048 \times 2048$ pixels on commodity GPUs for $k=3$ ) (Mulani et al., 2022). Embedded GPU platforms support real-time in-air acoustic imaging with similar techniques, using a per-pixel CUDA thread model (Jansen et al., 12 Nov 2025).

End-to-end neural architectures, such as TaylorBeamformer, replace explicit higher-order analytic terms with learnable neural modules for each derivative order. These networks (e.g., stacks of S-TCN blocks) are trained with loss functions balancing spatial and spectral reconstruction, and achieve competitive inference cost ( $\sim$ 8 Giga MAC/s on 6 mics, 7.25M parameters for $Q=5$ ) (Li et al., 2022).

3. Quantitative Performance in Imaging and Speech Applications

Systematic evaluations have demonstrated that increasing the correlation order $k$ yields monotonic improvements in contrast, SNR, and artifact suppression, up to an empirically optimal value (usually $k=5$ ). In photoacoustic imaging, progressing from DAS to DMAS-5 led to FWHM reductions (from $\sim$ 3.2 mm to $\sim$ 1.6 mm) and SNR improvements ( $+25$ dB versus DAS, $+17$ dB versus DMAS) (Mulani et al., 2022). In in-air acoustic imaging, dynamic range increased from $\sim$ 30 dB (DAS) to $\sim$ 80 dB (DMAS-5), with SNR and contrast rising accordingly (Jansen et al., 12 Nov 2025).

In multichannel speech enhancement, analytic higher-order non-linear filters outperformed classical and two-stage linear beamformers, particularly in heavy-tailed (kurtotic) or multi-interferer environments. The non-linear joint MMSE filter, or its neural approximation, delivered SI-SDR gains up to $\sim$ 4.5 dB and perceptual speech quality (POLQA) advantages over linear approaches in both simulated and real environments (Tesch et al., 2021). Neural higher-order methods (TaylorBeamformer, $Q=5$ ) outperformed frame-wise oracle MVDR baselines by $+0.4$ PESQ, $+9$ ESTOI points, and $+2.3$ dB SI-SDR in causal 6-microphone speech enhancement settings (Li et al., 2022).

Method/Order	FWHM (mm)	SNR Gain (dB, vs. DAS)	Dynamic Range (dB)
DAS (Linear, $k=1$ )	$\sim$ 3.2	0	$\sim$ 30
DMAS ( $k=2$ )	$\sim$ 1.9	$+8$ –$9$	$\sim$ 50
DMAS-5 ( $k=5$ )	$\sim$ 1.6	$+25$	$\sim$ 80

4. Comparison with Classical Linear and Hybrid Approaches

Classical linear approaches, such as MVDR or multichannel Wiener filtering, are optimal only under Gaussian noise due to the sufficiency of second-order statistics. Two-step cascades (linear spatial filter plus postfilter) are suboptimal for non-Gaussian fields because they cannot fully exploit higher-order spatial or spectral dependencies.

Higher-order non-linear methods, whether analytic or learned, can (a) suppress more than $D-1$ directional interferers for $D$ array elements, and (b) adapt to non-stationary or heavy-tailed noise via higher-order moment exploitation. This effect is particularly pronounced in heavy-tailed (super-Gaussian) or mixture-based noise environments (Tesch et al., 2021). Linear cascades lose spatial detail by collapsing mixture components, whereas non-linear spatial filtering leverages individual component covariances, realizing higher spatial selectivity.

Neural architectures inspired by Taylor expansion (e.g., TaylorBeamformer) generalize this principle, where each additional order corresponds to a data-driven non-linear correction that further reduces residual noise or reverberation, with performance saturating after $Q=5$ (Li et al., 2022).

5. Practical Implementation Strategies

For efficient deployment of higher-order non-linear beamformers:

Use closed-form polynomial expansions for analytic DMAS to reduce computation from $O(N^k)$ to $O(N)$ per beam/pixel (Mulani et al., 2022, Jansen et al., 12 Nov 2025).
Exploit GPU parallelism by allocating one computation thread per output pixel/angle, with delayed signal lookups and vectorized operations.
Coherence Factor (CF) weighting can further clean up side lobes and residual artifacts, being especially useful in reverberant scenes (Jansen et al., 12 Nov 2025).
For neural methods, stack modular networks corresponding to higher-order residual terms and supervise both spatial and spectral outputs for optimal training convergence (Li et al., 2022).
In both signal processing and neural contexts, orders beyond $5$ may yield diminishing returns or signal saturation/distortion, so system parameters are typically tuned for $k, Q \leq 5$ .

6. Application Domains and Limitations

Higher-order non-linear beamforming is applicable to:

Photoacoustic and ultrasonic imaging, where image quality and artifact rejection are critical, and coherent signal peaks require maximal reinforcement (Mulani et al., 2022).
In-air acoustic imaging and real-time sonar/ultrasound applications, with practical deployment on embedded GPU processors for industrial, autonomous robotic, and medical imaging scenarios (Jansen et al., 12 Nov 2025).
Multichannel speech enhancement, especially in environments characterized by non-Gaussian, diffuse or distributed interferers, or where outlier robustness is required (Tesch et al., 2021, Li et al., 2022).

Key limitations include increased memory bandwidth and per-pixel compute (root/sign calculation), slight sensitivity to calibration errors, and signal peak saturation for orders above five (Mulani et al., 2022). Analytic construction in high-dimensional arrays may become intractable; neural approximators mitigate this cost at the expense of requiring extensive labeled data and careful model selection.

7. Extensions and Future Research Directions

For large arrays or time-varying environments:

Fit more flexible non-Gaussian mixture models or non-parametric noise models for Bayesian filters (Tesch et al., 2021).
Use attention or recurrent architectures to capture time-varying spatial statistics.
Generalize analytic higher-order beamforming to arbitrary heavy-tailed (e.g., α-stable) noise through learned nonlinearities (Tesch et al., 2021).
Combine higher-order analytic and neural approaches: hand-crafted closed-forms as initialization or regularization for trainable systems (Li et al., 2022).

A plausible implication is that as embedded computing capabilities expand, advanced non-linear beamformers—both analytic and learned—will become increasingly prevalent in real-time, resource-constrained deployment for robust imaging, sensing, and speech enhancement.