
Beurling-LASSO (BLASSO): Sparse Recovery Framework

Updated 17 September 2025
  • BLASSO is a convex optimization framework for continuous-domain sparse recovery that extends ℓ¹-regularization to Radon measures, promoting spike signal recovery.
  • It employs dual certificate construction and geometric separation guarantees, leveraging metrics like the Fisher-Rao distance to ensure exact support recovery.
  • BLASSO underpins practical applications in super-resolution, mixture estimation, and inverse problems by offering rigorous error bounds and localization guarantees.

Beurling-LASSO (BLASSO) is a convex optimization framework for continuous-domain sparse recovery, extending classical ℓ¹-regularized estimators to the infinite-dimensional setting of Radon measures. BLASSO has become a cornerstone of modern super-resolution, mixture estimation, and off-the-grid sparse inverse problems by providing grid-free support recovery, quantitative performance guarantees, and a theoretical foundation for sparsity-inducing regularization in spaces beyond finite-dimensional vector models.

1. Formulation and Theoretical Foundations

The archetypal BLASSO estimator solves the following optimization problem over the space of signed or complex-valued finite Radon measures $\mathcal{M}(X)$ on a domain $X$:

$$\min_{\mu \in \mathcal{M}(X)} \ \frac{1}{2} \|y - F \mu\|_{\mathcal{F}}^2 + \kappa \|\mu\|_{\mathrm{TV}}, \tag{1}$$

where $y \in \mathcal{F}$ is an observed signal (typically in a Hilbert space), $F : \mathcal{M}(X) \to \mathcal{F}$ is a known linear measurement operator, $\kappa > 0$ is a regularization parameter, and $\|\mu\|_{\mathrm{TV}}$ denotes the total variation norm of the measure, which generalizes the ℓ¹-norm to the space of measures. This objective promotes concentration of $\mu$ onto finitely many atoms, recovering sparse “spike” signals or parameter mixtures directly in continuous space without discretization artifacts.

The total variation norm is defined as
$$\|\mu\|_{\mathrm{TV}} = \sup \{ \langle f, \mu \rangle \mid f \in C_0(X),\ \|f\|_\infty \leq 1 \},$$
where $C_0(X)$ denotes the continuous functions on $X$ vanishing at infinity. In particular, for a discrete measure $\mu = \sum_j a_j \delta_{x_j}$, this reduces to $\|\mu\|_{\mathrm{TV}} = \sum_j |a_j|$, the ℓ¹-norm of the amplitudes.

A defining feature of BLASSO is that, under moderate conditions (including measurement nondegeneracy and a minimal separation between spikes as measured in a problem-adapted metric), minimizers are sparse, concentrated on a finite sum of Dirac masses, i.e., $\mu^\star = \sum_{j=1}^s a_j \delta_{x_j}$.
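
To make the formulation concrete, the following minimal sketch solves an on-grid surrogate of problem (1): discretizing $X$ turns the TV-regularized measure problem into a finite LASSO, handled here by proximal gradient descent (ISTA). The Fourier measurement model, grid size, and all constants are illustrative assumptions, not choices from the cited papers.

```python
import numpy as np

# Minimal sketch: an on-grid surrogate of BLASSO (problem (1)).
# Discretizing X turns the TV-regularized measure problem into a finite
# LASSO, solved here by proximal gradient descent (ISTA).

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 200)                    # discretization of X = [0, 1]
freqs = np.arange(-10, 11)                           # band-limited Fourier measurements
F = np.exp(-2j * np.pi * np.outer(freqs, grid))      # columns are F delta_x on the grid

# Ground truth: 3 off-grid spikes; y = F mu0 + noise
a0, x0 = np.array([1.0, -0.7, 0.5]), np.array([0.21, 0.5, 0.83])
y = sum(a * np.exp(-2j * np.pi * freqs * x) for a, x in zip(a0, x0))
y = y + 0.01 * (rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs)))

kappa = 0.1
L = np.linalg.norm(F, 2) ** 2                        # Lipschitz constant of the gradient
mu = np.zeros(len(grid), dtype=complex)
for _ in range(500):
    grad = F.conj().T @ (F @ mu - y)                 # gradient of the data-fit term
    z = mu - grad / L
    # complex soft-thresholding = prox of (kappa / L) * ||.||_1
    mu = z * np.maximum(1 - kappa / (L * np.maximum(np.abs(z), 1e-12)), 0)

# the recovered support clusters around the true (off-grid) locations,
# illustrating the grid-mismatch effect that off-the-grid BLASSO avoids
print("recovered spikes near:", grid[np.abs(mu) > 0.1])
```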

2. Geometry, Separation, and Support Recovery

Accurate support recovery by BLASSO requires a nondegenerate solution structure, governed by geometric separation in the parameter space. Classical on-grid approaches rely on an a priori discretization, inducing basis mismatch and resolution limits. In contrast, BLASSO exploits the geometry via a problem-adapted distance.

For translation-invariant setups, Euclidean separation suffices. In more general settings (e.g., Laplace inversion, Gaussian mixtures with unknown variance), the Fisher-Rao geodesic distance induced by the kernel or Fisher information is employed. Denote the kernel associated with $F$ as $K(x, x') = \langle F \delta_x, F \delta_{x'} \rangle_{\mathcal{F}}$; the Fisher metric is $\Gamma_x = \nabla_x \nabla_{x'} K(x, x')|_{x = x'}$ and the geodesic distance is
$$d_\Gamma(x, x') = \inf_\gamma \int_0^1 \sqrt{\dot\gamma(t)^\top \Gamma_{\gamma(t)} \dot\gamma(t)}\, dt,$$
where $\gamma$ ranges over smooth paths between $x$ and $x'$ (Poon et al., 2018).
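
As a concrete illustration, the sketch below estimates $\Gamma_x$ by central finite differences for a 1-D Gaussian kernel (an assumed example kernel, not one mandated by the papers). Since that kernel is translation invariant, $\Gamma$ is constant and the geodesic distance reduces to a rescaled Euclidean distance.

```python
import numpy as np

# Minimal sketch: estimate the Fisher metric Gamma_x = d_x d_x' K(x, x')|_{x'=x}
# by central finite differences, for an assumed 1-D Gaussian kernel.

def K(x, xp, sigma=0.1):
    return np.exp(-(x - xp) ** 2 / (2 * sigma ** 2))

def fisher_metric(x, h=1e-4):
    # mixed partial derivative d_x d_x' K, evaluated at (x, x)
    return (K(x + h, x + h) - K(x + h, x - h)
            - K(x - h, x + h) + K(x - h, x - h)) / (4 * h ** 2)

# For this translation-invariant kernel, Gamma is the constant 1/sigma^2,
# so the geodesic distance is just a rescaled Euclidean distance:
x, xp = 0.3, 0.45
gamma = fisher_metric(x)
d_fisher = np.sqrt(gamma) * abs(x - xp)
print(gamma, 1 / 0.1 ** 2, d_fisher)   # ~100, 100, 1.5
```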

The optimality and stability of support recovery hinge on the existence of so-called dual certificates. These are functions $\eta$ defined (in the simplest case) as $\eta = F^* p$ for some $p$ in the data space, interpolating the sign pattern at the true atoms, remaining strictly below one in magnitude elsewhere, and satisfying stationarity at the support:
$$\begin{cases} \eta(x_j) = \operatorname{sign}(a_j), \\ |\eta(x)| < 1, \quad x \notin \{x_j\}, \\ \nabla \eta(x_j) = 0. \end{cases}$$
The separation condition, typically in the Fisher-Rao metric, ensures the invertibility of local interpolation systems and nondegeneracy of the dual certificate, thus guaranteeing uniqueness and stability (Poon et al., 2018, Giard et al., 16 Sep 2025).
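
A standard way to probe these conditions numerically is the vanishing-derivative pre-certificate: a kernel expansion whose coefficients solve the linear interpolation system $\eta(x_i) = \operatorname{sign}(a_i)$, $\eta'(x_i) = 0$. The sketch below builds it for an assumed 1-D Gaussian kernel and hypothetical spike locations; if the resulting $\eta$ stays below one in magnitude away from the support, nondegeneracy holds for this configuration.

```python
import numpy as np

# Minimal sketch: vanishing-derivative pre-certificate for an assumed
# 1-D Gaussian kernel. Coefficients solve eta(x_i)=sign(a_i), eta'(x_i)=0.

sigma = 0.08
xs = np.array([0.25, 0.5, 0.8])      # hypothetical spike locations
signs = np.array([1.0, -1.0, 1.0])   # sign(a_j)

K    = lambda x, y: np.exp(-(x - y) ** 2 / (2 * sigma ** 2))
d1K  = lambda x, y: -(x - y) / sigma ** 2 * K(x, y)                   # d/dx
d2K  = lambda x, y: (x - y) / sigma ** 2 * K(x, y)                    # d/dy
d12K = lambda x, y: (1 / sigma ** 2 - (x - y) ** 2 / sigma ** 4) * K(x, y)

X, Y = np.meshgrid(xs, xs, indexing="ij")   # rows: conditions at x_i; cols: basis at x_j
M = np.block([[K(X, Y),   d2K(X, Y)],
              [d1K(X, Y), d12K(X, Y)]])
coef = np.linalg.solve(M, np.concatenate([signs, np.zeros(len(xs))]))
alpha, beta = coef[:len(xs)], coef[len(xs):]

def eta(x):
    return sum(a * K(x, xj) + b * d2K(x, xj)
               for a, b, xj in zip(alpha, beta, xs))

# Nondegeneracy check: |eta| should reach 1 only at the spikes.
t = np.linspace(0, 1, 1000)
print("max |eta|:", np.abs(eta(t)).max())   # ~1, attained near the spikes
print("eta at spikes:", eta(xs))            # ~[1, -1, 1]
```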

Exact Sparse Representation Recovery (ESRR) in Banach space settings is established under a Metric Non-Degenerate Source Condition (MNDSC), which generalizes classical source and localization conditions to arbitrary geometries and regularizers (Carioni et al., 14 Jun 2024).

3. Kernel Structure, Dual Certificates, and the Kernel Switch

The ability to construct dual certificates, and consequently to obtain error and localization bounds, depends on local properties of the kernel $K$. The crucial property is the Local Positive Curvature (LPC) assumption: within small neighborhoods around each true spike location, $x \mapsto K(x_j, x)$ must be sufficiently strongly concave/convex.

Prior work identified a limited set of kernels admitting LPC, such as the Jackson and Gaussian kernels. The “kernel switch” principle allows transferring LPC properties from a “pivot” kernel $K_{\mathrm{pivot}}$ to an actual model kernel $K_{\mathrm{mod}}$, provided the Reproducing Kernel Hilbert Space (RKHS) embedding is continuous, i.e., there exists $C_{\mathrm{switch}} < \infty$ such that
$$\|\eta\|_{H_{\mathrm{mod}}} \leq C_{\mathrm{switch}} \|\eta\|_{H_{\mathrm{pivot}}}$$
for all $\eta \in H_{\mathrm{pivot}}$ (Castro et al., 11 Jul 2025). This device expands the class of models for which BLASSO guarantees are available.

The sinc-4 kernel, defined by $K(s,t) = \mathrm{sinc}^4((t-s)/4)$ (coordinate-wise in $\mathbb{R}^d$), is a notable new LPC kernel, enabling sharp recovery guarantees for translation-invariant mixture models.
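
A quick numerical sanity check of the curvature at the kernel peak is sketched below. Note that numpy's `sinc` is the normalized $\sin(\pi x)/(\pi x)$, which may differ from the paper's convention, so the curvature value here is illustrative and should not be matched against the paper's constants.

```python
import numpy as np

# Minimal sketch: the sinc-4 kernel and a numerical check of local curvature
# at the peak (LPC requires strong concavity of t -> K(s, t) near t = s).
# Caveat: np.sinc is the normalized sinc sin(pi x)/(pi x); the paper's
# normalization may differ, so constants here are illustrative only.

def K(s, t):
    return np.sinc((t - s) / 4) ** 4

# second derivative of t -> K(0, t) at t = 0, by central differences
h = 1e-4
curv = (K(0, h) - 2 * K(0, 0) + K(0, -h)) / h ** 2
print("K(0,0) =", K(0, 0), " curvature at peak =", curv)
assert curv < 0   # strong concavity at the peak, consistent with LPC
```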

4. Statistical and Localization Error Guarantees

BLASSO achieves quantitative error and localization bounds for both estimation and prediction tasks. If $y = F\mu^0 + \Gamma$ for a sparse $\mu^0$ (with $s_0$ atoms and minimum separation $\Delta_0$) and noise of norm $\gamma$, then for a minimizer $\mu$:

  • The total variation in the “far region” outside balls of radius $r$ around the true atoms: $|\mu|(\mathrm{Far}) \leq C (\gamma/(\epsilon_2 r^2)) \sqrt{s_0}$.
  • The amplitude deviation near each support point: $|\mu(N_k(r)) - a_k^0| \leq C (\gamma/(\epsilon_2 r^2)) \sqrt{s_0} + C' \gamma$.
  • Any region carrying more than $C'' \gamma \sqrt{s_0}$ mass lies within radius $r$ of some true atom.

Here $\epsilon_2$ is an LPC parameter (e.g., $\epsilon_2 \geq 23/128$ for the sinc-4 kernel). These bounds demonstrate that the localization error decreases as the noise level $\gamma$ drops, yielding “effective near regions” around true spikes (Castro et al., 11 Jul 2025, Giard et al., 16 Sep 2025).
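
As a purely arithmetic illustration of how the far-region bound scales with the noise level, the snippet below evaluates $C(\gamma/(\epsilon_2 r^2))\sqrt{s_0}$ for a few values of $\gamma$. The constant $C$ is unspecified in the statement, so it is set to 1 here for illustration only.

```python
import numpy as np

# Minimal sketch: scaling of the far-region bound C * (gamma/(eps2*r**2)) * sqrt(s0).
# C is an unspecified constant; C = 1 is an illustrative assumption.
eps2, r, s0, C = 23 / 128, 0.1, 5, 1.0
for gamma in (1e-1, 1e-2, 1e-3):
    bound = C * gamma / (eps2 * r ** 2) * np.sqrt(s0)
    print(f"gamma={gamma:.0e}  far-region TV mass <= {bound:.3f}")
```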

For problems involving random sketching (e.g., random Fourier features), corresponding “sketched” BLASSO estimators obey nearly identical error rates, provided the embedding constants and kernel tail bounds are controlled.

Selection of the regularization parameter $\kappa \sim \gamma/\sqrt{s_0}$ is crucial, and guarantees are established to hold for any $\kappa$ in an admissible range (“tuning insensitivity”) (Castro et al., 11 Jul 2025).

5. Numerical Methods and Algorithmic Strategies

Solving BLASSO poses nontrivial computational challenges owing to the infinite-dimensional measure space. Three principal approaches have been developed:

  • Finite-grid discretization (basis pursuit) yields standard convex ℓ¹ problems but reintroduces grid artifacts and potentially overestimates the degrees of freedom.
  • Sliding Frank-Wolfe and greedy “particle” methods iteratively add or refine Dirac atoms, with local optimization (e.g., BFGS) of atom positions (Poon et al., 2018); see the sketch after this list.
  • Dual and proximal gradient approaches avoid explicit parameterization by solving in a dual functional setting, leveraging Fenchel–Rockafellar duality and Moreau decomposition to facilitate updates in Hilbert space (Schulze et al., 2022).
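
The following compact sketch conveys the Sliding Frank-Wolfe loop: each iteration adds the atom where the current dual certificate peaks, then jointly re-optimizes amplitudes and positions. The Fourier model, the use of scipy's BFGS as the local solver, and all constants are illustrative assumptions; published implementations differ in the local step and stopping rules.

```python
import numpy as np
from scipy.optimize import minimize

# Minimal Sliding Frank-Wolfe sketch for BLASSO with an assumed Fourier model.
freqs = np.arange(-10, 11)
phi = lambda x: np.exp(-2j * np.pi * freqs * x)      # F delta_x
y = phi(0.3) - 0.6 * phi(0.7)                        # noiseless toy data
kappa = 0.05

def residual(a, xs):
    return sum(ai * phi(xi) for ai, xi in zip(a, xs)) - y if len(a) else -y

a, xs = np.array([]), np.array([])
for it in range(2):
    # (i) add an atom where the certificate eta = F*(y - F mu)/kappa peaks
    grid = np.linspace(0, 1, 512)
    eta = np.abs([np.vdot(phi(x), -residual(a, xs)) for x in grid]) / kappa
    xs = np.append(xs, grid[np.argmax(eta)])
    a = np.append(a, 0.0)

    # (ii) slide: jointly refit amplitudes and positions (nonconvex local step;
    # BFGS with numerical gradients stands in for the papers' local solvers)
    def obj(z):
        aa, xx = z[:len(xs)], z[len(xs):]
        r = residual(aa, xx)
        return 0.5 * np.vdot(r, r).real + kappa * np.abs(aa).sum()
    z = minimize(obj, np.concatenate([a, xs]), method="BFGS").x
    a, xs = z[:len(xs)], z[len(xs):]

print("atoms:", sorted(zip(np.round(xs, 3), np.round(a, 3))))
```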

For convolutional source separation, the dual proximal method eliminates direct manipulation of measures, instead updating residuals in the observation space via iterative schemes subject to dual constraints.

“Smooth bilevel programming” introduces a change of variables exploiting quadratic variational representations of the TV norm, recasting BLASSO as a smooth (but nonconvex) bilevel problem amenable to quasi-Newton methods such as BFGS. Despite the nonconvexity, there are no spurious local minima, and saddle points can be navigated efficiently (Poon et al., 2021).
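
One quadratic variational representation underlying this trick is $\|a\|_1 = \min_{v > 0} \tfrac{1}{2}\sum_i (a_i^2/v_i + v_i)$, attained at $v = |a|$; the snippet below checks the identity numerically. The exact reparameterization used in the cited work may differ; this is shown only to convey the mechanism by which the nonsmooth penalty becomes smooth in auxiliary variables.

```python
import numpy as np

# Numerical check of a quadratic variational representation of the l1 norm:
#   ||a||_1 = min_{v > 0} 0.5 * sum(a_i**2 / v_i + v_i), attained at v = |a|.
a = np.array([0.7, -0.2, 1.5])
v = np.abs(a) + 1e-12             # optimal auxiliary variable (avoid div by 0)
val = 0.5 * np.sum(a ** 2 / v + v)
print(val, np.abs(a).sum())       # both print ||a||_1 = 2.4
```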

Randomized sketching—compressing data via random features—yields computationally tractable BLASSO surrogates that retain localization guarantees under appropriate conditions (Castro et al., 11 Jul 2025).
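
A minimal sketch of the random-feature compression: with frequencies drawn as $w \sim \mathcal{N}(0, 1/\sigma^2)$, the random Fourier feature map satisfies $\langle z(x), z(x')\rangle \approx K(x, x')$ for a Gaussian kernel. The kernel, feature dimension, and sampling scheme here are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: random Fourier features compress the measurement model while
# approximately preserving the kernel, <z(x), z(x')> ~ K(x, x').
rng = np.random.default_rng(0)
m, sigma = 200, 0.1
w = rng.normal(0, 1 / sigma, m)          # frequencies ~ N(0, 1/sigma^2)
b = rng.uniform(0, 2 * np.pi, m)
z = lambda x: np.sqrt(2 / m) * np.cos(w * x + b)   # sketched F delta_x

x, xp = 0.3, 0.35
approx = z(x) @ z(xp)
exact = np.exp(-(x - xp) ** 2 / (2 * sigma ** 2))
print(approx, exact)   # close for moderate m
```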

6. Applications in Super-Resolution, Mixture Models, and Inverse Problems

BLASSO is central in super-resolution imaging, where the objective is to recover point sources below the nominal resolution dictated by band-limited measurements (e.g., line spectra from partial Fourier data). Under a Fisher-Rao separation exceeding a threshold, BLASSO achieves exact recovery and minimax-optimal localization, often with sample complexity linear (or nearly linear) in the sparsity.

In Gaussian Mixture Model (GMM) estimation with unknown diagonal covariances, BLASSO enables simultaneous estimation of the number of components, means, variances, and weights. Using an appropriate convex objective, non-asymptotic recovery rates approaching parametric limits for component parameters and density prediction are established. The analysis uses a novel kernel-induced semidistance adapted to unknown variances and leverages construction of local dual certificates with explicit separation bounds (Giard et al., 16 Sep 2025).

Signal demixing and group sparsity models (Group BLASSO) are addressed by extending the theory of ESRR to spaces of vector measures and structured atom sets under the MNDSC, yielding exact recovery guarantees in noise-limited regimes (Carioni et al., 14 Jun 2024).

7. Degrees of Freedom, Risk Estimation, and Theoretical Insights

A distinguishing feature of BLASSO is a refined understanding of prediction degrees of freedom (DOF). Whereas the discretized LASSO counts one coefficient per nonzero atom (and thus overestimates effective complexity), BLASSO’s DOF is strictly smaller, controlled by the sensitivity of the estimator’s spike positions and amplitudes:
$$\operatorname{div}(\mu^\star)(y) = \operatorname{tr}\left( \Gamma_{\bar{x}} M^{-1} \Gamma_{\bar{x}}^\top \right),$$
where $\Gamma_{\bar{x}}$ encodes both the measurements and their Jacobians at the atom locations, and $M$ aggregates curvature and data-fit terms (Poon et al., 2019).

This explicit expression enables unbiased risk estimation via Stein’s Unbiased Risk Estimator (SURE):
$$\mathrm{SURE}(\hat\mu(y)) = -n\sigma^2 + \|y - \hat\mu(y)\|_2^2 + 2\sigma^2 \operatorname{div}(\hat\mu)(y).$$
Practitioners can thus perform principled selection of regularization parameters and obtain tighter confidence intervals for super-resolved recovery.
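
In practice, the divergence term can also be estimated in black-box fashion. The sketch below uses a Monte-Carlo finite-difference probe (in the spirit of Monte-Carlo SURE) as a stand-in for the closed-form trace expression above; the `solver` argument, the placeholder linear smoother, and all constants are hypothetical.

```python
import numpy as np

# Minimal sketch: Monte-Carlo estimate of the divergence term in SURE.
# `solver` maps data y to the prediction; any BLASSO solver could be plugged in.
def mc_sure(solver, y, sigma, eps=1e-3, rng=np.random.default_rng(0)):
    pred = solver(y)
    delta = rng.choice([-1.0, 1.0], size=y.shape)         # Rademacher probe
    div = delta @ (solver(y + eps * delta) - pred) / eps  # ~ div(solver)(y)
    n = y.size
    return -n * sigma ** 2 + np.sum((y - pred) ** 2) + 2 * sigma ** 2 * div

# usage with a trivial linear smoother as a placeholder solver:
smooth = lambda y: 0.9 * y
y = np.random.default_rng(1).normal(size=50)
print(mc_sure(smooth, y, sigma=1.0))
```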

8. Limitations and Outlook

BLASSO’s theoretical and practical impact is tempered by certain limitations. Construction and verification of dual certificates require nontrivial geometric control (e.g., minimal separation), and effective computation on large-scale or high-dimensional domains can be resource-intensive—especially for SDP or greedy refinement. Sample complexity and recovery rates deteriorate if signal atoms are closely spaced, noise is high, or model mismatch occurs. Regularization parameter selection, while principled in theory, still demands careful cross-validation or empirical tuning, especially in challenging regimes.

Future directions include:

  • Development of faster algorithms for large-scale BLASSO with provable guarantees,
  • Extension to broader classes of kernels and non-translation-invariant operators,
  • Deeper integration of sketching and randomized features for scalability,
  • Robustification to model uncertainties and non-i.i.d. noise.

BLASSO thus remains a focal point for research in continuous sparse regularization, uniting statistical optimality, geometric control, and algorithmic innovation across inverse problems, imaging, and mixture modeling.
