- The paper introduces a quasi-Bayesian algorithm for density deconvolution that scales to streaming data through constant-cost recursive updates.
- The method combines a recursive update rule with a decaying learning rate; almost sure convergence is established via stochastic approximation theory.
- Experiments on synthetic and real data confirm competitive performance, and the estimator satisfies both local and uniform central limit theorems.
A quasi-Bayesian sequential approach to deconvolution density estimation
This paper addresses the problem of density deconvolution in a streaming context where data are subject to noise and arrive progressively with no predetermined sample size. The focus is on developing a sequential nonparametric method to estimate the probability density function of a random signal based on noisy observations.
Density deconvolution is crucial when data are contaminated by measurement error, blur, or other distortions, as commonly arises in medicine, econometrics, bioinformatics, and astronomy. Traditional approaches include kernel, wavelet, and iterative methods, whose frequentist minimax rates of convergence are well studied. In the Bayesian literature, recent work has established posterior consistency and contraction rates for nonparametric priors such as the Dirichlet process mixture.
Contributions
The authors introduce a quasi-Bayesian sequential approach, a variant of Newton's recursive algorithm, for the deconvolution problem in streaming-data settings. This method provides:
- Efficiency and Scalability: Each recursive update has constant computational cost, independent of the number of observations processed so far.
- Asymptotic Properties: Rigorous large-sample asymptotics are established for the proposed estimates, including local and uniform central limit theorems.
- Practical Applications: The approach is validated on synthetic and real data, and comparisons with kernel-based and Bayesian nonparametric methods show competitive performance across a range of noise distributions.
Methodology
The authors assume that the true density $f_X$ of the signal is a finite mixture of known kernels $k(\cdot \mid \theta)$, parameterized by $\theta$ with an unknown mixing density $g$. The quasi-Bayesian sequential algorithm starts from an initial guess $\tilde{g}_0$ for the mixing density and updates it recursively as new data arrive. Explicitly, for $n \geq 0$ the update rule is
$$
\tilde{g}_{n+1}(\theta) = (1 - \tilde{\alpha}_{n+1})\,\tilde{g}_n(\theta) + \tilde{\alpha}_{n+1}\,\tilde{g}_n(\theta \mid Y_{n+1}),
$$
where $\tilde{g}_n(\theta \mid Y_{n+1})$ denotes the Bayesian update of $\tilde{g}_n$ given the single new observation $Y_{n+1}$: the current estimate $\tilde{g}_n$ is treated as a prior, multiplied by the likelihood of $Y_{n+1}$, and renormalized.
An essential ingredient of the methodology is the learning rate $\tilde{\alpha}_{n+1} \in (0, 1)$, which controls the balance between the information accumulated so far and the new observation. The convergence properties of the sequence $\tilde{g}_n$ are derived using stochastic approximation theory, showing that the estimates converge almost surely to the true mixing density under suitable regularity conditions. A minimal sketch of the recursion appears below.
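To make the recursion concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a Gaussian kernel $k(x \mid \theta) = \mathcal{N}(x; \theta, \tau^2)$, additive Gaussian noise of known scale $\sigma$, a flat initial guess on a bounded grid, and the polynomially decaying learning rate $\tilde{\alpha}_{n+1} = (n+1)^{-\gamma}$ commonly used in stochastic approximation; the paper's exact rate conditions may differ. All names and parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def newton_deconvolution(y_stream, theta_grid, tau=0.5, sigma=0.3, gamma=0.75):
    """Recursive quasi-Bayesian update of a mixing-density estimate on a grid."""
    d_theta = theta_grid[1] - theta_grid[0]
    # Flat initial guess g~_0 on the grid.
    g = np.full_like(theta_grid, 1.0 / (theta_grid[-1] - theta_grid[0]))
    # With Gaussian kernel and Gaussian noise, Y | theta ~ N(theta, tau^2 + sigma^2).
    s = np.sqrt(tau**2 + sigma**2)
    for n, y in enumerate(y_stream):
        alpha = (n + 1.0) ** (-gamma)            # decaying learning rate
        lik = norm.pdf(y, loc=theta_grid, scale=s)
        posterior = lik * g
        posterior /= posterior.sum() * d_theta   # Bayes update of g~_n given one observation
        g = (1.0 - alpha) * g + alpha * posterior
    return g

# Usage on simulated data: mixing density g = N(0, 1), noisy stream Y = X + eps.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=5000)
x = rng.normal(theta, 0.5)    # signal X drawn from k(. | theta)
y = rng.normal(x, 0.3)        # observation Y = X + eps
grid = np.linspace(-4.0, 4.0, 401)
g_hat = newton_deconvolution(y, grid)
```

Each update touches only the fixed grid, so the per-observation cost is constant in $n$, which is exactly the scalability property claimed above.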
Results
The large-sample asymptotic properties of the algorithm include:
- Local Central Limit Theorem: Suitably centered and scaled, the estimate at a fixed point converges in distribution to a Gaussian limit, enabling the construction of asymptotic credible intervals.
- Uniform Central Limit Theorem: The same centered and scaled estimates, viewed as a process over an interval, converge in distribution to a Gaussian process, yielding asymptotic credible bands (a schematic statement follows this list).
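Schematically, writing $r_n$ for the normalizing rate and leaving the limiting variance $\sigma^2(\theta)$ and Gaussian process $G$ abstract (both depend on the paper's exact assumptions, which are not reproduced here), the two results take the form
$$
r_n \bigl( \tilde{g}_n(\theta) - g(\theta) \bigr) \xrightarrow{d} \mathcal{N}\bigl(0, \sigma^2(\theta)\bigr) \ \text{ for fixed } \theta,
\qquad
\bigl( r_n ( \tilde{g}_n(\theta) - g(\theta) ) \bigr)_{\theta \in I} \xrightarrow{d} \bigl( G(\theta) \bigr)_{\theta \in I}.
$$
From the local statement, an asymptotic $(1-\beta)$ credible interval at $\theta$ is $\tilde{g}_n(\theta) \pm z_{1-\beta/2}\,\hat{\sigma}(\theta)/r_n$ for a consistent variance estimate $\hat{\sigma}^2(\theta)$; the uniform statement gives bands by replacing the normal quantile with a quantile of $\sup_{\theta \in I} |G(\theta)|$.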
Numerical illustrations were carried out on both synthetic data and the Shapley galaxy data, comparing the new approach against kernel-based and Bayesian nonparametric competitors. The quasi-Bayesian algorithm accurately recovered the underlying signal densities even under substantial noise, in line with its theoretical guarantees.
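Continuing the illustrative sketch from the Methodology section (same assumed Gaussian kernel, and the `grid` and `g_hat` it produced), the signal density itself, which is the quantity compared against the truth in such experiments, can be recovered from the estimated mixing density by quadrature:

```python
# Recover the signal density f_X(x) = \int k(x | theta) g(theta) dtheta
# from the grid estimate g_hat produced by the sketch above.
import numpy as np
from scipy.stats import norm

def mixture_density(x_points, theta_grid, g_hat, tau=0.5):
    d_theta = theta_grid[1] - theta_grid[0]
    # Kernel matrix k(x | theta) = N(x; theta, tau^2), mixed against g_hat.
    K = norm.pdf(x_points[:, None], loc=theta_grid[None, :], scale=tau)
    return K @ g_hat * d_theta

x_points = np.linspace(-4.0, 4.0, 201)
f_hat = mixture_density(x_points, grid, g_hat)            # estimated signal density
f_true = norm.pdf(x_points, 0.0, np.sqrt(1.0 + 0.5**2))   # truth under the simulation above
print("max abs error:", float(np.abs(f_hat - f_true).max()))
```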
Implications and Future Work
The paper's findings have several important implications:
- Practical Scalability: The algorithm's ability to handle streaming data efficiently makes it well suited to modern applications involving large or continuously arriving datasets.
- Theoretical Contribution: The detailed asymptotic analysis enriches the literature on density deconvolution, particularly in the context of streaming data.
Future work could extend the proposed methodology to more complex models, including multivariate mixtures and dependent data structures. Further exploration of adaptive learning rates that adjust dynamically to noise characteristics and data structure might yield even more robust performance.
In summary, this paper provides a substantial contribution to the field of density deconvolution, offering a theoretically sound, computationally efficient method tailored to the challenges of streaming data.