Residual Neural Likelihood Estimation (RNLE)
- RNLE is a simulation-based inference methodology that models the conditional noise likelihood (residuals) directly for deterministic signal-plus-noise systems.
- It employs a Masked Autoregressive Flow architecture with multiple autoregressive transforms to accurately capture non-Gaussian noise characteristics and glitches.
- The method achieves simulation efficiency and unbiased parameter inference in challenging environments, outperforming traditional neural likelihood estimation approaches.
Residual Neural Likelihood Estimation (RNLE) is a simulation-based inference methodology that leverages neural density estimation to directly learn the likelihood distribution of noise (residuals) conditioned on model parameters, rather than the full data likelihood. Designed to exploit additive signal-plus-noise models common in scientific data, RNLE enables efficient and robust parameter inference—especially in contexts where noise is non-Gaussian or contaminated with realistic artifacts. It is particularly relevant for gravitational-wave astronomy but applies broadly to any domain with deterministically modeled signals embedded in complex, non-stationary noise (Emma et al., 20 Jan 2026).
1. Mathematical Foundations
Let $d$ denote the observed data (e.g., gravitational-wave detector strain time series), and let $\theta$ summarize the deterministic signal parameters. RNLE assumes an additive data generative model,

$$d = h(\theta) + n,$$

where $h(\theta)$ is the deterministic signal model and $n$ is the realization of the (potentially non-Gaussian) noise.
The full data likelihood $p(d \mid \theta)$, frequently intractable for realistic noise, is recast via the change of variables $n = d - h(\theta)$ as

$$p(d \mid \theta) = p_n\big(d - h(\theta) \mid \theta\big).$$

Thus, the task is shifted to learning the conditional likelihood of the residuals $r = d - h(\theta)$. RNLE approximates this distribution with a neural density estimator $q_\phi(n \mid \theta)$, where $\phi$ represents the network weights.
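This identity can be checked numerically in the simplest setting, a scalar datum with Gaussian noise; the signal model `h` below is a made-up toy, not the paper's waveform model:

```python
# Numerical check of p(d | theta) = p_n(d - h(theta) | theta) for a scalar
# additive model with Gaussian noise (toy illustration only).
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def h(theta):
    # hypothetical deterministic signal model
    return 2.0 * np.sin(theta)

sigma, theta = 0.3, 0.7
d = h(theta) + 0.1                              # "observed" datum with residual 0.1

p_full = gauss_pdf(d, h(theta), sigma)          # full-data likelihood: d ~ N(h(theta), sigma^2)
p_resid = gauss_pdf(d - h(theta), 0.0, sigma)   # residual likelihood: n ~ N(0, sigma^2)

assert np.isclose(p_full, p_resid)
```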
Model parameters are obtained by minimizing the KL divergence between the true and model noise distributions:

$$\phi^\ast = \arg\min_\phi \, D_{\mathrm{KL}}\big( p_n(n \mid \theta) \,\|\, q_\phi(n \mid \theta) \big),$$

which is equivalent to maximizing the expected log-likelihood of noise realizations under $q_\phi$. Empirically, for a training set $\{(n_i, \theta_i)\}_{i=1}^{N}$, the loss function is

$$\mathcal{L}(\phi) = -\frac{1}{N} \sum_{i=1}^{N} \log q_\phi(n_i \mid \theta_i).$$
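As a concrete illustration, replacing $q_\phi$ with a one-parameter stand-in (a zero-mean Gaussian with a learnable scale, not the paper's flow) shows that minimizing this empirical loss over pure-noise samples recovers the maximum-likelihood noise scale:

```python
# Empirical loss L(phi) = -(1/N) sum_i log q_phi(n_i | theta_i), with q_phi
# replaced by N(0, scale^2); the scale is the only "weight" in this toy model.
import numpy as np

rng = np.random.default_rng(0)
n = rng.normal(0.0, 0.5, size=10_000)   # pure-noise "training residuals"

def nll(scale):
    # mean negative log-likelihood of the residuals under N(0, scale^2)
    return 0.5 * np.mean((n / scale) ** 2) + np.log(scale) + 0.5 * np.log(2 * np.pi)

# minimizing over the single parameter recovers the maximum-likelihood noise
# scale, i.e. the sample standard deviation of the residuals
scales = np.linspace(0.1, 1.5, 281)
best = scales[np.argmin([nll(s) for s in scales])]
assert abs(best - n.std()) < 0.01
```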
2. Model Architecture and Training Procedure
RNLE uses a Masked Autoregressive Flow (MAF) neural density estimator, as implemented in the sbi package. The residual likelihood is constructed by composing $K$ autoregressive (invertible) transforms,

$$n = T_K \circ T_{K-1} \circ \cdots \circ T_1(z), \qquad z \sim \mathcal{N}(0, I),$$

where each transform $T_k$ is parameterized by an ARNN (e.g., MADE) that conditions on $\theta$ and the preceding elements of $n$. The paper adopts an architecture of five flows ($K = 5$), with each ARNN having two hidden layers of 50 units and ReLU activations.
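For intuition, a single affine autoregressive transform can be written out in plain numpy; the functions supplying $\mu_i$ and $\log\sigma_i$ stand in for a learned MADE, and conditioning on $\theta$ is omitted for brevity:

```python
# One affine autoregressive transform, the building block of a MAF (simplified:
# mu_fn/log_sigma_fn stand in for a learned MADE, no theta-conditioning).
# The log-density follows from the change of variables.
import numpy as np

def log_density_affine_ar(n, mu_fn, log_sigma_fn):
    """log q(n) when n_i = mu_i(n_{<i}) + exp(ls_i(n_{<i})) * z_i, z ~ N(0, I)."""
    logq = 0.0
    for i in range(len(n)):
        mu = mu_fn(i, n[:i])
        ls = log_sigma_fn(i, n[:i])
        z = (n[i] - mu) * np.exp(-ls)                       # invert the transform
        logq += -0.5 * z**2 - 0.5 * np.log(2 * np.pi) - ls  # base density + log|det J|
    return logq

# sanity check: the identity transform reduces to a standard normal density
n = np.array([0.2, -1.0, 0.5])
lq = log_density_affine_ar(n, lambda i, p: 0.0, lambda i, p: 0.0)
assert np.isclose(lq, -0.5 * np.sum(n**2) - 1.5 * np.log(2 * np.pi))
```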
Training is based solely on pure-noise residuals $n_i$, optionally paired with corresponding parameter values $\theta_i$ (used, for instance, to encode PSD or whitening information with real detector data). Crucially, no signal is injected during training. For real data, 1 s segments of whitened detector data are used, introducing a scale parameter during whitening. Non-Gaussian noise artifacts (glitches) are incorporated by sampling corresponding data segments during training, which allows the model to learn heavy-tailed residual distributions.
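A minimal sketch of the whitening step, under one common normalization convention (the paper's exact conventions and its scale parameter are not reproduced here):

```python
# Frequency-domain whitening: divide by the amplitude spectral density so that
# stationary Gaussian noise comes out with roughly unit variance per sample.
import numpy as np

def whiten(d, psd, dt):
    D = np.fft.rfft(d)
    D_white = D / np.sqrt(psd / (2.0 * dt))   # one common normalization choice
    return np.fft.irfft(D_white, n=len(d))

# sanity check: white noise with a flat one-sided PSD whitens to unit variance
rng = np.random.default_rng(1)
N, dt, sigma = 4096, 1.0 / 1024, 3.0
d = rng.normal(0.0, sigma, N)
psd = np.full(N // 2 + 1, 2.0 * sigma**2 * dt)   # one-sided PSD of white noise
w = whiten(d, psd, dt)
assert abs(w.std() - 1.0) < 0.05
```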
Optimization employs Adam with the default learning rate; batch sizes and the number of training steps are set per experiment, with training managed by the sbi toolkit (Emma et al., 20 Jan 2026).
At inference, the workflow for evaluating $p(d \mid \theta)$ under the learned model is:
- Generate the template $h(\theta)$ for the proposed $\theta$.
- Whiten $h(\theta)$ and subtract it from the observed data $d$, yielding the residual $r = d - h(\theta)$.
- Evaluate $q_\phi(r \mid \theta)$ for likelihood computation.
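The three steps can be sketched end to end with toy stand-ins: a hypothetical damped-sinusoid signal model and a unit-Gaussian log-density in place of the trained flow:

```python
# Residual-likelihood evaluation with toy stand-ins (illustrative only).
import numpy as np

def signal(theta, t):
    # hypothetical deterministic signal model h(theta)
    return np.exp(-t) * np.sin(theta * t)

def log_q(r):
    # stand-in for the learned residual density: iid standard normal
    return float(-0.5 * np.sum(r**2) - 0.5 * r.size * np.log(2 * np.pi))

t = np.linspace(0.0, 1.0, 256)
rng = np.random.default_rng(2)
d = signal(10.0, t) + rng.normal(0.0, 0.3, t.size)   # "observed" whitened data

def log_likelihood(theta):
    r = d - signal(theta, t)   # generate the template and form the residual
    return log_q(r)            # evaluate the residual density

# the residual likelihood peaks at the true parameter theta = 10
best = max(range(5, 16), key=log_likelihood)
assert best == 10
```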
3. Advantages over Standard Neural Likelihood Estimation (NLE)
RNLE offers several key improvements over traditional NLE approaches:
- Dimensionality Reduction: Standard NLE seeks to learn $p(d \mid \theta)$ across the simulator's full output, which mixes signal and noise and must be conditioned on all signal parameters. RNLE only models $p_n(n \mid \theta)$, whose conditioning is typically lower-dimensional, especially when $\theta$ enters the noise model only through the PSD. This results in reduced simulation requirements for accurate likelihood learning.
- Non-Gaussian Robustness: By modeling the true (possibly glitch-dominated) noise distribution empirically, RNLE circumvents the limitations of the Whittle likelihood, which assumes Gaussianity and stationarity. RNLE accurately learns even heavy-tailed noise structures directly from data, yielding robustness to real-world non-Gaussian artifacts.
- Empirical Efficiency and Calibration: In a sine-Gaussian toy model, RNLE attains converged posteriors (as measured by Jensen-Shannon divergence) with roughly an order of magnitude fewer simulations than standard NLE. For 10D and 15D binary black hole (BBH) injections, RNLE likewise achieves low JS divergences at modest simulation budgets. In tests with strong glitches, RNLE posteriors remain unbiased (the true parameter is contained in the 90% credible interval), in stark contrast to biased Whittle-likelihood inferences (Emma et al., 20 Jan 2026).
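The Gaussianity limitation is easy to quantify: a heavy-tailed density (a Student-t here, purely as an illustration, not the paper's learned model) assigns a glitch-like 8-sigma excursion far more log-probability than a Gaussian of the same scale:

```python
# Gaussian vs heavy-tailed (Student-t) log-density at a glitch-like outlier.
import math

def log_gauss(x, s=1.0):
    return -0.5 * (x / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))

def log_student_t(x, nu=3.0):
    # standard Student-t log-density with nu degrees of freedom
    c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return c - (nu + 1) / 2 * math.log1p(x * x / nu)

glitch = 8.0   # an 8-sigma, glitch-like excursion
# the Gaussian penalizes the outlier by tens of nats more than the t model
assert log_student_t(glitch) - log_gauss(glitch) > 20
```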
4. Performance Evaluation and Applications
Performance of RNLE is validated across multiple domains:
- Toy Models: For sine-Gaussian signals with signal parameters plus a noise-scale parameter, RNLE achieves convergence in both 2D and 4D parameter spaces with comparatively few simulations, as measured via the JS divergence of posterior samples.
- Simulated Gravitational-Wave (GW) Signals: In 1D chirp-mass benchmarks, the JS divergence between RNLE and Whittle posteriors is small, and posterior probability (PP) test p-values indicate correct coverage. High-dimensional BBH cases (10D, 11D, and 15D) exhibit small JS divergences for all intrinsic parameters and slightly higher values for extrinsic angles, within expected sampling variability.
- Real Detector Noise: For quasi-Gaussian 1 s segments from LHO (Aug 2019), RNLE and Whittle posteriors on injected BBH signals agree closely (JS divergences of order 0.01), with calibration tests passed. Under loud glitches, RNLE recovers the true chirp mass within the 90% credible interval (zero-offset condition), whereas the Whittle likelihood suffers catastrophic bias. With blip glitches (O2 data), RNLE trained on matched-glitch datasets produces unbiased posteriors irrespective of the time offset, unlike BayesWave deglitching, which fails near the glitch time (Emma et al., 20 Jan 2026).
- Ensemble Weighting: When multiple independently trained models are available, mixture posteriors can be formed:

$$p(\theta \mid d) = \sum_i w_i \, p_i(\theta \mid d), \qquad w_i = \frac{Z_i}{\sum_j Z_j}.$$

Here, $Z_i$ is the measured evidence for the $i$-th model. While evidence-based weights are intuitive, in glitch-dominated regimes the evidence may sometimes favor a biased realization. Nevertheless, variability in the evidence acts as a sensitive indicator of density-estimator convergence.
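Evidence-weighted mixture sampling can be sketched as follows, with toy Gaussian posteriors and made-up log-evidences:

```python
# Evidence-weighted mixture of posteriors from independently trained models
# (toy Gaussian posteriors; the log-evidences are invented for illustration).
import numpy as np

rng = np.random.default_rng(3)

means = np.array([1.00, 1.05, 0.95])      # toy per-model posterior means
log_Z = np.array([-10.0, -12.0, -11.0])   # measured log-evidences (made up)

# normalized weights w_i proportional to Z_i, computed stably in log space
w = np.exp(log_Z - log_Z.max())
w /= w.sum()

# mixture sampling: choose a model with probability w_i, then draw from it
model = rng.choice(len(w), size=100_000, p=w)
samples = rng.normal(means[model], 0.1)

assert np.isclose(w.sum(), 1.0)
```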
5. Implementation and Software
RNLE is implemented in the sbilby package, which extends the Bilby inference library. The basic workflow is:
```
pip install sbilby
```

```python
import bilby
from sbilby import RNLELikelihood, RNLEConfig

config = RNLEConfig(noise_model=..., waveform_model=...)
lnL = RNLELikelihood(config)
result = bilby.run_sampler(likelihood=lnL, priors=..., sampler='dynesty')
```
All data preprocessing steps—whitening, windowing, PSD estimation—are handled internally to match Bilby’s Whittle likelihood conventions. The implementation is designed for seamless integration in pipeline-driven scientific inference and supports rapid deployment for gravitational-wave astronomy (Emma et al., 20 Jan 2026).
6. Broader Applicability and Prospects
RNLE’s methodology is directly transferable to a range of domains characterized by additive deterministic signals and complex, possibly non-Gaussian noise. Representative application areas include:
- Radio or X-ray pulsar timing under non-stationary terrestrial or astrophysical transients
- Seismic signal analysis with environmental noise artifacts
- Biomedical time series (e.g., ECG/EEG with movement artifacts)
- Particle physics scenarios with rare signal peaks plus detector backgrounds
By decoupling signal modeling from noise likelihood estimation, and directly learning the true residual noise from data, RNLE offers robust parameter inference under real-world noise violations—particularly when noise structure is not amenable to analytic modeling or stationary approximations. This approach permits orders-of-magnitude improvements in simulation efficiency while maintaining the rigor of Bayesian inference. In summary, RNLE establishes a principled, scalable, and empirically validated framework for simulation-based likelihood estimation where noise realism and robustness are essential (Emma et al., 20 Jan 2026).