AstroSURE: Learning to Remove Noise from Astronomical Images Without Ground Truth Data

Published 18 Apr 2026 in astro-ph.IM and cs.CV | (2604.16793v1)

Abstract: In astronomical imaging, the low photon count of exposures necessitates extensive post-processing steps, including contamination removal and denoising. This paper evaluates deep-learning denoising methods that can be trained without clean ground-truth images and assesses their utility for detection-oriented analysis of astronomical data. We adapt and compare Noise2Noise, Stein's Unbiased Risk Estimator, and blind-spot-based methods using synthetic data and real observations from the Hubble Space Telescope (HST) and the Canada-France-Hawaii Telescope (CFHT). Performance is evaluated using object-detection metrics, including correct detection rate and false alarm rate, together with image-based metrics and pixel-distribution diagnostics. The results show that these methods can improve faint-source detectability relative to the original noisy images, with encouraging gains on HST data after domain-consistent initialization, while transfer to CFHT data is more limited, highlighting the importance of instrument/domain similarity for unsupervised adaptation.


Authors (3)

Summary

The paper presents unsupervised deep learning methods, including SURE and Noise2Noise, to effectively remove noise from astronomical images without ground truth data.
It rigorously compares architectures such as a modified U-Net and transformer-based models, demonstrating significant improvements in PSNR and object detection metrics.
The study establishes a practical benchmark for denoising in astronomy and highlights the necessity for domain-specific adaptation, especially for ground-based observations.

AstroSURE: Unsupervised Denoising of Astronomical Images without Ground Truth

Introduction

The paper "AstroSURE: Learning to Remove Noise from Astronomical Images Without Ground Truth Data" (2604.16793) presents an extensive analysis of unsupervised and self-supervised deep learning methods for denoising astronomical imagery, a context where clean, noise-free targets are inherently unavailable. The work systematically adapts and evaluates Noise2Noise (N2N), Stein’s Unbiased Risk Estimator (SURE), and blind-spot-based methods, with a principal focus on object detection efficacy in both synthetic and real datasets. Through comparative experiments on simulated Hubble-like observations and on real exposures from the space-based Hubble Space Telescope (HST) and the ground-based Canada-France-Hawaii Telescope (CFHT), the paper provides a detection-oriented benchmark for denoising pipelines under realistic astronomical constraints.

Astronomical Imaging and Noise Characteristics

Astronomical imaging is dominated by low-photon statistics, diverse and complex noise sources, and large dynamic ranges in signal intensity. Key noise sources include Poisson-distributed photon shot noise, Gaussian readout noise, dark current, quantization, and localized artifacts (e.g., cosmic rays, satellite trails, defective pixels). These characteristics generate highly non-Gaussian, skewed pixel intensity distributions with a dominant background and rare, diverse objects near or below the noise floor.

Figure 1: MegaPrime exposure illustrating the wide dynamic range, the distribution of faint sources, and the non-Gaussian pixel intensity histogram distinguishing readout noise and photon statistics regimes.

Traditional denoising (e.g., Gaussian smoothing, BM3D) struggles to optimally preserve the morphologies of faint sources and is unable to exploit higher-order statistics without sacrificing object detectability.

Methodological Framework

Data and Noise Simulation

The study relies on both synthetic data generated with the physically informed GalSim toolkit and carefully pre-processed real survey observations from HST and CFHT. The synthetic datasets allow assessment against ground truth, while the real data test transferability and practical utility.

A simplified but physically plausible mixed Poisson-Gaussian noise model is utilized for most DL denoising experiments, with more realistic and detailed modeling deployed to evaluate robustness and domain adaptation. The study also delineates how local and global contaminants impact algorithmic choices.
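
To make the mixed Poisson-Gaussian model concrete, the following is a minimal sketch in Python with NumPy. It is not the paper's GalSim pipeline: the function name `add_poisson_gaussian_noise` and the sky level and read-noise values are hypothetical, chosen only for illustration. Photon shot noise is drawn from a Poisson distribution around the expected electron count, and readout noise is added as zero-mean Gaussian noise.

```python
import numpy as np


def add_poisson_gaussian_noise(clean_electrons, sky_e=20.0, read_noise_e=5.0, rng=None):
    """Return a noisy frame under a simple mixed Poisson-Gaussian model.

    clean_electrons : expected source signal per pixel, in electrons
    sky_e           : expected sky background per pixel (hypothetical value)
    read_noise_e    : standard deviation of the Gaussian read noise (hypothetical value)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Expected electron count per pixel: source plus uniform sky background.
    expected = np.clip(clean_electrons + sky_e, 0.0, None)
    # Photon shot noise: Poisson draw around the expected count.
    shot = rng.poisson(expected).astype(np.float64)
    # Readout noise: additive, zero-mean Gaussian.
    read = rng.normal(0.0, read_noise_e, size=expected.shape)
    return shot + read


rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[32, 32] = 500.0  # a single point-like source on an empty field
noisy = add_poisson_gaussian_noise(clean, rng=rng)
```

In this sketch the background variance is approximately `sky_e + read_noise_e**2`, which is the property that a known-noise method such as SURE relies on.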

Figure 3: Overview diagram of the GalSim-based noise synthesis model, capturing the sequential introduction of Poisson, Gaussian, and instrument-specific contamination.

Denoising Objectives and Training Paradigms

The paper contrasts supervised (Noise2Clean), weakly supervised (Noise2Noise), and fully unsupervised (SURE, blind-spot) loss formulations, noting the impracticality of "clean reference" targets in operational astronomy:

Noise2Noise: Exploits independent noisy pairs, requiring excellent alignment.
SURE: Implements unbiased empirical risk estimation under known noise statistics, requiring only single noisy instances.
Blind-Spot/Noise2Void/Noise2Self: Employ context-based prediction to avoid trivial identity mapping.
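
For orientation, the first two objectives can be written in their standard textbook forms. These are the generic expressions for additive, zero-mean noise (Noise2Noise) and for i.i.d. Gaussian noise of known variance $\sigma^2$ (SURE), not necessarily the exact variants used in the paper, which works with a mixed Poisson-Gaussian model. Here $f_\theta$ is the denoiser, $x$ the unknown clean image with $N$ pixels, and $y$, $y_1$, $y_2$ noisy observations of it.

$$
\mathcal{L}_{\mathrm{N2N}}(\theta) = \frac{1}{N}\,\lVert f_\theta(y_1) - y_2 \rVert_2^2, \qquad y_1 = x + n_1,\quad y_2 = x + n_2
$$

$$
\mathrm{SURE}(f_\theta; y) = \frac{1}{N}\,\lVert y - f_\theta(y) \rVert_2^2 \;-\; \sigma^2 \;+\; \frac{2\sigma^2}{N}\,\nabla_y \cdot f_\theta(y)
$$

When $n_2$ is zero-mean and independent of $y_1$, the expected Noise2Noise loss differs from the supervised mean squared error only by a constant, so both share the same minimizer. The expected value of SURE equals the mean squared error $\frac{1}{N}\,\mathbb{E}\lVert x - f_\theta(y) \rVert_2^2$, which is why it can be minimized with single noisy images only.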

In practice, the divergence term in the loss (especially for SURE under complex noise) is approximated with Monte Carlo sampling so that optimization with deep networks remains tractable.
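
A Monte Carlo divergence estimate of this kind can be sketched as follows. This is an illustrative NumPy version under a simplifying assumption of pure Gaussian noise with known standard deviation `sigma`; the paper's noise model is richer. The divergence is estimated with a single random probe (Rademacher entries of ±1) and a finite difference of step `delta`. The function names and the toy denoiser `shrink` are hypothetical. A linear shrinkage denoiser is used because its exact divergence is known, which makes the estimate easy to check.

```python
import numpy as np


def mc_divergence(denoiser, y, rng, delta=1e-3):
    """Single-probe Monte Carlo estimate of the divergence of `denoiser` at `y`."""
    eps = rng.choice([-1.0, 1.0], size=y.shape)  # Rademacher probe vector
    # Finite-difference directional derivative, projected back onto the probe.
    return float(np.sum(eps * (denoiser(y + delta * eps) - denoiser(y))) / delta)


def sure_gaussian(denoiser, y, sigma, rng, delta=1e-3):
    """SURE for i.i.d. Gaussian noise of known standard deviation `sigma`."""
    n = y.size
    residual = np.sum((y - denoiser(y)) ** 2) / n
    div = mc_divergence(denoiser, y, rng, delta)
    return float(residual - sigma**2 + 2.0 * sigma**2 * div / n)


def shrink(y, a=0.8):
    """Toy linear denoiser (hypothetical); its exact divergence is a * y.size."""
    return a * y


rng = np.random.default_rng(0)
sigma = 1.0
x = np.full((256, 256), 10.0)                 # clean image, used only for checking
y = x + rng.normal(0.0, sigma, size=x.shape)  # single noisy observation

sure = sure_gaussian(shrink, y, sigma, rng)         # computed without access to x
true_mse = float(np.mean((x - shrink(y)) ** 2))     # requires the clean image
div_est = mc_divergence(shrink, y, rng)
```

On this toy problem `sure` tracks `true_mse` closely even though it never sees `x`. With a neural network in place of `shrink`, the same quantity would be computed in an autodiff framework and used as the training loss.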

Network Architectures

Several architectures are evaluated, including DnCNN, various U-Net variants, and transformer-based Restormer. The modified U-Net (with tuned channel allocation and upsampling modifications) is found to offer the optimal balance between denoising efficacy, computational tractability, and generalization power.

Figure 4: Detailed structure of the modified U-Net, the principal backbone for the full suite of denoising experiments.

Experimental Analysis

Architecture Selection

Supervised training on synthetic data demonstrates that while transformer-based methods provide strong signal metrics, the modified U-Net achieves highly competitive MAE, MSE, and PSNR at dramatically reduced computational cost. $L_1$-based training consistently outperforms $L_2$ for denoising astronomical structures with sharp intensity gradients.
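
The image-based metrics named above have standard definitions, sketched here in NumPy for reference. The function names are illustrative, and `data_range` (the peak value used in PSNR) is an assumption that must be chosen to match the image scaling; the paper's exact normalization is not stated in this summary.

```python
import numpy as np


def mae(reference, estimate):
    """Mean absolute error (the quantity minimized by L1 training)."""
    return float(np.mean(np.abs(reference - estimate)))


def mse(reference, estimate):
    """Mean squared error (the quantity minimized by L2 training)."""
    return float(np.mean((reference - estimate) ** 2))


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB for a given peak value `data_range`."""
    err = mse(reference, estimate)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range**2 / err))


reference = np.zeros((8, 8))
estimate = np.full((8, 8), 0.1)  # constant error of 0.1 on every pixel
example_mae = mae(reference, estimate)
example_mse = mse(reference, estimate)
example_psnr = psnr(reference, estimate, data_range=1.0)
```

Because MSE squares the error, a few bright pixels can dominate it, whereas MAE weights all pixels linearly. That difference is one plausible reading of why $L_1$ training behaves better on fields with sharp intensity gradients.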

Training Schemes: Quantitative and Qualitative Results

Noise2Noise and SURE approaches both yield significant denoising performance gains over baseline noisy frames in all metrics relevant to detection and signal reconstruction, with SURE enabling nearly equivalent performance to N2N without paired data.

Figure 5: Side-by-side PSNR results for reconstruction tasks demonstrate that both N2N and SURE denoisers substantially surpass classical and zero-shot approaches.

Figure 6: PSNR curves during training reveal near-identical learning dynamics for Noise2Clean and Noise2Noise; SURE converges more slowly but achieves comparable final validation scores.

The SURE-trained models stably improve SNR, PSNR, and detection rates, with marginally inferior absolute MAE compared to N2N, but within operational tolerance for survey-class source finding.

Qualitative error maps show residual bias is primarily localized around the brightest or most complex sources, reflecting the irreducible uncertainty of reconstructing barely detected objects in the absence of supervising information.

Figure 7: Error maps highlight the principal challenges in precisely reconstructing complex, high SNR objects; faint structure recovery is systematically improved by N2N/SURE relative to direct observations.

Detection-centric Evaluation

Detection analysis, central to the study, demonstrates that denoised images yield:

Substantially higher correct detection rates (CDR) at fixed false alarm thresholds.
Markedly fewer spurious detections than raw images at a fixed CDR.

Threshold sweeps on the background RMS parameter in SExtractor decisively favor denoised frames for faint-source completeness at controlled false alarm rates.
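
As an illustration of how such detection metrics can be computed from a source catalog, here is a small pure-Python sketch. The definitions are assumptions, since this summary does not spell them out: CDR is taken as the fraction of true sources that are matched, and FAR as the fraction of detections with no true counterpart. Matching is a simple greedy nearest-neighbor search within a hypothetical `radius` in pixels.

```python
import math


def match_detections(truth, detections, radius=2.0):
    """Greedy one-to-one matching of detections to true source positions."""
    unmatched = list(range(len(truth)))
    true_pos = 0
    for dx, dy in detections:
        best, best_d = None, radius
        for i in unmatched:
            tx, ty = truth[i]
            d = math.hypot(dx - tx, dy - ty)
            if d <= best_d:
                best, best_d = i, d
        if best is not None:
            unmatched.remove(best)  # each true source can be matched only once
            true_pos += 1
    false_pos = len(detections) - true_pos
    missed = len(unmatched)
    return true_pos, false_pos, missed


def cdr_far(truth, detections, radius=2.0):
    """Return (CDR, FAR) under the assumed definitions described above."""
    true_pos, false_pos, _ = match_detections(truth, detections, radius)
    cdr = true_pos / len(truth) if truth else 0.0
    far = false_pos / len(detections) if detections else 0.0
    return cdr, far


truth = [(10.0, 10.0), (30.0, 12.0), (50.0, 40.0), (20.0, 55.0)]
detections = [(10.5, 9.8), (30.2, 12.1), (44.0, 44.0)]
cdr, far = cdr_far(truth, detections)
```

In this toy catalog two of the four true sources are recovered and one of the three detections is spurious. Repeating the computation across detection thresholds gives the kind of CDR-versus-FAR curve used in the threshold sweeps.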

Figure 8: ROC analysis quantifies that the denoised outputs enable detection regimes with both higher completeness and lower contamination rates than any threshold achievable on raw frames.

Figure 2: Visual assessment of detections (blue: true, red: false, pink: missed) reinforces the quantitative results—faint source recovery is enhanced, and spurious noise-induced detections are suppressed.

Real-World Observational Data

The transfer of the AstroSURE workflow (with domain-matched pretraining) to HST data delivers measurable improvements in both unsupervised PSNR and detection metrics. However, adaptation to ground-based CFHT observations is less effective due to pronounced domain differences (e.g., atmospheric seeing, higher backgrounds), emphasizing the crucial importance of domain-consistent initialization.

The table below summarizes the results on real data.

Dataset	State	uPSNR (dB)	NIQE	CDR (%)	FAR (%)
HST	Noisy Input	43.01	22.7	31.12	66.76
HST	Denoised (Ours)	50.72	16.6	35.72	65.49
CFHT	Noisy Input	40.14	21.9	20.29	44.74
CFHT	Denoised (Ours)	40.48	16.6	20.80	46.24
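
The per-metric changes implied by the table can be checked with a few lines of Python. The numbers are copied from the table above; the variable names are illustrative only.

```python
metrics = ("uPSNR (dB)", "NIQE", "CDR (%)", "FAR (%)")

# Values transcribed from the table: (uPSNR, NIQE, CDR, FAR).
rows = {
    "HST": {"noisy": (43.01, 22.7, 31.12, 66.76), "denoised": (50.72, 16.6, 35.72, 65.49)},
    "CFHT": {"noisy": (40.14, 21.9, 20.29, 44.74), "denoised": (40.48, 16.6, 20.80, 46.24)},
}

# Change from noisy input to denoised output, rounded to two decimals.
deltas = {
    name: tuple(round(d - n, 2) for n, d in zip(r["noisy"], r["denoised"]))
    for name, r in rows.items()
}

for name, change in deltas.items():
    summary = ", ".join(f"{m}: {c:+.2f}" for m, c in zip(metrics, change))
    print(f"{name}: {summary}")
```

On HST the uPSNR rises by 7.71 dB and the CDR by 4.60 points while the FAR falls by 1.27 points. On CFHT the uPSNR gain is only 0.34 dB, the CDR rises by 0.51 points, and the FAR increases by 1.50 points, consistent with the weaker transfer to ground-based data.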

Strong claim: For space-based data, domain-matched unsupervised adaptation is sufficient for improved detection at constant or reduced contamination rates, providing operational value for faint-object surveys. For ground-based data with unmatched characteristics, further domain-specific modeling is required.

Theoretical and Practical Implications

The findings indicate that self-supervised and unsupervised deep learning denoisers can achieve detection-oriented improvements in astronomical datasets where clean targets are fundamentally unavailable. The methods generalize well when the pretraining domain has statistics similar to those of the operational data; otherwise, domain shift undermines the gains, as the CFHT results show. The study establishes a benchmark for single-frame denoising in astronomy, which could reduce reliance on resource-intensive stacking in faint-source applications, and lays a foundation for DL-enabled survey pipelines.

From a theoretical standpoint, the results imply SURE-based unsupervised optimization can act as a near-optimal estimator for object-detection tasks, provided noise statistics are well-modeled and the architecture is appropriately tuned for astronomical morphology and noise structure.

Limitations and Future Directions

Notable limitations include:

Limited effectiveness on ground-based data (domain gap).
Non-addressed local/structured noise, e.g., cosmic rays, for which robust inpainting and context-aware modeling will be required.
Diminished performance with blind-spot/Noise2Self/Noise2Same approaches due to the sparse, high-contrast nature of astronomical fields leading to insufficient learning signal.

The paper suggests that future work will focus on extending the approach to broader domain adaptation, integrating robust artifact masking, and leveraging more advanced architectural priors (e.g., vision transformers) for structured contaminant removal.

Conclusion

AstroSURE provides a practical unsupervised denoising framework for astronomical imaging, supported by quantitative improvements in faint-source detection on both synthetic and operational space-based exposures. Its modular methodology (Noise2Noise for paired exposures or SURE for single-instance data, with a modified U-Net backbone) sets a scalable foundation for future developments in automated, ground-truth-free image restoration across astronomical domains. Instrument-specific adaptation and integration with robust artifact inpainting are essential future directions for survey astronomy and high-throughput data pipelines.

Figure 9: Comparative convergence behavior of training losses for the primary denoising pipelines, reinforcing the stability and efficacy of the Noise2Noise and SURE optimization frameworks.
