D-PerceptCT: CT Image Quality Optimization
- D-PerceptCT is a computed tomography framework that fuses semantic and local features using a dual-path architecture to enhance image quality.
- It employs HVS-based perceptual loss functions to accurately preserve diagnostically relevant anatomical details in low-dose and sparse-view scenarios.
- Innovations include combining CNN and transformer models for image enhancement, denoising, and automated anatomical mapping, achieving significant improvements in PSNR, SSIM, and clinical interpretability.
D-PerceptCT denotes a family of approaches and system architectures in computed tomography (CT) focused on optimizing image quality, noise/artifact suppression, and perceptual fidelity in radiological tasks—especially in low-dose, sparse-view, and quality assessment settings. These frameworks leverage deep learning, compressive sensing, human visual system (HVS) modeling, and domain-specific losses to preserve diagnostically relevant structures and details that are often lost in conventional pipelines. Innovations under the D-PerceptCT paradigm span enhancement, denoising, sparse-view reconstruction, automated vessel mapping, and perceptual image quality evaluation.
1. Core Models and Architectures
The latest D-PerceptCT method (Nabila et al., 18 Nov 2025) employs a dual-path architecture comprising a Visual Dual-Path Extractor (ViDex) and a Deep Visual State-Space Model (DV2SM). The ViDex module fuses semantic priors from a frozen DINOv2 transformer (Semantic Feature Extractor Branch, SFEB) with local features from a lightweight CNN (Local Detail Extractor Branch, LDEB). A feature-fusion module aggregates these streams, producing a fused representation that combines dose-invariant semantic context with local structural details. The DV2SM then processes these fused features through stacked Visual State-Space Groups (VSSGs), each containing multiple Global-Local State-Space Blocks (GL2SB). The GL2SB balances long-range spatial dependencies (modeled by learnable state-space convolutions) with multiscale vision blocks (providing local and mid-range contextual sensitivity via parallel convolutions of different sizes).
The output head reduces the final feature stack to a single-channel enhanced CT slice. This composite design aims to maximize retention and enhancement of anatomically and pathologically meaningful patterns in low-dose images.
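The following is a minimal PyTorch sketch of the dual-path idea: a frozen semantic backbone fused with a lightweight local-detail CNN via a 1×1 convolution. The module names, channel sizes, and the dummy stand-in for the DINOv2 encoder are illustrative assumptions, not the published implementation.

```python
# Minimal sketch of a ViDex-style dual-path extractor (illustrative, not the
# authors' code): a frozen semantic backbone plus a lightweight CNN branch,
# fused by a 1x1 convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalDetailBranch(nn.Module):
    """Lightweight CNN extracting local structural features (LDEB analogue)."""
    def __init__(self, in_ch=1, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class DualPathExtractor(nn.Module):
    """Fuses frozen semantic features with local CNN features (ViDex analogue)."""
    def __init__(self, semantic_backbone, sem_ch=384, feat_ch=64):
        super().__init__()
        self.semantic = semantic_backbone            # e.g., a frozen DINOv2 encoder
        for p in self.semantic.parameters():
            p.requires_grad = False                  # dose-invariant priors stay frozen
        self.local = LocalDetailBranch(feat_ch=feat_ch)
        self.fuse = nn.Conv2d(sem_ch + feat_ch, feat_ch, 1)  # feature-fusion module

    def forward(self, x):
        sem = self.semantic(x)                       # (B, sem_ch, h, w) semantic map
        sem = F.interpolate(sem, size=x.shape[-2:], mode='bilinear',
                            align_corners=False)     # match spatial resolution
        loc = self.local(x)                          # (B, feat_ch, H, W) local details
        return self.fuse(torch.cat([sem, loc], dim=1))

# Usage with a dummy stand-in for the frozen semantic encoder:
dummy_backbone = nn.Conv2d(1, 384, 14, stride=14)    # patch-like downsampling stub
videx = DualPathExtractor(dummy_backbone)
fused = videx(torch.rand(1, 1, 224, 224))            # fused (1, 64, 224, 224) features
```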
Earlier instantiations utilized different backbones: an 8-layer feed-forward convolutional network with perceptual loss from VGG16 (Yang et al., 2017); dual-stage U-Nets for sparse-view sinogram super-resolution and artifact refinement, with discriminator-perceptual losses derived from domain-adaptive GAN discriminators (Wei et al., 2020); and U-Net–style denoising diffusion probabilistic models (DDPM) for primary content inference in quality assessment settings (Shi et al., 2023).
2. Loss Functions and Perceptual Modeling
The defining element of the D-PerceptCT methodology is the prominence of perceptual loss functions that explicitly model or approximate the HVS’s sensitivity to features crucial for radiological interpretation.
The Deep Perceptual Relevancy Loss Function (DPRLF) (Nabila et al., 18 Nov 2025) is structured as a weighted combination of VGG16 feature-reconstruction losses at low, mid, and high levels, calibrated to match the HVS's contrast sensitivity function (CSF):

$$\mathcal{L}_{\mathrm{DPRLF}} = \lambda_{\mathrm{low}}\,\big\|\phi_{\mathrm{low}}(\hat{x}) - \phi_{\mathrm{low}}(x)\big\|_2^2 + \lambda_{\mathrm{mid}}\,\big\|\phi_{\mathrm{mid}}(\hat{x}) - \phi_{\mathrm{mid}}(x)\big\|_2^2 + \lambda_{\mathrm{high}}\,\big\|\phi_{\mathrm{high}}(\hat{x}) - \phi_{\mathrm{high}}(x)\big\|_2^2,$$

where $\phi_{\mathrm{low}}$, $\phi_{\mathrm{mid}}$, and $\phi_{\mathrm{high}}$ denote VGG16 feature maps at the respective levels, $\hat{x}$ is the enhanced slice, $x$ the full-dose reference, and the weights $(\lambda_{\mathrm{low}}, \lambda_{\mathrm{mid}}, \lambda_{\mathrm{high}})$ are empirically chosen to reflect the HVS's peak sensitivity at mid spatial frequencies.
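A sketch of how such a three-level VGG16 loss can be assembled in PyTorch follows; the layer taps (relu1_2 / relu3_3 / relu4_3) and the weight triple are assumptions standing in for the paper's CSF-calibrated values.

```python
# Sketch of a DPRLF-style loss (assumed form): weighted VGG16 feature losses
# at low/mid/high levels. Layer choices and weights are illustrative.
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class DeepPerceptualRelevancyLoss(nn.Module):
    def __init__(self, weights=(0.2, 0.6, 0.2)):      # mid level weighted highest
        super().__init__()
        feats = vgg16(weights=VGG16_Weights.DEFAULT).features.eval()
        for p in feats.parameters():
            p.requires_grad = False
        # Taps after relu1_2 / relu3_3 / relu4_3 (low / mid / high levels).
        self.low, self.mid, self.high = feats[:4], feats[:16], feats[:23]
        self.w = weights
        self.mse = nn.MSELoss()

    def forward(self, pred, target):
        # VGG16 expects 3-channel input; replicate the single CT channel
        # (ImageNet normalization omitted for brevity).
        pred, target = pred.repeat(1, 3, 1, 1), target.repeat(1, 3, 1, 1)
        return (self.w[0] * self.mse(self.low(pred), self.low(target))
                + self.w[1] * self.mse(self.mid(pred), self.mid(target))
                + self.w[2] * self.mse(self.high(pred), self.high(target)))
```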
Previous perceptual losses included:
- VGG feature L2 distances at multiple layers (Yang et al., 2017).
- Discriminator perceptual (DP) losses stemming from the activations of GAN discriminators trained directly on CT data, providing domain-adapted feature metrics (Wei et al., 2020).
- Dissimilarity maps and attention modules inspired by Internal Generative Mechanism theory, utilizing DDPM-predicted primary image content to guide transformer-based regression of diagnostic quality (Shi et al., 2023).
Collectively, these loss designs mitigate the over-smoothing and detail suppression inherent to mean-squared-error (MSE) training, directly penalizing failures to reconstruct features critical to radiologist perception.
3. Technical Innovations Across Modalities
D-PerceptCT frameworks span three main CT task settings:
A. Low-Dose CT Enhancement and Denoising
D-PerceptCT architectures consistently outperform conventional CNN denoisers (e.g., REDCNN, WGAN), transformer-based models (CTFormer), and state-space models (DenoMamba) on both full-reference and no-reference perceptual metrics (LPIPS, ST-LPIPS, DISTS, PIQE), as summarized in the table below (a sketch of the LPIPS computation follows it).
| Method | PSNR [dB] (↑) | SSIM (↑) | LPIPS (↓) | API Score (↑) |
|---|---|---|---|---|
| REDCNN | 40.04 | 0.9158 | 0.0597 | 19 |
| DenoMamba | 43.82 | 0.9809 | 0.0888 | 38 |
| D-PerceptCT | 42.97 | 0.9867 | 0.0104 | 46 (1st) |
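For reference, full-reference perceptual scores such as LPIPS can be computed with the public `lpips` package; the channel replication for single-channel CT slices below is a common convention, not necessarily the papers' exact protocol.

```python
# Computing LPIPS (lower is better) with the public `lpips` package;
# inputs are scaled to [-1, 1] and replicated to 3 channels.
import torch
import lpips

metric = lpips.LPIPS(net='alex')            # AlexNet-based LPIPS variant

def lpips_score(enhanced, reference):
    """enhanced/reference: (N, 1, H, W) tensors with values in [0, 1]."""
    to3 = lambda t: (t * 2 - 1).repeat(1, 3, 1, 1)   # rescale and replicate channel
    with torch.no_grad():
        return metric(to3(enhanced), to3(reference)).mean().item()
```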
Perceptual loss variants in earlier work demonstrated enhanced recovery of subtle nodules and vessel structures, with visible suppression of noise without structural blurring (Yang et al., 2017).
B. Sparse-View Reconstruction and Sinogram Domain Inpainting
The D-PerceptCT two-step strategy for sparse-view CT (Wei et al., 2020) separates measurement-domain super-resolution (via a U-Net-based Super-Resolution Information Network, SIN) from image-domain correction (Perceptual Refinement Network, PRN); a schematic of this pipeline follows the table below. The DP loss, based on features of a CT-trained discriminator, outperforms VGG-based losses by approximately 0.7 dB PSNR, and the overall structure yields ≈4 dB PSNR and ≈0.04 SSIM improvement over competitive unrolled, adversarial, and TV-based optimization baselines.
| Model | PSNR [dB] | SSIM |
|---|---|---|
| SIN only | 34.19 | 0.859 |
| SIN→4-channel PRN (Wei et al., 2020) | 34.90 | 0.877 |
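A schematic of the two-step pipeline is sketched below; both networks and the FBP operator are placeholders (the published SIN and PRN are U-Nets, and the PRN variant above takes a 4-channel input).

```python
# Schematic of the two-step sparse-view strategy (after Wei et al., 2020):
# a sinogram-domain super-resolution net (SIN), filtered backprojection
# (FBP), then an image-domain refinement net (PRN). All components are
# placeholders; `fbp` stands in for any reconstruction operator.
import torch.nn as nn

class TwoStepSparseViewCT(nn.Module):
    def __init__(self, sin_net: nn.Module, prn_net: nn.Module, fbp):
        super().__init__()
        self.sin = sin_net      # U-Net: sparse-view -> dense-view sinogram
        self.prn = prn_net      # U-Net: FBP image -> artifact-refined image
        self.fbp = fbp          # differentiable or fixed FBP operator

    def forward(self, sparse_sinogram):
        dense_sinogram = self.sin(sparse_sinogram)   # measurement-domain SR
        coarse_image = self.fbp(dense_sinogram)      # analytic reconstruction
        return self.prn(coarse_image)                # perceptual refinement
```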
C. Automated Anatomical Pathways Analysis
D-PerceptCT’s algorithm for deep inferior epigastric artery perforator (DIEAP) mapping in CTA leverages tailored centerline extraction (gradient tracking with ridge correction in subcutaneous fat) and minimum-cost A* graph searches, combining multi-scale Frangi filtering with sigmoid intensity costs (Araújo et al., 2019). The reported mean Euclidean errors are 0.64 mm (subcutaneous) and 0.50 mm (intramuscular), both within the subvoxel regime, and typical per-patient runtime is ≈2 minutes including operator review.
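An illustrative 2D sketch of the minimum-cost-path idea, using scikit-image's Frangi filter and a Dijkstra-style search (equivalently, A* with a zero heuristic), is given below; the cost weighting and sigmoid form are assumptions, and the published method operates in 3D with additional ridge correction.

```python
# Illustrative minimum-cost path search over a 2D cost map built from a
# Frangi vesselness response and a sigmoid intensity term. Parameter
# values are assumptions, not the paper's.
import heapq
import numpy as np
from skimage.filters import frangi

def vessel_cost_map(image, alpha=1.0, beta=0.5):
    vesselness = frangi(image)                                   # tubular response
    intensity = 1.0 / (1.0 + np.exp(-(image - image.mean())))    # sigmoid term
    return 1.0 / (alpha * vesselness + beta * intensity + 1e-6)  # low cost on vessels

def min_cost_path(cost, start, goal):
    """Dijkstra-style search (A* with zero heuristic) over 4-connected pixels."""
    h, w = cost.shape
    dist = {start: 0.0}
    parent = {}
    pq = [(0.0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:                       # reconstruct path back to start
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return path[::-1]
        if d > dist.get(node, np.inf):
            continue
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + cost[nr, nc]          # accrue cost at destination pixel
                if nd < dist.get((nr, nc), np.inf):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = node
                    heapq.heappush(pq, (nd, (nr, nc)))
    return None
```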
4. Training Procedures and Datasets
The primary dataset for LDCT enhancement is the Mayo 2016 Low-Dose CT Grand Challenge (2,378 slice pairs, simulated quarter-dose; 8 patients for training/validation, 2 for testing) (Nabila et al., 18 Nov 2025). Standard splits reserve 20% of slices per training patient for validation; training runs for ≈47 epochs (≈45,000 iterations) on paired data with the Adam optimizer (lr = 1e-4, β₁ = 0.9, β₂ = 0.99) and batch size 2.
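These reported hyperparameters drop into a standard PyTorch training loop; the model, loss, and data below are placeholders so the skeleton runs stand-alone.

```python
# Training skeleton with the reported settings (Adam, lr=1e-4, betas=(0.9, 0.99),
# batch size 2, ~47 epochs). Model, loss, and data are stand-ins.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Conv2d(1, 1, 3, padding=1)                       # placeholder network
pairs = TensorDataset(torch.rand(16, 1, 64, 64),            # synthetic low-dose /
                      torch.rand(16, 1, 64, 64))            # full-dose slice pairs
loader = DataLoader(pairs, batch_size=2, shuffle=True)      # batch size 2

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.99))
criterion = nn.MSELoss()   # the paper uses DPRLF; MSE keeps this sketch self-contained

for epoch in range(47):    # ≈47 epochs (≈45,000 iterations on the full dataset)
    for low_dose, full_dose in loader:
        optimizer.zero_grad()
        loss = criterion(model(low_dose), full_dose)
        loss.backward()
        optimizer.step()
```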
Sparse-view work uses the TCIA LDCT-projection set: 5,394 slices from 68 patients, with test sets of ≈500 held-out slices (Wei et al., 2020). For DIEAP mapping, 21 CTA subjects (Philips Brilliance 16; 0.55–0.98 mm in-plane resolution, 0.4–1.5 mm slice thickness) support evaluation (Araújo et al., 2019).
5. Evaluation Metrics and Results
In image quality assessment (IQA), D-PerceptCT utilizing a DDPM-Transformer pipeline achieved the highest correlation with radiologist scores in the MICCAI 2023 LDCT BIQA Challenge (PLCC=0.9814, SROCC=0.9816, KROCC=0.9122, Overall=2.8753), outperforming transformer baselines (e.g., MANIQA, PLCC=0.9789) (Shi et al., 2023).
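These challenge metrics are standard correlation statistics between predicted and radiologist scores (the reported Overall value is consistent, up to rounding, with the sum of PLCC, SROCC, and KROCC); all three are available in scipy.stats.

```python
# PLCC, SROCC, and KROCC between predicted quality scores and radiologist
# ratings, using scipy.stats.
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

def iqa_correlations(predicted, radiologist):
    predicted, radiologist = np.asarray(predicted), np.asarray(radiologist)
    plcc, _ = pearsonr(predicted, radiologist)     # linear correlation (PLCC)
    srocc, _ = spearmanr(predicted, radiologist)   # rank correlation (SROCC)
    krocc, _ = kendalltau(predicted, radiologist)  # pairwise-order agreement (KROCC)
    return plcc, srocc, krocc
```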
For enhancement, D-PerceptCT consistently reports lower LPIPS and ST-LPIPS (e.g., 0.0104 and 0.0018) and higher API scores compared to existing denoisers (e.g., REDCNN, WGAN, CTFormer, DenoMamba) (Nabila et al., 18 Nov 2025).
Qualitative evaluations on Mayo data illustrate that D-PerceptCT uniquely preserves sharp boundaries (e.g., vessels, tumor margins) and soft-tissue textures while suppressing noise, validated by lower error in residual maps and visual assessment of ROIs.
6. Practical Considerations and Clinical Impact
D-PerceptCT designs reflect explicit attention to clinical interpretability. Enhancement strategies are tailored to retain signal in diagnostically significant spatial frequencies, with objective metrics and, in select cases, reader studies supporting perceptual gains. The architectural fusion of semantic, multiscale, and long-range spatial context is motivated by HVS literature and radiological diagnosis priorities.
Adoption of D-PerceptCT paradigms is projected to:
- Increase confidence in lesion/tumor/vessel boundary detection under extreme dose reduction.
- Enable radiologist-adjustable tradeoff between noise suppression and detail preservation in real time.
- Integrate with clinical PACS systems pending benchmarking of inference speeds and memory footprint.
- Accelerate surgical planning and reduce manual workload in vascular pathway annotation (Araújo et al., 2019).
Identified limitations include reliance on simulated dose reductions, lack of formal reader studies in some settings, and need for further optimization to support ultra-low-dose or real-time applications. Extension opportunities exist for task-adaptive variants, online perceptual feedback integration, and full 3D volumetric adaptation.
7. Extensibility and Future Directions
Generalization to other imaging settings (e.g., limited-angle CT, artifact correction, under-sampled MRI) is plausible wherever a known forward operator allows for domain-specific loss construction and measurement-domain priors. Lightweight and real-time variants, 3D extensions, and online radiologist-in-the-loop learning represent active areas for future development (Nabila et al., 18 Nov 2025, Wei et al., 2020). Developing end-to-end, perceptually optimized CT pipelines remains an open challenge, with D-PerceptCT providing foundational strategies for integrating HVS-based priors, transformer and state-space architectures, and radiologically relevant evaluation metrics.