PSCT-Net: Geometry-Aware Pediatric Skull CT Reconstruction via Differentiable Back-Projection and Attention-Guided Refinement

Published 18 Jun 2026 in cs.CV and cs.AI | (2606.19867v1)

Abstract: Computed Tomography (CT) is essential for diagnosing pediatric craniofacial abnormalities, yet poses radiation risks to developing anatomies. Reconstructing 3D CT from sparse bi-planar X-rays offers a low-dose alternative but is severely ill-posed. Existing methods employ geometry-agnostic feature lifting, naively projecting 2D features into 3D without explicit spatial modeling, causing depth ambiguity and degraded osseous boundaries. We present PSCT-Net, a geometry-aware framework with differentiable back-projection. Differentiable back-projection establishes a spatially faithful volumetric prior, alleviating depth ambiguity. An Attention-Guided Projection (AGP-3D) module then learns non-linear voxel-wise correspondences between 2D regions and 3D locations. A Bidirectional Mamba (BiM-3D) module captures long-range volumetric dependencies with linear complexity. We further curate a private institutional pediatric skull CT cohort, PedSkull-CT, comprising normal and pathological cases for internal evaluation, addressing the gap in adult-centric, trunk-focused datasets.

Authors (8)

Summary

The paper introduces PSCT-Net, a geometry-aware network leveraging differentiable back-projection and attention-guided refinement to reconstruct pediatric skull CTs from bi-planar X-rays.
It combines BP-C, MV3D-C, AGP-3D, and BiM-3D modules to ensure spatial consistency, mitigate depth ambiguities, and preserve fine anatomical details.
Superior quantitative metrics (PSNR, SSIM, LPIPS) and ablation studies validate PSCT-Net’s effectiveness for low-dose, high-resolution pediatric imaging.

Geometry-Aware Pediatric Skull CT Reconstruction with PSCT-Net

Introduction

Conventional computed tomography (CT) remains indispensable for diagnosing pediatric craniofacial pathologies by providing detailed anatomical views, yet its high ionizing radiation dose is particularly hazardous for children. Low-dose bi-planar X-rays are clinically favored, but lack the volumetric depth needed for comprehensive evaluation. The task of reconstructing 3D CT from sparse 2D X-rays is severely ill-posed, primarily due to depth ambiguities and loss of fine osseous details. Previous approaches, largely geometry-agnostic, project 2D features into 3D volumes without explicitly modeling spatial acquisition, resulting in spatial misalignment and degraded anatomical boundaries. Diffusion-based models further improve textural fidelity but are computationally prohibitive for real-time clinical deployment.

This paper introduces PSCT-Net, a geometry-aware framework integrating differentiable back-projection, attention-guided feature lifting, and efficient bidirectional state-space modeling. PSCT-Net directly incorporates acquisition geometry and refines volumetric priors non-linearly, facilitating precise X-ray to CT reconstruction suitable for pediatric applications.

Figure 1: Overview of PSCT-Net showing back-projection of biplanar X-rays to obtain a coarse volumetric prior, which is then refined to reconstruct a high-fidelity CT volume.

Methodology

Differentiable Back-Projection Volumetric Initialization

PSCT-Net begins by forming a spatially faithful volumetric prior through differentiable back-projection of the frontal and lateral X-rays. This operation spreads 2D projection intensities back along their physical ray paths to produce an attenuation volume, encoding the view geometry directly into the volumetric initialization. Because every voxel is tied to the pixels whose rays pass through it, the prior is spatially consistent by construction and substantially mitigates depth ambiguity, a central limitation of earlier geometry-agnostic methods.
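As a rough illustration of the idea, the following sketch back-projects two views under strong simplifying assumptions that are not taken from the paper: parallel-beam rays, exactly orthogonal views, and no filtering. The function name `back_project` and the axis conventions are hypothetical. The operation is linear in the pixel values, which is why the same computation is differentiable when written in an autodiff framework.

```python
def back_project(frontal, lateral):
    """Smear two orthogonal parallel-beam views into a volume[z][y][x].

    Assumed conventions (illustrative only):
      frontal[z][x] -- rays travel along the y axis
      lateral[z][y] -- rays travel along the x axis
    """
    depth = len(frontal)
    width = len(frontal[0])    # number of voxels along x
    height = len(lateral[0])   # number of voxels along y
    if len(lateral) != depth:
        raise ValueError("both views must share the z axis")
    # Each pixel value is spread evenly over the voxels on its ray, and each
    # voxel takes the mean of the contributions from the two views.
    return [
        [
            [0.5 * (frontal[z][x] / height + lateral[z][y] / width)
             for x in range(width)]
            for y in range(height)
        ]
        for z in range(depth)
    ]


volume = back_project([[2.0, 4.0]], [[6.0, 0.0]])
print(volume)
```

The result is only a coarse prior: two views cannot disambiguate depth on their own, which is why the later modules refine the volume.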

Figure 2: The framework initializes a volume with differentiable back-projection, and geometric conditioning is enforced via BP-C and MV3D-C modules for robust detail refinement.

Geometry-Aware Multi-View Conditioning

Fine anatomical structures demand strict spatial consistency across all network stages. To sustain geometric fidelity, PSCT-Net uses BP-C (Back-Projection Conditioning) at the encoder to inject geometry-aware 3D volumes, and MV3D-C (Multi-View 3D Conditioning) in the decoder for semantically aligning high-level features across different views. These dual points of conditioning ensure detail and structure preservation, even for complex pediatric anatomy.

Attention-Guided Projection (AGP-3D)

The AGP-3D module uses multi-head attention to learn non-linear voxel-wise correspondences between 2D regions and 3D locations. By treating 3D voxels as queries and 2D feature maps as keys, AGP-3D adaptively aggregates discriminative features, replacing the rigid linear projections common in prior work. This mechanism is crucial for reconstructing patient-specific details and avoiding hallucinated structures.
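The query/key arrangement can be sketched with plain scaled dot-product cross-attention. This is a single-head, dependency-free illustration rather than the module itself: the names `cross_attend`, `voxel_queries`, `pixel_keys`, and `pixel_values` are hypothetical, and using separate value vectors derived from the 2D features is an assumption in line with standard attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(voxel_queries, pixel_keys, pixel_values):
    """Return one aggregated 2D feature vector per voxel query."""
    scale = 1.0 / math.sqrt(len(pixel_keys[0]))
    channels = len(pixel_values[0])
    lifted = []
    for query in voxel_queries:
        # Similarity between this voxel and every 2D location.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in pixel_keys]
        weights = softmax(scores)
        # Weighted sum of the 2D features: the "lifted" voxel feature.
        lifted.append([
            sum(w * value[c] for w, value in zip(weights, pixel_values))
            for c in range(channels)
        ])
    return lifted


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0], [4.0]]
print(cross_attend([[0.0, 0.0], [10.0, 0.0]], keys, values))
```

A zero query attends uniformly and returns the mean feature, while a query aligned with the first key pulls almost entirely from the first 2D location. A fixed linear projection cannot express this data-dependent weighting.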

Bidirectional Mamba (BiM-3D)

Global volumetric context is captured by BiM-3D, a bidirectional selective state-space model whose cost scales linearly with sequence length, in contrast to the quadratic cost of transformer self-attention. This lets the module model large-scale context efficiently without sacrificing spatial detail, a key requirement for reconstructing high-resolution pediatric CTs.
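The linear-cost, two-direction idea can be illustrated with a much simpler recurrence than Mamba. The sketch below uses a fixed scalar decay and gain, whereas a selective state-space model makes these parameters input-dependent. It also assumes the volume has already been flattened into a 1D sequence. All names are hypothetical.

```python
def scan(sequence, decay, gain):
    """One pass of the linear recurrence h_t = decay * h_{t-1} + gain * x_t."""
    state, outputs = 0.0, []
    for x in sequence:
        state = decay * state + gain * x
        outputs.append(state)
    return outputs


def bidirectional_scan(sequence, decay=0.5, gain=1.0):
    """Sum a forward scan and a backward scan over the same sequence."""
    forward = scan(sequence, decay, gain)
    backward = scan(sequence[::-1], decay, gain)[::-1]
    return [f + b for f, b in zip(forward, backward)]


print(bidirectional_scan([0.0, 1.0, 0.0]))
```

Each pass visits every element once, so the cost grows linearly with the number of voxels. Combining both directions lets a position receive context from elements before and after it, as the symmetric response to the single impulse shows.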

Figure 3: Module visualizations—(a) BP-C for encoder conditioning, (b) MV3D-C for decoder alignment, (c) AGP-3D for attention-guided 2D-to-3D mapping, (d) BiM-3D for bidirectional state-space refinement.

Training Objective

PSCT-Net is trained with a compound loss comprising an adversarial term (LSGAN), a voxel-wise reconstruction term ($\ell_1$), and a projection-consistency term. The balancing weights are set empirically to trade off geometric and texture fidelity. The adversarial loss uses a 3D PatchDiscriminator for robustness.
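A minimal sketch of how such a three-term generator objective might be assembled is given below. The weights default to 1.0 purely as placeholders, since the summary does not report the actual values. The projection-consistency term is assumed here to compare axis-aligned sums of the predicted and reference volumes, and the discriminator is represented only by its output scores. All function names are hypothetical.

```python
def flatten(nested):
    """Flatten arbitrarily nested lists of numbers into one flat list."""
    if isinstance(nested, (int, float)):
        return [nested]
    return [value for item in nested for value in flatten(item)]


def l1(pred, target):
    """Mean absolute difference over all elements."""
    a, b = flatten(pred), flatten(target)
    return sum(abs(p - t) for p, t in zip(a, b)) / len(a)


def project_along_y(volume):
    """volume[z][y][x] -> image[z][x] by summing over y."""
    return [[sum(line[x] for line in plane) for x in range(len(plane[0]))]
            for plane in volume]


def project_along_x(volume):
    """volume[z][y][x] -> image[z][y] by summing over x."""
    return [[sum(line) for line in plane] for plane in volume]


def lsgan_generator_term(fake_scores):
    """Least-squares GAN generator loss: push scores on fakes toward 1."""
    scores = flatten(fake_scores)
    return sum((s - 1.0) ** 2 for s in scores) / len(scores)


def compound_loss(pred, target, fake_scores,
                  w_adv=1.0, w_rec=1.0, w_proj=1.0):  # placeholder weights
    adversarial = lsgan_generator_term(fake_scores)
    reconstruction = l1(pred, target)
    projection = 0.5 * (
        l1(project_along_y(pred), project_along_y(target))
        + l1(project_along_x(pred), project_along_x(target))
    )
    return w_adv * adversarial + w_rec * reconstruction + w_proj * projection


pred = [[[1.0, 2.0], [3.0, 4.0]]]
target = [[[1.0, 2.0], [3.0, 2.0]]]
print(compound_loss(pred, target, fake_scores=[1.0, 0.0]))
```

The projection term penalizes volumes whose re-projections disagree with the reference views even when the voxel-wise error is small, which is one way geometric fidelity can be encouraged alongside texture.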

Dataset Construction and Evaluation Protocols

Existing public datasets focus exclusively on adult trunk anatomies, lacking the unique pediatric cranial attributes necessary for clinical relevance. To address this, the authors curated PedSkull-CT, a private cohort of 982 skull CTs from patients aged 1–24 months, with paired synthetic X-rays (DRR and CycleGAN-transferred). Public benchmarks—LIDC-IDRI, CTSpine1K, and CTPelvic1K—were used to demonstrate anatomical generalization.
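For readers unfamiliar with digitally reconstructed radiographs (DRRs), the sketch below shows the basic principle under simplifying assumptions that are not taken from the paper: parallel rays along one axis, a volume already converted from Hounsfield units to linear attenuation coefficients, and Beer-Lambert attenuation. The function name `render_drr` and its parameters are hypothetical.

```python
import math


def render_drr(attenuation, step_cm):
    """Render a parallel-beam DRR from attenuation[z][y][x] given in 1/cm.

    Rays travel along the y axis. The returned image[z][x] holds the absorbed
    fraction of the beam, so dense structures such as bone appear bright.
    """
    image = []
    for plane in attenuation:
        row = []
        for x in range(len(plane[0])):
            # Line integral of attenuation along the ray through column x.
            line_integral = sum(line[x] for line in plane) * step_cm
            # Beer-Lambert law: transmitted fraction is exp(-integral).
            row.append(1.0 - math.exp(-line_integral))
        image.append(row)
    return image


volume = [[[0.0, 0.2], [0.0, 0.2]]]  # one slice, two voxels along each ray
print(render_drr(volume, step_cm=1.0))
```

Clinical DRR pipelines additionally model cone-beam geometry, the source spectrum, and detector response. The CycleGAN style transfer mentioned above is a separate learned step that this sketch does not cover.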

Figure 4: Various styles of X-ray inputs and real-world CT reconstruction results, emphasizing preservation of patient-specific anatomy.

Results and Quantitative Analysis

PSCT-Net demonstrated consistently superior performance across all public datasets and the private PedSkull-CT cohort, as measured by PSNR, SSIM, and LPIPS. Notable results include:

On LIDC-IDRI: PSCT-Net achieved 27.18 dB PSNR, surpassing the next-best (diffusion-based) by 0.83 dB. SSIM was 0.671, LPIPS was 0.102.
On CTPelvic1K: PSCT-Net achieved 33.06 dB PSNR, outperforming second-best by 1.35 dB.
On PedSkull-CT: PSCT-Net exceeded baselines with 31.49 dB PSNR, 0.882 SSIM, and 0.100 LPIPS.
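To put the decibel margins in context, the sketch below computes PSNR from its standard definition and converts a PSNR gain into the corresponding reduction in mean squared error. The helper names and the assumption of intensities normalized to a data range of 1.0 are illustrative, not taken from the paper's evaluation code.

```python
import math


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB over two flat lists of intensities."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """MSE of the better method divided by MSE of the worse one."""
    return 10.0 ** (-gain_db / 10.0)


print(psnr([0.0, 0.0], [0.1, 0.1]))
print(mse_ratio_for_gain(0.83))
```

By this relation, the 0.83 dB margin on LIDC-IDRI corresponds to roughly 17% lower mean squared error than the runner-up. The stated margins also imply runner-up scores of about 26.35 dB on LIDC-IDRI and 31.71 dB on CTPelvic1K.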

Ablation studies confirmed that each proposed module incrementally improves baseline performance, with BiM-3D contributing the largest gain (+1.04 dB PSNR).

Qualitative evaluation indicated that PSCT-Net uniquely preserves fine structural details (e.g., cranial sutures, orbital depth) and avoids common hallucinations seen in geometry-agnostic models, supporting claims on robust anatomical generalization.

Figure 5: Qualitative comparison highlighting fine structural preservation by PSCT-Net and avoidance of anatomical hallucinations in reconstructed CT volumes.

Implications and Future Directions

The explicit geometric modeling in PSCT-Net fundamentally challenges the prevailing reliance on geometry-agnostic feature lifting for X-ray to CT reconstruction. The strong empirical gains reinforce the claim that geometry-aware priors and efficient context modeling are highly complementary and necessary for clinical-grade performance. The practical implication is that PSCT-Net offers a realistic path toward low-dose pediatric imaging, with rapid inference suitable for integration into clinical workflows.

Theoretically, the incorporation of differentiable back-projection and attention-guided refinement signals a shift toward more physically informed neural architectures in medical image reconstruction, with potential extensions to other inverse imaging problems.

Future research directions include patch-based refinement and application of implicit neural representations or 3D Gaussian primitives for sub-millimeter detail recovery. Clinical reader studies will be pursued to validate utility in craniosynostosis diagnosis.

Conclusion

PSCT-Net introduces a geometry-aware paradigm for reconstructing pediatric skull CT volumes from bi-planar X-rays, integrating explicit volumetric priors and advanced attention mechanisms for spatially consistent and computationally efficient reconstruction. Superior quantitative and qualitative results across diverse datasets underscore the necessity of geometric encoding. Future work will target ultra-fine detail recovery and clinical validation, further advancing AI-driven low-dose imaging in pediatric care.
