RefoMed-EN: PET Enhancement Framework
- The paper demonstrates that RefoMed-EN significantly improves PET image quality metrics (CNR and SNR) by employing a mean teacher UNet architecture with patient-adaptive training.
- RefoMed-EN is built on a dual-path 2D UNet structure using EMA for the teacher update, ensuring consistency regularization without modifications to the core UNet backbone.
- Results on patient scans show that the framework enhances full-dose PET images and reconstructs low-dose scans with improved SSIM and PSNR, which may aid tumor delineation and motion monitoring.
RefoMed-EN is a patient-specific deep learning framework for positron emission tomography (PET) image enhancement and low-dose PET reconstruction, specifically designed for the RefleXion X1 biology-guided radiotherapy (BgRT) system. The framework builds on a mean teacher (MT) UNet architecture to address the challenges imposed by the dual 90-degree PET detector geometry of RefleXion X1, which inherently collects fewer events than a full-ring PET system and thus suffers reduced image quality, especially in the short, low-dose scans necessitated by BgRT clinical workflows. RefoMed-EN employs a patient-adaptive training protocol, leverages a consistency-based loss, and demonstrates significant improvements in quantitative image quality metrics and clinical utility in both full-dose enhancement and low-dose reconstruction contexts (Fu et al., 2022).
1. System Design of RefoMed-EN
RefoMed-EN is structurally based on the 2D UNet architecture (Ronneberger et al., 2015) and instantiated as a mean teacher/student paradigm:
- Student Path (S-UNet): A classic encoder-decoder network with skip connections, four down-sampling blocks (each with two convolutional layers + ReLU and max pooling), a bottleneck (two conv + ReLU), and four up-sampling blocks (each with a transpose-convolution plus two conv + ReLU). Feature channels double from 64 to 1024 in the encoder, halving symmetrically in the decoder.
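- Teacher Path (T-UNet): Identical in topology to S-UNet. The teacher's parameters $\theta_T$ are updated from the student's parameters $\theta_S$ by exponential moving average (EMA): $\theta_T \leftarrow \alpha\,\theta_T + (1 - \alpha)\,\theta_S$ with $\alpha = 0.995$, applied after every optimization step.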
No architectural modifications (e.g., attention gates, residual connections) alter the standard UNet backbone. Skip connections concatenate encoder and decoder feature maps at each level to preserve spatial information.
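The channel progression described above can be tabulated with a short sketch. It assumes a single-channel input slice and the standard UNet convention that each decoder stage concatenates the upsampled features with the matching encoder skip; the function name `unet_channel_plan` and its return layout are illustrative, not taken from the paper.

```python
def unet_channel_plan(base=64, depth=4, in_channels=1):
    """Channel bookkeeping for a standard 2D UNet (illustrative sketch).

    encoder:    (in, out) channels of each down-sampling block
    bottleneck: (in, out) channels of the bottleneck block
    decoder:    (channels entering the transpose conv,
                 channels after skip concatenation,
                 channels leaving the block)
    """
    enc_out = [base * 2 ** i for i in range(depth)]  # 64, 128, 256, 512
    bottleneck_out = base * 2 ** depth               # 1024

    encoder = []
    prev = in_channels  # single-channel PET slice (assumption)
    for ch in enc_out:
        encoder.append((prev, ch))
        prev = ch

    decoder = []
    up_in = bottleneck_out
    for skip in reversed(enc_out):
        # The transpose conv maps up_in -> skip channels; concatenating the
        # encoder skip doubles that to 2 * skip before the two convolutions.
        decoder.append((up_in, 2 * skip, skip))
        up_in = skip

    return {
        "encoder": encoder,
        "bottleneck": (enc_out[-1], bottleneck_out),
        "decoder": decoder,
    }


plan = unet_channel_plan()
```

With the defaults, `plan["bottleneck"]` is `(512, 1024)`, matching the 64-to-1024 doubling stated in the text.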
2. Training Protocol and Data Handling
2.1 Dataset and Preparation
- For each cancer patient in the study, a simulation PET/CT scan was performed on the RefleXion X1.
- Each scan’s coincidence event list was used to reconstruct one full-dose (ground truth) PET image and four quarter-dose images (by evenly partitioning events).
- For a subset of patients, two additional treatment-fraction scans were acquired and partitioned in the same way.
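The even partitioning of a coincidence event list into four quarter-dose subsets can be sketched as follows. The source does not say how events are assigned to subsets; interleaving (every fourth event) is assumed here so that each subset spans the whole acquisition, and `partition_events` is a hypothetical helper name.

```python
def partition_events(events, parts=4):
    """Split an event list into `parts` interleaved subsets of near-equal size.

    Interleaving is an assumption of this sketch; contiguous time blocks
    would be another way to split the list evenly.
    """
    return [events[k::parts] for k in range(parts)]


# Toy example: ten integer "events" standing in for list-mode records.
events = list(range(10))
quarters = partition_events(events)
```

Each subset would then be reconstructed separately to give one quarter-dose image, while the full list gives the full-dose image.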
2.2 Preprocessing
- The top and bottom 4 slices of each volume are omitted to avoid extreme noise.
- Images have 4 mm in-plane pixels and a slice thickness of 2.1 mm.
- Body mask extracted from CT by Otsu thresholding followed by opening/closing; PET voxels normalized by mean value within mask and set to zero outside.
- No geometric augmentation or cropping was applied.
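Two of the preprocessing steps above, slice trimming and mask-based normalization, can be sketched in plain Python. The sketch assumes volumes are nested lists indexed as `[slice][row][column]`, that the body mask is already available (the Otsu and morphology steps are not shown), and that the mean is taken over the whole masked volume rather than per slice; the function names are illustrative.

```python
def trim_slices(volume, n_edge=4):
    """Drop the first and last `n_edge` axial slices of a volume."""
    if len(volume) <= 2 * n_edge:
        raise ValueError("volume has too few slices to trim")
    return volume[n_edge:len(volume) - n_edge]


def normalize_in_mask(volume, mask):
    """Divide PET values by their mean inside the mask; zero values outside.

    The mean is computed over the whole masked volume (an assumption).
    """
    inside = [
        v
        for sl, msl in zip(volume, mask)
        for row, mrow in zip(sl, msl)
        for v, m in zip(row, mrow)
        if m
    ]
    if not inside:
        raise ValueError("mask is empty")
    mean = sum(inside) / len(inside)
    if mean == 0:
        raise ValueError("mean PET value inside mask is zero")
    return [
        [[v / mean if m else 0.0 for v, m in zip(row, mrow)]
         for row, mrow in zip(sl, msl)]
        for sl, msl in zip(volume, mask)
    ]


# Toy 1-slice volume: the top row is inside the body mask, the bottom row outside.
toy_volume = [[[2.0, 4.0], [6.0, 8.0]]]
toy_mask = [[[1, 1], [0, 0]]]
normalized = normalize_in_mask(toy_volume, toy_mask)
```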
2.3 Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Optimizer | Adam |
| Learning rate | Constant |
| Batch size | 16 (2D slices) |
| Epochs | 1,200 |
| EMA decay ($\alpha$) | 0.995 |
| Consistency-loss weight ($\lambda$) | 0.05 |
| Model selection | Minimum overall training loss |
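The table can be captured as a small configuration object, shown here as a sketch. The class `TrainConfig` and the helper `select_checkpoint` are hypothetical names; the learning rate is left as `None` because its value is not given in this section.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TrainConfig:
    optimizer: str = "adam"
    learning_rate: Optional[float] = None  # constant; value not stated here
    batch_size: int = 16                   # 2D slices per batch
    epochs: int = 1200
    ema_decay: float = 0.995
    consistency_weight: float = 0.05


def select_checkpoint(training_losses: Sequence[float]) -> int:
    """Return the index of the epoch with the minimum overall training loss."""
    if not training_losses:
        raise ValueError("no losses recorded")
    return min(range(len(training_losses)), key=lambda i: training_losses[i])


cfg = TrainConfig()
best_epoch = select_checkpoint([3.0, 1.5, 2.0])
```

Selecting on training loss rather than a held-out validation loss follows the table; with a single patient scan there may be no separate validation set to select on.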
3. Loss Functions
3.1 Student Loss: Supervised and Consistency Terms
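Given quarter-dose images $x_i$ ($i = 1, \dots, 4$) and the full-dose image $y$, the student is trained with a supervised term plus a weighted consistency term:

$$\mathcal{L}_S = \mathcal{L}_{\text{sup}}\big(f_S(x_i),\, y\big) + \lambda\, \mathcal{L}_{\text{con}}\big(f_S(x_i),\, f_T(x_i)\big)$$

where $f_S(x_i)$ and $f_T(x_i)$ denote the student and teacher UNet outputs, and $\lambda = 0.05$.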
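A minimal sketch of a student loss that combines a supervised term and a consistency term with weight 0.05 is given below. The distance used for each term is not stated in this section; mean squared error is assumed for both purely for illustration, and the function names are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two 2D images given as lists of rows."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += (va - vb) ** 2
            count += 1
    return total / count


def student_loss(student_out, teacher_out, target, weight=0.05):
    """Supervised term plus weighted student/teacher consistency term.

    MSE for both terms is an assumption of this sketch.
    """
    supervised = mse(student_out, target)        # student vs. full-dose image
    consistency = mse(student_out, teacher_out)  # student vs. teacher output
    return supervised + weight * consistency


# Toy 1x2 images.
loss = student_loss(
    student_out=[[1.0, 2.0]],
    teacher_out=[[1.0, 4.0]],
    target=[[0.0, 2.0]],
)
```

In this toy case the supervised term is 0.5 and the consistency term is 2.0, so the total is 0.5 + 0.05 × 2.0 = 0.6. In training, the teacher output is treated as a constant: gradients flow only through the student.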
3.2 Teacher Update
Teacher weights are updated via EMA with decay $\alpha = 0.995$; the teacher is not trained by backpropagation.
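The EMA teacher update can be sketched as follows, using the decay of 0.995 from the hyperparameter table. Parameters are represented as a dictionary of flat float lists for illustration only; in a deep learning framework the same rule would be applied tensor by tensor. The name `ema_update` is hypothetical.

```python
def ema_update(teacher, student, decay=0.995):
    """In-place EMA update: teacher <- decay * teacher + (1 - decay) * student."""
    for name, s_vals in student.items():
        t_vals = teacher[name]
        teacher[name] = [
            decay * t + (1.0 - decay) * s for t, s in zip(t_vals, s_vals)
        ]
    return teacher


teacher = {"w": [1.0, 0.0]}
student = {"w": [0.0, 2.0]}
ema_update(teacher, student)  # called once after every optimizer step
```

With a decay of 0.995, the teacher averages the student over a horizon of roughly 1 / (1 - 0.995) = 200 optimization steps, which smooths out step-to-step noise in the student weights.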
4. Evaluation Protocol and Metrics
4.1 Image Enhancement and Reconstruction
- Full-dose enhancement: Trained models are applied in a single pass to simulation scan full-dose PET images, yielding enhanced images compared against the originals.
- Low-dose reconstruction: For each quarter-dose treatment slice, the trained student UNet predicts the corresponding full-dose image.
4.2 Quantitative Metrics
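- Contrast-to-Noise Ratio (CNR) and Signal-to-Noise Ratio (SNR): reported for full-dose enhancement.
- Structural Similarity Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR): reported for low-dose reconstruction, computed against the full-dose image.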
Statistical significance of differences between methods was evaluated with the Wilcoxon signed-rank test.
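The exact metric definitions are not reproduced in this section. The sketch below uses common textbook forms as assumptions: SNR as the mean over the standard deviation of a region, CNR as the tumor-to-background mean difference over the background standard deviation, and PSNR with the peak taken as the maximum of the reference image. Regions and images are flat lists of voxel values.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def std(xs):
    """Population standard deviation."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def snr(roi):
    """Mean over standard deviation within a region (assumed definition)."""
    return mean(roi) / std(roi)


def cnr(tumor_roi, background_roi):
    """Tumor/background contrast over background noise (assumed definition)."""
    return (mean(tumor_roi) - mean(background_roi)) / std(background_roi)


def psnr(pred, ref):
    """Peak signal-to-noise ratio in dB, peak = max of the reference image."""
    err = mean([(p - r) ** 2 for p, r in zip(pred, ref)])
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max(ref) ** 2 / err)


background = [1.0, 3.0]  # mean 2, std 1
tumor = [5.0, 7.0]       # mean 6
example_cnr = cnr(tumor, background)
example_snr = snr(background)
example_psnr = psnr(pred=[1.0, 9.0], ref=[0.0, 10.0])
```

SSIM is usually computed over local windows, so in practice a library routine such as scikit-image's `structural_similarity` is a more reliable choice than a hand-written version.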
| Task | Method | CNR | SNR | SSIM | PSNR (dB) |
|---|---|---|---|---|---|
| Full-dose enhancement | Original | | | n/a | n/a |
| | UNet-enhanced | | | n/a | n/a |
| | RefoMed-EN (MT) | | | n/a | n/a |
| Low-dose reconstruction | Input (1/4 dose) | n/a | n/a | | |
| | UNet | n/a | n/a | | |
| | RefoMed-EN (MT) | n/a | n/a | ** | ** |
(** marks a statistically significant difference vs. the UNet prediction.)
5. Results and Clinical Implications
5.1 Percentage Improvements
- Full-dose enhancement (vs. original): CNR increased by 28.7%, SNR by 25.3%.
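- Low-dose reconstruction (vs. quarter-dose input): SSIM improved by 3.1%; PSNR increased by 6.2 dB. Because PSNR is logarithmic, a 6.2 dB gain corresponds to roughly a 4.2-fold reduction in mean squared error ($10^{6.2/10} \approx 4.2$) for a fixed peak value.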
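PSNR is a logarithmic quantity, so a gain in dB translates into a multiplicative change rather than a percentage. The short calculation below converts a dB difference into the corresponding power (mean squared error) ratio and amplitude (root mean squared error) ratio, assuming the peak value is the same for both images; the helper names are illustrative.

```python
def db_to_power_ratio(db):
    """Ratio of mean squared errors implied by a PSNR difference in dB."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Ratio of root mean squared errors implied by a PSNR difference in dB."""
    return 10 ** (db / 20)


power_ratio = db_to_power_ratio(6.2)
amplitude_ratio = db_to_amplitude_ratio(6.2)
```

For a 6.2 dB gain this gives a power ratio of about 4.17 and an amplitude ratio of about 2.04, that is, roughly a fourfold lower mean squared error or a twofold lower root mean squared error.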
5.2 Impact on BgRT Clinical Workflow
Enhancement of full-dose PET improves CNR and SNR, which can facilitate more accurate tumor delineation before radiotherapy delivery. Efficient quarter-dose PET reconstruction at high SSIM/PSNR in near real-time may support intra-fraction guidance and patient motion monitoring. Notably, the patient-specific retraining paradigm enables per-patient model adaptation from a single simulation scan without reliance on large population datasets or hardware modification, which is significant for on-treatment verification and individualized treatment planning (Fu et al., 2022).
6. Discussion and Future Directions
The use of a mean teacher consistency paradigm enables effective regularization in small, patient-specific datasets without necessitating augmentation or multi-patient training. The recovery of full-dose image quality from noisy quarter-dose acquisitions suggests that the framework is well matched to the highly constrained acquisition regime of BgRT. A plausible implication is that similar MT-UNet or consistency-regularized models could be extended to other modalities or PET systems where acquisition constraints lead to under-sampled data.
Further research may address robustness across anatomical sites and complex tumor types, and may incorporate additional contextual priors or advanced architectural modules if these prove effective beyond the baseline UNet. The rapid retraining and application pipeline of RefoMed-EN offers potential for clinical translation wherever individualized image enhancement or reconstruction is needed without redesigning PET instrumentation.
7. Summary Table: Core Technical Elements
| Component | Details | Significance |
|---|---|---|
| Architecture | 2D UNet with mean teacher/student EMA update ($\alpha = 0.995$) | Consistency regularization, simple backbone |
| Training regime | Patient-specific, data from a single simulation scan | Model adapts to individual anatomy |
| Primary metrics | CNR, SNR, SSIM, PSNR | Standard PET quality benchmarks |
| Quant. outcomes | CNR/SNR (full-dose): +28.7%/+25.3%; SSIM/PSNR (low-dose): +3.1%/+6.2 dB | Substantial image quality enhancement |
| Clinical impact | Improved tumor delineation and online motion monitoring | Increases BgRT precision |
RefoMed-EN represents a technical synthesis of consistency-based deep learning and patient-adaptive workflows for PET enhancement and reconstruction, providing a quantifiable advance for BgRT on the RefleXion X1 platform (Fu et al., 2022).