Prior-AttUNet: OCT Fluid Segmentation
- The paper introduces a dual-path architecture that combines an IntroVAE-based normative prior pathway with a U-Net segmentation backbone to enhance boundary delineation.
- It employs a novel triple-attention mechanism that fuses multi-scale encoder, decoder, and prior features to improve segmentation accuracy and cross-device robustness.
- Experimental results on the RETOUCH benchmark demonstrate superior mDSC performance and reliable fluid segmentation, outperforming prior architectures like DAA-UNet.
Prior-AttUNet is a retinal optical coherence tomography (OCT) fluid segmentation architecture that integrates normative anatomical priors within a dual-path attention-gated design. The model specifically targets challenges in delineating ambiguous fluid boundaries and achieving robust cross-device generalization for critical pathologies such as macular edema. Prior-AttUNet introduces a generative normal anatomical pathway via a variational autoencoder and a U-Net–style segmentation network, fusing their multi-scale representations through a novel triple-attention mechanism.
1. Architectural Foundations
Prior-AttUNet adopts a hybrid dual-path architecture (Fig. 2 in (Yang et al., 25 Dec 2025)):
- Generative Prior Pathway (“NORMNET”): Implements an IntroVAE trained exclusively on fluid-free OCT images. Given an input $x$, the encoder $E$ produces the mean $\mu$ and log-variance $\log\sigma^2$, from which the latent $z$ is sampled. The decoder reconstructs $\hat{x}$, a high-quality normative OCT. Multi-scale feature extraction from $\hat{x}$ yields a set of prior features via a 4-stage encoder, spatially aligned through a symmetric decoder.
- Segmentation Backbone: An encoder–decoder topology inspired by U-Net, incorporating:
- DenseDepthSepBlocks at each level: L=3 depthwise-separable convolutions with dense connectivity and 32 channels per layer (see Eq. (2) in (Yang et al., 25 Dec 2025)).
- Atrous Spatial Pyramid Pooling (ASPP) bottleneck: Four parallel atrous convolutions with distinct dilation rates plus a global max-pooling branch. Outputs are concatenated, compressed, and regularized (Fig. 4, Eq. (3–6)).
- Triple Attention Gates replacing standard skip connections, fusing encoder, decoder, and prior features (see §3.4, Fig. 6).
The decoder reconstructs the segmentation mask through upsampling, skip-attention fusion, concatenation, and final dense-block processing, producing the output via a sigmoid-activated convolution.
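The overall shape flow of the dual-path design can be sketched minimally in NumPy. This is a toy illustration only: strided slicing stands in for the stride-2 DenseDepthSepBlocks, nearest-neighbour repetition for learned upsampling, and plain addition for the triple-attention fusion; none of these names or choices come from the paper.

```python
import numpy as np

def encode(x, stages=4):
    """Toy 4-stage encoder: halve spatial dims per stage, keeping skips."""
    skips = []
    for _ in range(stages):
        skips.append(x)
        x = x[:, ::2, ::2]            # stand-in for a stride-2 DenseDepthSepBlock
    return x, skips

def decode(x, skips, priors):
    """Toy decoder: upsample, then fuse the matching skip and prior feature."""
    for skip, prior in zip(reversed(skips), reversed(priors)):
        x = x.repeat(2, axis=1).repeat(2, axis=2)   # nearest-neighbour upsample
        x = x + skip + prior          # stand-in for the triple-attention fusion
    return x

x = np.zeros((1, 256, 256))
bottleneck, skips = encode(x)
_, priors = encode(x)                 # prior pathway yields matching-scale features
y = decode(bottleneck, skips, priors)
assert bottleneck.shape == (1, 16, 16) and y.shape == x.shape
```

The point of the sketch is the alignment constraint: each decoder stage consumes one encoder skip and one prior feature at the same spatial resolution, which is why the prior branch needs its own symmetric decoder.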
2. Generative Anatomical Priors: IntroVAE
The normal anatomical prior branch is implemented as an IntroVAE:
- Encoder: Learns the approximate posterior $q_\phi(z \mid x)$, outputting $\mu$ and $\log\sigma^2$ (Eq. (7)).
- Latent Sampling: $z = \mu + \sigma \odot \epsilon$, with $\epsilon \sim \mathcal{N}(0, I)$ (Eq. (8)).
- Decoder: Reconstructs $\hat{x} = D_\theta(z)$ (Eq. (9)).
- Prior: Assumed $p(z) = \mathcal{N}(0, I)$.
- ELBO loss: $\mathcal{L}_{\mathrm{ELBO}} = -\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] + D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)$ (Eq. (12′)).
Multi-scale prior features are extracted using four-stage encoding (stride-2 convolutions) and symmetric decoding, spatially aligned to match encoder–decoder skip stages. These priors are supplied to the segmentation network as guidance in attention fusion.
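The two standard VAE ingredients above — reparameterized sampling and the closed-form KL term against a standard-normal prior — can be sketched in NumPy (a minimal illustration; the function names are mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps with eps ~ N(0, I): the reparameterization trick."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(q(z|x) || N(0, I)) in closed form, summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

mu, log_var = np.zeros(8), np.zeros(8)
z = reparameterize(mu, log_var)
assert z.shape == (8,)
assert abs(kl_to_standard_normal(mu, log_var)) < 1e-9   # q == prior -> KL = 0
```

Freezing this branch after training (as described in §5) means the segmentation network only ever sees deterministic prior features, not fresh latent samples.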
3. Segmentation Pathway and Losses
The segmentation backbone, instantiated as a U-Net–style encoder–decoder, incorporates the following:
- DenseDepthSepBlock: Each block contains depthwise-separable convolutions with dense inter-layer concatenation (Fig. 3, Eq. (1), (2)). Each layer adds 32 channels, facilitating feature reuse with minimal computational overhead.
- ASPP: Four atrous convolutions and a global max-pooling path (Fig. 4, Eq. (3–6)) capture multi-scale cues critical for fluid segmentation across imaging devices.
- Segmentation Loss Functions:
- Dice Loss: $\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2\sum_i p_i g_i}{\sum_i p_i + \sum_i g_i}$ (Eq. (17)), where $p_i$ and $g_i$ are the prediction and ground truth at voxel $i$.
- Lovász Loss: A surrogate for mean Intersection-over-Union (mIoU) across classes (Eq. (18)).
- Total Loss: A combination of the Dice and Lovász terms.
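The Dice term in Eq. (17) can be sketched as a soft (differentiable) loss in NumPy; the smoothing constant `eps`, added to avoid division by zero on empty masks, is my assumption and not from the paper:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 - 2*sum(p*g) / (sum(p) + sum(g)), over all voxels."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

perfect = np.array([1.0, 1.0, 0.0, 0.0])
disjoint_a = np.array([1.0, 0.0, 0.0, 0.0])
disjoint_b = np.array([0.0, 1.0, 0.0, 0.0])
assert dice_loss(perfect, perfect) < 1e-6            # perfect overlap -> ~0 loss
assert abs(dice_loss(disjoint_a, disjoint_b) - 1.0) < 1e-6   # no overlap -> ~1
```

Dice directly rewards overlap with small fluid regions, which is why it is a common complement to an IoU surrogate such as the Lovász loss in medical segmentation.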
4. Triple-Attention Mechanism
Prior-AttUNet’s skip fusion is governed by a triple-attention gate per decoder stage:
Inputs: Encoder skip $e_i$, decoder state $d_i$, and anatomical prior $p_i$.
Linear Projection: $W_e e_i$, $W_d d_i$, $W_p p_i$ (Eq. (12)).
Fusion and Attention:
- Fused response $f_i = \sigma_1\!\left(W_e e_i + W_d d_i - W_p p_i\right)$, with the prior entering via subtraction to highlight deviations from normative anatomy (Eq. (13))
- Attention map $\alpha_i = \sigma_2\!\left(\psi(f_i)\right)$ (Eq. (14a))
- Decoder features modulated: $\hat{d}_i = \alpha_i \odot d_i$ (Eq. (14b))
This implements spatial, channel, and prior-guided attention, amplifying contrast at fluid–tissue borders and improving lesion delineation, especially at ambiguous boundaries. Subtraction and fusion in this mechanism are visualized in attention heatmaps (see Fig. 8 in (Yang et al., 25 Dec 2025)).
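A minimal NumPy sketch of one such gate follows, assuming an additive gate in which the projected prior is subtracted before the nonlinearity and the resulting map modulates the decoder features. The weight shapes, the ReLU/sigmoid pair, and all variable names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def triple_attention_gate(enc, dec, prior, We, Wd, Wp, psi):
    """Fuse encoder skip, decoder state, and normative prior into a single
    attention map, then gate the decoder features with it."""
    fused = np.maximum(We @ enc + Wd @ dec - Wp @ prior, 0.0)  # ReLU fusion
    alpha = sigmoid(psi @ fused)        # per-position attention map in (0, 1)
    return alpha * dec                  # gated (modulated) decoder features

rng = np.random.default_rng(1)
C, HW = 8, 16                                    # channels x flattened pixels
enc, dec, prior = (rng.standard_normal((C, HW)) for _ in range(3))
We, Wd, Wp = (rng.standard_normal((C, C)) for _ in range(3))
psi = rng.standard_normal((1, C))                # collapse channels to one map
out = triple_attention_gate(enc, dec, prior, We, Wd, Wp, psi)
assert out.shape == (C, HW)
assert np.all(np.abs(out) <= np.abs(dec))        # sigmoid gating only attenuates
```

Because the attention map lies in $(0, 1)$, the gate can only suppress decoder responses, steering the network's capacity toward positions where pathological features diverge from the normative prior.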
5. Training Protocols
- VAE Training: Conducted on normal (fluid-free) OCT slices; the ELBO objective is minimized over 150 epochs. The best weights are retained based on reconstruction fidelity and frozen before segmentation training.
- Segmentation Training: Performed on all annotated slices (batch size 16, AdamW optimizer) for 150 epochs. No heavy augmentation is reported.
- Total Parameters: 47.04M; Compute: 0.37 TFLOPS at inference.
6. Experimental Evaluation
Evaluation on the public RETOUCH benchmark demonstrates:
| Device | mDSC (%) | SD (mDSC) |
|---|---|---|
| Cirrus | 93.93 | ±1.6 |
| Spectralis | 95.18 | ±0.3 |
| Topcon | 93.47 | ±0.3 |
- Outperforms DAA-UNet by ≈3 percentage points in mDSC at significantly lower FLOPs.
- Ablations (Spectralis, Table 3):
- Removal of normal prior: mDSC drops by 2.38%
- Removal of triple attention: mDSC drops by 1.11%
- Removal of ASPP: mDSC drops by 1.12%
- Removal of dense blocks: mDSC drops by 1.88%
- Further ablations (Tables 4–7): The IntroVAE prior cannot be swapped out without cost; alternative feature extractors and depthwise-separable block variants either underperform or are less efficient.
- Inference: Real-time on RTX 4090.
7. Context, Robustness, and Significance
- Boundary Delineation: The triple attention gate accentuates differential features between pathological and normative anatomy, critical in challenging fluid-vs-tissue transitions.
- Cross-Device Robustness: Consistent mDSC and low inter-device variance across Cirrus, Spectralis, and Topcon are observed, verifying generalizability to intensity and structural differences common in multi-vendor clinical OCT.
- Clinical Relevance: Achieves accuracy and efficiency suitable for integration into automated diagnostic pipelines.
- Comparative Significance: By integrating multi-scale anatomical priors through novel attention fusion, Prior-AttUNet advances performance and efficiency on the RETOUCH OCT segmentation benchmark, providing a robust and generalizable solution for retina fluid analysis (Yang et al., 25 Dec 2025).