2000 character limit reached

CPCR: Cross-Pyramid Consistency for Segmentation

Updated 17 November 2025

CPCR is a semi-supervised framework that improves medical image segmentation by enforcing cross-pyramid consistency across dual decoders.
It employs a dual-branch pyramid network with distinct upsampling methods and scale-specific perturbations to generate multi-scale auxiliary predictions.
CPCR integrates multiple loss terms including Dice loss, KL-divergence consistency, and entropy minimization, yielding competitive DSC, IoU, and enhanced boundary accuracy.

Cross-Pyramid Consistency Regularization (CPCR) is a semi-supervised learning framework designed to improve medical image segmentation accuracy by leveraging unlabeled data through a hybrid consistency-based approach applied to a dual-branch pyramid network. CPCR operates by enforcing cross-decoder and cross-scale knowledge distillation between two slightly different decoders, each producing multiscale auxiliary predictions with targeted feature perturbations, and integrates established consistency and uncertainty minimization objectives. This strategy effectively regularizes deep hierarchical features and demonstrates competitive performance with recent state-of-the-art methods while using a compact model architecture.

1. Dual Branch Pyramid Network (DBPNet) Architecture

DBPNet serves as the backbone for CPCR, implementing a four-level U-Net-style encoder and two structurally distinct decoders. The encoder consists of four sequential blocks of two 3×3 convolutions with ReLU activations, followed by 2×2 max-pooling steps, yielding progressively downsampled feature maps.

Each decoder (TR-branch and UP-branch) mirrors the encoder's structure but differs in its upsampling mechanism:

TR-branch: Employs transpose-convolution with learned kernels.
UP-branch: Utilizes 1×1 convolution followed by bilinear upsampling.

Skip connections link encoder blocks to their corresponding decoder levels. At the first three decoder levels (scales $l = 1,2,3$ ), each branch produces an auxiliary prediction through a 3×3 convolution, upsampling to input resolution, and softmax activation. A scale-specific perturbation module (dropout, feature-dropout, or Gaussian noise) is applied to features before the auxiliary head, with perturbations differing across both scale and branch. The fourth decoder level outputs the main prediction without auxiliary heads. Thus, each branch $d \in \{\text{TR}, \text{UP}\}$ provides a pyramid of probability maps $P_d^{(l)} \in [0,1]^{H \times W \times C}$ for $l = 1,2,3,4$ .

2. Cross-Pyramid Consistency Regularization: Mathematical Foundations

The CPCR regularization term acts on auxiliary predictions at scales $l = 1,2,3$ . For each, soft-labeling is performed using temperature-scaled softmax:

$\tilde{P}_d^{(l)}[i,c] = \frac{\exp(Z_d^{(l)}[i,c]/T)}{\sum_{c'=1}^C \exp(Z_d^{(l)}[i,c']/T)}$

where $Z_d^{(l)}$ is the logits tensor, $T$ denotes the temperature (set to 10), and $[i,c]$ indexes pixels and classes.

CPCR enforces reciprocal consistency between the two branches via symmetric KL-divergence at each auxiliary scale:

$\mathrm{KL}(\tilde{P}_{\mathrm{TR}}^{(l)} \| \tilde{P}_{\mathrm{UP}}^{(l)}) + \mathrm{KL}(\tilde{P}_{\mathrm{UP}}^{(l)} \| \tilde{P}_{\mathrm{TR}}^{(l)})$

The overall CPCR loss averages these terms over three scales:

$\mathcal{L}_{\mathrm{CPCR}} = \frac{1}{3} \sum_{l=1}^{3} \left[\mathrm{KL}(\tilde{P}_{\mathrm{TR}}^{(l)} \| \tilde{P}_{\mathrm{UP}}^{(l)}) + \mathrm{KL}(\tilde{P}_{\mathrm{UP}}^{(l)} \| \tilde{P}_{\mathrm{TR}}^{(l)})\right]$

This multi-scale, cross-branch distillation framework facilitates the alignment of deep feature distributions hierarchically, supporting effective exploitation of unlabeled data and enhancing regularization.

3. Composite Loss Function and Optimization Protocol

DBPNet with CPCR employs a total loss function aggregating the following components:

Supervised Dice Loss ( $\mathcal{L}_{\mathrm{sup}}$ ): Applied to main outputs ( $l=4$ ) on labeled data, summing Dice losses for both branches:

$\mathcal{L}_{\mathrm{sup}} = \mathcal{L}_{\mathrm{Dice}}(P_{\mathrm{TR}}^{(4)}, Y) + \mathcal{L}_{\mathrm{Dice}}(P_{\mathrm{UP}}^{(4)}, Y)$

where $Y$ is the ground-truth mask.
Main Output Consistency ( $\mathcal{L}_{\mathrm{con}}^{\mathrm{main}}$ ): Symmetric KL-divergence between main outputs of the two decoders:

$\mathcal{L}_{\mathrm{con}}^{\mathrm{main}} = \mathrm{KL}(\tilde{P}_{\mathrm{TR}}^{(4)} \| \tilde{P}_{\mathrm{UP}}^{(4)}) + \mathrm{KL}(\tilde{P}_{\mathrm{UP}}^{(4)} \| \tilde{P}_{\mathrm{TR}}^{(4)})$
Average-Prediction Entropy Minimization ( $\mathcal{L}_{\mathrm{um}}$ ): Reduces uncertainty by minimizing pixel-wise entropy of the averaged main output prediction:

$\bar{P}^{(4)} = \frac{P_{\mathrm{TR}}^{(4)} + P_{\mathrm{UP}}^{(4)}}{2}$

$\mathcal{L}_{\mathrm{um}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} \bar{P}^{(4)}[i,c] \; \log(\bar{P}^{(4)}[i,c])$

where $N$ is the number of pixels.
CPCR Regularization ( $\mathcal{L}_{\mathrm{CPCR}}$ ): As defined above.

The total loss is a weighted sum:

$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{sup}} + \alpha \left(\mathcal{L}_{\mathrm{con}}^{\mathrm{main}} + \mathcal{L}_{\mathrm{um}}\right) + \lambda(t) \, \mathcal{L}_{\mathrm{CPCR}}$

With $\alpha = 0.1$ , and $\lambda(t)$ as a sigmoid-shaped warm-up weight increasing from 0 to $w_{\max} = 0.1$ over $t_{\max} = 200$ epochs:

$\lambda(t) = w_{\max} \exp\left(-5 \left(1 - \frac{t}{t_{\max}}\right)^2\right)$

4. Training and Inference Workflow

Training DBPNet+CPCR proceeds as follows, using a batch size equally split between labeled and unlabeled examples. Random augmentations (flips, rotations) are applied to inputs prior to encoding. The algorithm cycles through the following steps for $I = 50,000$ iterations:

Forward pass through encoder and both decoders;
Extraction of logits and computation of probability maps;
Softmax (standard and temperature) transformations;
Calculation of loss terms on appropriate data (labeled or both);
Weighted aggregation of loss components and update of $\lambda(t)$ per iteration;
Backpropagation and parameter updates via SGD (learning rate $\eta$ , momentum $0.9$, weight decay $1 \times 10^{-4}$ ).

At inference, only the UP-branch main output ( $P_{\mathrm{UP}}^{(4)}$ ) is used.

Training Pseudocode

for iteration t = 1 to I:
    Sample B/2 labeled {(x_i, y_i)}, B/2 unlabeled {x_j}
    Apply random ± flip, ± rotation
    Encoder → features → Decoders TR, UP
    For d ∈ {TR, UP}, l = 1..4:
        Obtain logits Z_d^{(l)}
        Compute P_d^{(l)} = softmax(Z_d^{(l)})
        Compute temperature softmax \tilde{P}_d^{(l)}
    Compute ℒ_sup (labeled)
    Compute ℒ_con^main, ℒ_um, ℒ_CPCR (all data)
    Calculate ℒ_total using α and λ(t)
    Backpropagate gradient; update θ via SGD
    Every 150 iterations: update λ(t_epoch)

At test time: output

P_{\mathrm{UP}}^{(4)}

only.

5. Empirical Evaluation and Comparative Analysis

Evaluation on the ACDC dataset using only 10% annotated labels demonstrates the efficacy of CPCR. The method achieves the following results:

Model	DSC (%)	IoU (%)	95HD (mm)	ASD (mm)
Baseline U-Net	77.3	—	—	—
UA-MT (2019)	81.6	—	—	—
URPC (2021)	81.8	—	—	—
MC-Net+ (2022)	87.1	—	—	—
DVCPS (2025)	88.8	—	—	—
DBPNet + CPCR	88.11	79.45	4.12	1.11

With only two decoder branches, CPCR achieves competitive Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) compared to recent works, while yielding superior boundary accuracy reflected in lower 95% Hausdorff distance and Average Surface Distance (ASD). This suggests boundary precision benefits from hierarchical cross-decoder regularization.

6. Significance and Methodological Context

CPCR extends existing paradigms in semi-supervised segmentation by infusing multi-scale reciprocal distillation into dual decoder architectures. Distinct decoding mechanisms and targeted perturbations promote diverse feature representations, while cross-pyramid KL-based regularization supports hierarchical knowledge transfer and label-efficient training. The approach combines the strengths of consistency learning and uncertainty minimization, integrating them seamlessly with supervised and self-supervised objectives through a sigmoidal weighting schedule.

A plausible implication is that the CPCR mechanism could generalize to other hierarchical tasks where multi-branch architectures and cross-scale feature alignment are advantageous. Its demonstrated competitive accuracy and compact design attest to the potential for scalable semi-supervised learning in medical imaging with limited annotations.

PDF Markdown Chat (Pro)

Follow Topic

Get notified by email when new papers are published related to Cross-Pyramid Consistency Regularization (CPCR).