Conditional SPAdaIN for Adaptive Registration

Updated 23 April 2026
  • CSAIN is a conditional module that adapts instance normalization spatially to generate diverse deformation fields for image registration.
  • It integrates with a Laplacian-pyramid U-Net to enable region-specific regularization without needing retraining for varied hyperparameter settings.
  • Quantitative evaluations on brain MRI data show improved Dice scores over a spatially-invariant baseline, and fewer deformation foldings when the hyperparameter map is Gaussian-smoothed rather than sharp-edged.

Conditional Spatially-Adaptive Instance Normalization (CSAIN) is a module for deep neural networks that enables spatially-varying and adaptive regularization in the context of deformable image registration. The core innovation is the conditioning of instance normalization on a spatial hyperparameter map, allowing a single registration model to produce a family of plausible deformation fields governed by local regularization weights, with no need for retraining for each configuration. This approach addresses inherent limitations in previous methods that required training separate models per hyperparameter setting and did not support spatially-dependent regularization (Wang et al., 2023).

1. Mathematical Formulation of CSAIN

Let $F \in \mathbb{R}^{C \times \Omega}$ denote a feature map at a given network layer, where $\Omega$ is the set of spatial positions $x$ and $C$ is the number of channels. For spatially adaptive conditioning, a spatial hyperparameter map $H \in \mathbb{R}^{1 \times \Omega}$ is provided, typically corresponding to region-specific regularization weights.

CSAIN is implemented as follows:

  1. Instance Normalization:

$$\mu_i = \frac{1}{|\Omega|}\sum_{x \in \Omega} F_i(x), \quad \sigma_i = \sqrt{\frac{1}{|\Omega|} \sum_{x \in \Omega} (F_i(x) - \mu_i)^2}$$

$$\hat{F}_i(x) = \frac{F_i(x) - \mu_i}{\sigma_i + \epsilon}, \quad \epsilon \approx 10^{-5}$$

  2. Spatial Conditioning via Learned Scale and Shift:

    • $H$ is resampled to match the feature resolution, yielding $H'$.
    • Two shallow convolutional layers generate per-channel, per-location scale $\gamma_i(x)$ and shift $\beta_i(x)$:

    $$\gamma = \mathrm{Conv}_{\gamma}(H'), \qquad \beta = \mathrm{Conv}_{\beta}(H')$$

    • The CSAIN-modulated feature:

    $$\mathrm{CSAIN}(F, H)_i(x) = \gamma_i(x)\, \hat{F}_i(x) + \beta_i(x)$$

    or, vectorized over all channels,

    $$\mathrm{CSAIN}(F, H) = \gamma \odot \hat{F} + \beta$$

    where $\odot$ denotes the Hadamard product and $\gamma, \beta$ are broadcast per channel.

This formulation permits each spatial location and channel to be adaptively scaled and shifted in response to arbitrarily specified local regularization weights.
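The two steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the two shallow convolutional layers are replaced by a single 1x1 convolution per branch (a per-channel weight and bias applied to the resampled map), and the names `csain`, `w_gamma`, `b_gamma`, `w_beta`, and `b_beta` are hypothetical.

```python
import numpy as np

def instance_norm(F, eps=1e-5):
    """Normalize each channel of F (shape: C x spatial dims) over its spatial positions."""
    axes = tuple(range(1, F.ndim))
    mu = F.mean(axis=axes, keepdims=True)
    sigma = F.std(axis=axes, keepdims=True)  # population std, i.e. the 1/|Omega| form
    return (F - mu) / (sigma + eps)

def csain(F, H_resampled, w_gamma, b_gamma, w_beta, b_beta):
    """
    F:            feature map, shape (C, *spatial)
    H_resampled:  hyperparameter map at feature resolution, shape (1, *spatial)
    w_*, b_*:     shape (C,), weights of a 1x1 convolution (an assumed simplification
                  standing in for the shallow convolutional layers)
    """
    F_hat = instance_norm(F)
    per_channel = (-1,) + (1,) * (F.ndim - 1)
    # Per-channel, per-location scale and shift derived from the map.
    gamma = w_gamma.reshape(per_channel) * H_resampled + b_gamma.reshape(per_channel)
    beta = w_beta.reshape(per_channel) * H_resampled + b_beta.reshape(per_channel)
    return gamma * F_hat + beta
```

With `w_gamma = 0`, `b_gamma = 1`, and zero shift, the sketch reduces to plain instance normalization, which is a convenient sanity check on the conditioning path.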

2. Network Integration and Architecture

The CSAIN module is instantiated within a Laplacian-pyramid U-Net backbone ("LapIRN") with multiple resolution levels. Each level comprises a downsampling encoder, a stack of residual blocks, and a decoder. The residual blocks are replaced by CSAIN blocks, each of which contains two consecutive CSAIN layers (using convolutions to produce the scale $\gamma$ and shift $\beta$) interleaved with LeakyReLU activations, along with a pre-activation skip connection.

Encoding of the conditioning map proceeds as follows:

  • Binary region masks $M_k$ select each of the $K$ anatomical regions. A vector of region-specific weights $\lambda = (\lambda_1, \dots, \lambda_K)$ induces a one-channel map $H(x) = \sum_{k=1}^{K} \lambda_k M_k(x)$.
  • To mitigate sharp boundaries, $H$ is convolved with a Gaussian kernel of fixed standard deviation (in voxels) and window size, resulting in a smoothed map $\tilde{H}$.
  • At each feature resolution, $\tilde{H}$ is resampled and routed to the corresponding CSAIN block's conditioning layers.
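The construction of the conditioning map can be sketched as follows. This is an illustrative sketch only: it assumes the binary masks are disjoint and encoded as an integer label volume, uses a separable Gaussian with edge padding, and the defaults `sigma=1.0` and `radius=2` are placeholder values, not the settings used in the source.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius):
    """Normalized 1D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth_separable(vol, sigma, radius):
    """Gaussian-smooth a volume by filtering each axis in turn, with edge padding."""
    k = gaussian_kernel_1d(sigma, radius)
    out = vol.astype(float)
    for axis in range(vol.ndim):
        pad = [(0, 0)] * vol.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, padded
        )
    return out

def conditioning_map(labels, weights, sigma=1.0, radius=2):
    """
    labels:  integer volume of region indices in 0..K-1 (disjoint masks, assumed)
    weights: length-K sequence of region-specific regularization weights
    """
    weights = np.asarray(weights, dtype=float)
    # For disjoint binary masks, indexing by label equals the weighted sum of masks.
    H = weights[labels]
    return smooth_separable(H, sigma, radius)
```

Because the kernel is normalized and the padding replicates edge values, the smoothed map stays within the range of the region weights and only changes near region boundaries.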

3. Deformable Registration Framework

Conditional SPAdaIN is deployed within an end-to-end deformable registration framework. The system takes as input a fixed image $I_f$, a moving image $I_m$, and a spatial regularization map $\tilde{H}$. The output is a dense displacement field $u$, yielding a deformation $\phi = \mathrm{Id} + u$.

The objective at pyramid level $l$ is:

$$\mathcal{L}_l = -\mathrm{NCC}_w\big(I_f^{l},\, I_m^{l} \circ \phi^{l}\big) + \sum_{x \in \Omega} \tilde{H}^{l}(x)\, \lVert \nabla u^{l}(x) \rVert^2$$

where $I_f^{l}, I_m^{l}$ are images downsampled to level $l$, $\mathrm{NCC}_w$ is local normalized cross-correlation with window size $w$, and $\nabla$ is the spatial gradient. The spatially-varying regularization is enforced by elementwise multiplication with $\tilde{H}^{l}$. The total loss sums over all $L$ levels:

$$\mathcal{L} = \sum_{l=1}^{L} \mathcal{L}_l$$

This structure enables spatially-varying regularization directly within the data flow of the network.
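The regularization term can be illustrated with a short NumPy sketch. It assumes a squared-gradient (diffusion-style) penalty computed with finite differences and averaged over positions; the exact norm and normalization used in the source are not confirmed here, and `weighted_smoothness` is a hypothetical name. The similarity term is omitted.

```python
import numpy as np

def weighted_smoothness(u, H):
    """
    u: displacement field, shape (ndim, *spatial), with at least 2 spatial dims
    H: regularization weight map, shape (*spatial)
    Returns the mean over positions of H(x) times the squared gradient norm of u.
    """
    grad_sq = np.zeros(H.shape, dtype=float)
    for component in u:
        # np.gradient returns one finite-difference array per spatial axis.
        for g in np.gradient(component):
            grad_sq += g ** 2
    # Elementwise multiplication makes the penalty spatially varying.
    return float(np.mean(H * grad_sq))
```

Raising the weight inside one region increases the cost of non-smooth displacement there while leaving the penalty elsewhere unchanged, which is how region-specific weights translate into locally smoother deformations.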

4. Training Regimen and Inference Modality

Training is conducted using the OASIS T1 brain MRI dataset (416 volumes, pre-aligned, skull-stripped), partitioned into 340 training, 20 validation, and 56 test subjects. Registration pairs are established via subject permutation. Anatomical regions ($K = 5$) are delineated as background, cortex, subcortical gray matter, white matter, and cerebrospinal fluid. For each minibatch, region weights $\lambda_k$ are sampled uniformly from a fixed range, then composed and Gaussian-smoothed into the conditioning map $\tilde{H}$ per the protocol. The model parameters $\theta$ (network weights, including CSAIN kernels) are optimized with Adam, minimizing the overall loss $\mathcal{L}$.

Inference uses the fixed network parameters. To obtain a single best deformation, the region weights $\lambda$ may be selected manually or by automated search, choosing the $\lambda$ whose registered output $\phi_{\lambda}$ (the deformation parameterized by $\lambda$) scores best. Critically, only $\lambda$ is tuned at inference, without retraining, enabling rapid generation of multiple plausible outputs under varying spatially adaptive smoothness.
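One way such an automated search could look is a plain random search over the region weights. The sketch below is an assumption-laden illustration: `register_fn` (a forward pass of the trained network for a given weight vector) and `score_fn` (any quality criterion where higher is better) are hypothetical callables supplied by the user, and random search is a stand-in for whatever search strategy is actually used.

```python
import random

def search_region_weights(register_fn, score_fn, num_regions, low, high,
                          trials=50, seed=0):
    """
    Random search over region weight vectors.
    register_fn(lam) -> registered output for weights lam (network weights stay fixed)
    score_fn(output) -> scalar quality score, higher is better
    """
    rng = random.Random(seed)
    best_lam, best_score = None, float("-inf")
    for _ in range(trials):
        lam = [rng.uniform(low, high) for _ in range(num_regions)]
        output = register_fn(lam)  # one forward pass, no retraining
        score = score_fn(output)
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score
```

Each trial costs only a forward pass, so the search is cheap compared with training one model per hyperparameter setting.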

5. Quantitative and Qualitative Evaluation

Empirical evaluation on the OASIS test set (56 subjects, 5 regions) demonstrates:

Method | Avg. Dice | % folds | Region weights $\lambda$
CSAIN w/ Gaussian smoothing | 0.764 | 1.04 | [3.76, 2.42, 2.61, 2.33, 0.67]
CSAIN w/o Gaussian smoothing | 0.759 | 1.22 | [3.58, 1.83, 2.18, 1.98, 0.56]
Baseline (CIR-DM, spatially invariant) | 0.749 | 0.66 | [1, 1, 1, 1, 1]

Key findings include:

  • CSAIN with spatially-varying, Gaussian-smoothed regularization achieves a roughly 1.5-point improvement in Dice score (0.764 vs. 0.749) over the spatially-invariant baseline.
  • Adaptive $\lambda$ selection reduces deformation gradient magnitudes (achieving smoother transformations) while maintaining or improving accuracy.
  • Gaussian smoothing of the hyperparameter map reduces the percentage of foldings (locations where the Jacobian determinant is $\le 0$) compared to sharp boundary maps (1.04 vs. 1.22).

Ablation indicates that removing Gaussian smoothing results in a minor Dice decrease and increased foldings, supporting the benefit of enforcing smooth transitions in the hyperparameter map. Qualitative analysis shows that varying a single region weight $\lambda_k$ affects Dice and deformation regularity locally within the specified anatomical region, with minimal impact on distant areas. This is evidence of effective spatial adaptation (Wang et al., 2023).
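The folding metric discussed above can be computed from a displacement field as in the following sketch. It assumes displacements in voxel units, a deformation of the form identity plus displacement, and finite-difference derivatives via `np.gradient`; the function names are hypothetical and this is not the evaluation code from the source.

```python
import numpy as np

def jacobian_determinant(u):
    """
    u: displacement field, shape (n, *spatial) with n >= 2 spatial dimensions.
    Returns det(J) of the deformation (identity + u) at every position.
    """
    n = u.shape[0]
    J = np.zeros(u.shape[1:] + (n, n))
    for i in range(n):
        grads = np.gradient(u[i])  # derivatives of component i along each axis
        for j in range(n):
            # Jacobian of the identity contributes 1 on the diagonal.
            J[..., i, j] = grads[j] + (1.0 if i == j else 0.0)
    return np.linalg.det(J)

def folding_percentage(u):
    """Percentage of positions whose Jacobian determinant is non-positive."""
    det = jacobian_determinant(u)
    return 100.0 * float(np.mean(det <= 0))
```

A zero displacement field gives a determinant of 1 everywhere and therefore no foldings, while a displacement that mirrors one axis flips the sign of the determinant at every position.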

6. Significance and Implications

CSAIN provides a mechanism for controlling spatially-variant regularization in deformable image registration using a single network conditioned at inference on user- or optimizer-specified hyperparameter maps. This obviates the need for retraining across hyperparameter sweeps and supports automatic or interactive hyperparameter selection. The approach is validated experimentally through improved Dice scores and enhanced control over deformation regularity, with fine-grained adaptation documented both quantitatively and qualitatively across anatomical regions. The methodology integrates seamlessly with common encoder-decoder networks and generalizes to variable region definitions via mask-based construction of conditioning maps.

The results suggest that CSAIN constitutes an advance in the design of conditional, normalization-based deep learning layers for spatially adaptive regularization in image registration workflows, meriting further investigation and adoption in broader medical imaging scenarios (Wang et al., 2023).
