CxR Codes in Chest X-Ray Segmentation
- CxR Codes are task-specific latent codes that control AdaIN layers, enabling a single network to perform supervised segmentation, domain adaptation, and self-supervised knowledge distillation.
- They modify normalization parameters on-the-fly using small MLPs, allowing flexible conversion between identity and learned styles to handle normal and abnormal chest X-rays.
- The approach achieves high Dice scores and stable TPR metrics across diverse datasets, effectively mitigating performance drops under domain shifts and scarcity of labeled data.
CxR Codes, in the context of chest X-ray image segmentation, refer to a set of task-specific latent codes used to control Adaptive Instance Normalization (AdaIN) layers within a single encoder–decoder neural network. This approach enables unified treatment of supervised segmentation, unsupervised domain adaptation, and self-supervised knowledge distillation by modulating network behavior entirely through the selection of AdaIN codes, without modifying the underlying weights. The versatility and efficacy of this paradigm are demonstrated in the segmentation of both normal and abnormal (e.g., pneumonia, COVID-19) chest X-rays, where labeled data may be scarce for abnormal cases and domain shifts are substantial (Oh et al., 2021).
1. Architectural Overview and AdaIN Operator
The core architecture is a U-Net–style encoder–decoder generator constructed from residual blocks. An initial convolution expands the single-channel input image to 64 feature maps; four downsampling residual blocks then reduce the spatial resolution while increasing the channel depth to 512, followed by two additional bottleneck residual blocks. The decoder symmetrically reverses this process via upsampling residual blocks, culminating in the output segmentation mask. Each residual block contains AdaIN layers whose affine parameters (scale and shift) are produced dynamically by small multilayer-perceptron (MLP) code-generator modules, one set for the encoder blocks and one for the decoder blocks, rather than being learned as static weights.
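A minimal PyTorch sketch of this layout is given below, assuming the block counts and channel widths stated above; the class names are illustrative, and the U-Net skip connections and AdaIN layers (sketched in the next subsection) are omitted for brevity.

```python
# Illustrative encoder-decoder skeleton following the description above.
# Not the authors' implementation: skip connections and AdaIN layers omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Residual block with an optional resolution change."""
    def __init__(self, in_ch, out_ch, resample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)
        self.resample = resample  # "down", "up", or None

    def forward(self, x):
        if self.resample == "down":
            x = F.avg_pool2d(x, 2)
        elif self.resample == "up":
            x = F.interpolate(x, scale_factor=2)
        h = self.conv2(torch.relu(self.conv1(torch.relu(x))))
        return h + self.skip(x)

class Generator(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.stem = nn.Conv2d(1, 64, 3, padding=1)      # 1-channel CXR -> 64 maps
        self.encoder = nn.ModuleList([                   # 4 downsampling blocks -> 512 channels
            ResBlock(64, 128, "down"), ResBlock(128, 256, "down"),
            ResBlock(256, 512, "down"), ResBlock(512, 512, "down")])
        self.bottleneck = nn.ModuleList([ResBlock(512, 512), ResBlock(512, 512)])
        self.decoder = nn.ModuleList([                   # symmetric upsampling path
            ResBlock(512, 512, "up"), ResBlock(512, 256, "up"),
            ResBlock(256, 128, "up"), ResBlock(128, 64, "up")])
        self.head = nn.Conv2d(64, n_classes, 1)          # segmentation logits

    def forward(self, x):
        h = self.stem(x)
        for blk in [*self.encoder, *self.bottleneck, *self.decoder]:
            h = blk(h)
        return self.head(h)
```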
The AdaIN operator normalizes an intermediate feature map and rescales it according to an AdaIN code that yields per-channel shift and scale:

$$\mathrm{AdaIN}(x, c) = \gamma(c)\,\frac{x - \mu(x)}{\sigma(x)} + \beta(c),$$

with $\mu(x)$ and $\sigma(x)$ the channel-wise mean and standard deviation of the feature map $x$, and the scale $\gamma(c)$ and shift $\beta(c)$ generated on the fly from the code $c$ via the code-generator networks. A style encoder is also introduced to extract AdaIN parameters from reference images for style guidance in domain adaptation scenarios.
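A sketch of the AdaIN layer and one of the small MLP code generators is given below; the class and function names, the hidden width, and the note on the identity code are assumptions for illustration.

```python
# Illustrative AdaIN layer with an MLP code generator (PyTorch).
import torch
import torch.nn as nn

class CodeMLP(nn.Module):
    """Small MLP mapping a latent code to per-channel AdaIN scale and shift."""
    def __init__(self, code_dim, num_channels, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * num_channels))   # -> [gamma | beta]

    def forward(self, code):
        gamma, beta = self.net(code).chunk(2, dim=-1)
        return gamma, beta

def adain(x, gamma, beta, eps=1e-5):
    """AdaIN(x, c) = gamma(c) * (x - mu(x)) / sigma(x) + beta(c), per channel."""
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + eps
    gamma = gamma.view(*gamma.shape, 1, 1)          # (B, C) -> (B, C, 1, 1)
    beta = beta.view(*beta.shape, 1, 1)
    return gamma * (x - mu) / sigma + beta

# Usage on a hypothetical 512-channel feature map.
feat = torch.randn(2, 512, 16, 16)
mlp = CodeMLP(code_dim=64, num_channels=512)
code = torch.randn(2, 64)                            # a task code (illustrative)
gamma, beta = mlp(code)
out = adain(feat, gamma, beta)
# "Identity normalization" is taken here to mean the layer leaves features
# effectively unrestyled (e.g. gamma = sigma(x), beta = mu(x)); the paper's
# exact convention is an assumption.
```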
2. Task Modes Enabled by AdaIN Codes
CxR Codes are latent vectors which, when routed through the encoder and decoder code generators, put the network into one of three principal modes:
- Supervised Segmentation of Normal CXRs: The segmentation code sets both the encoder and decoder AdaIN parameters to identity, so no learned restyling is applied anywhere in the network. This regime is trained on paired images and lung masks from the normal domain.
- Unsupervised Style-Based Domain Adaptation: For mapping between the normal and abnormal domains, the encoder code is kept at identity while the decoder receives a learned code, producing samples stylized as the target domain. During inference or reference-guided adaptation, the decoder's AdaIN parameters may instead be generated by the style encoder from a target-domain sample.
- Self-Supervised Segmentation of Abnormal CXRs: The self-supervision code uses learned AdaIN parameters in the encoder and identity normalization in the decoder, effectively restyling abnormal CXR features so that they align with the normal segmentation regime.
In total, five codes are implemented, differing in which parts of the network use identity versus learned normalization parameters; the three principal modes are summarized in the following table (and illustrated in the code sketch that follows it):
| Task | Encoder Code | Decoder Code |
|---|---|---|
| Supervised segmentation (normal CXRs) | Identity | Identity |
| Domain adaptation (normal ↔ abnormal) | Identity | Learned |
| Self-supervised segmentation (abnormal CXRs) | Learned | Identity |
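The sketch below shows how such code routing could be organized in practice: each task selects identity or learned codes for the encoder and decoder halves of a shared generator. The task names and the use of `None` for identity normalization are illustrative conventions, not the paper's interface.

```python
# Illustrative routing of CxR Codes; identity normalization is encoded as None.
import torch
import torch.nn as nn

class TaskCodes(nn.Module):
    def __init__(self, code_dim=64):
        super().__init__()
        # Learned latent codes for the non-identity half of each task.
        self.decoder_da_code = nn.Parameter(torch.randn(code_dim))  # domain adaptation
        self.encoder_ss_code = nn.Parameter(torch.randn(code_dim))  # self-supervision

    def get(self, task):
        """Return (encoder_code, decoder_code); None means identity normalization."""
        if task == "segment_normal":      # supervised segmentation of normal CXRs
            return None, None
        if task == "domain_adaptation":   # normal <-> abnormal style translation
            return None, self.decoder_da_code
        if task == "segment_abnormal":    # self-supervised segmentation of abnormal CXRs
            return self.encoder_ss_code, None
        raise ValueError(f"unknown task: {task}")

codes = TaskCodes()
enc_code, dec_code = codes.get("segment_abnormal")
# A shared generator would apply learned AdaIN parameters wherever a code is
# given and fall back to identity normalization wherever it is None.
```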
3. Loss Functions and Training Objectives
The training of the network involves three principal loss terms:
- Supervised Segmentation Loss: a pixel-wise loss between the ground-truth lung mask and the softmaxed segmentation output produced under the segmentation code, computed on the labeled normal-domain images.
- Domain Adaptation Loss: Inspired by StarGANv2, the domain adaptation loss decomposes into adversarial, cycle consistency, style, and diversity losses:
  - The adversarial loss ensures that images generated under the domain-adaptation code are indistinguishable from real target-domain images.
  - The cycle-consistency loss enforces invertibility of the domain translation.
  - The style reconstruction loss encourages the style encoder's estimate from a translated image to match the injected code.
  - The diversity-sensitive loss penalizes insufficient diversity among outputs produced from different code samples.
- Self-Supervised Consistency Loss: the segmentation produced by the trainable generator under the self-supervision code is constrained to agree with pseudo-targets produced by a frozen copy of the generator, which provides stable targets for the consistency constraint.
These loss functions collectively enforce correct segmentation on normal data, robust domain translation, and distillation of segmentation knowledge into abnormal domain representations.
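As a concrete illustration of the distillation idea, the sketch below shows how a frozen copy of the generator could provide stable pseudo-targets for the consistency term. The KL-divergence form of the loss, the generator signature `generator(x, enc_code, dec_code)`, and the helper names are assumptions; the paper's exact formulation may differ.

```python
# Illustrative self-supervised consistency (knowledge distillation) term.
import copy
import torch
import torch.nn.functional as F

def self_supervised_consistency(generator, x, codes, teacher=None):
    """Push the trainable generator toward a frozen teacher's pseudo-targets."""
    if teacher is None:
        teacher = copy.deepcopy(generator)              # frozen copy of the generator
        for p in teacher.parameters():
            p.requires_grad_(False)
    enc_code, dec_code = codes.get("segment_abnormal")  # self-supervision routing
    with torch.no_grad():
        target = teacher(x, enc_code, dec_code).softmax(dim=1)   # stable pseudo-target
    pred = generator(x, enc_code, dec_code)                      # student logits
    return F.kl_div(pred.log_softmax(dim=1), target, reduction="batchmean")
```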
4. Training Workflow and Regime
Training proceeds in sequential stages:
- Initialization of the generator, the encoder and decoder code-generator MLPs, the style encoder, and a discriminator.
- For approximately 20,000 iterations, supervised segmentation (under the segmentation code) and unsupervised domain adaptation (under the domain-adaptation codes) are jointly optimized on the normal (JSRT) and abnormal (RSNA, Cohen) datasets.
- Upon stabilization of segmentation on held-out abnormal samples, the self-supervision loss is activated for roughly 5,000 additional iterations. In this phase, both normal and pneumonia CXRs are processed under the self-supervision regime, using the self-supervision code and a frozen copy of the generator.
- Throughout training, only the AdaIN codes are switched; the core network weights remain shared, enabling all modes (segmentation, style transfer, knowledge distillation) to co-exist within a single generator.
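A condensed sketch of this two-stage schedule is shown below. The iteration counts follow the text; the data loaders, loss helpers, single optimizer, and omitted discriminator updates are simplifying assumptions.

```python
# Illustrative two-stage training schedule; all helpers are assumed placeholders.
import copy
import itertools

def train(generator, codes, style_encoder, discriminator,
          normal_loader, abnormal_loader, opt,
          seg_loss, da_loss, consistency_loss):
    paired = zip(itertools.cycle(normal_loader), itertools.cycle(abnormal_loader))

    # Stage 1: joint supervised segmentation + unsupervised domain adaptation.
    for it, ((x_n, y_n), x_a) in enumerate(paired):
        if it >= 20_000:
            break
        loss = seg_loss(generator, codes, x_n, y_n)                 # labeled normal CXRs
        loss = loss + da_loss(generator, codes, style_encoder,
                              discriminator, x_n, x_a)              # normal <-> abnormal
        opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: activate the self-supervised consistency term with a frozen teacher.
    teacher = copy.deepcopy(generator)
    for p in teacher.parameters():
        p.requires_grad_(False)
    for it, ((x_n, _), x_a) in enumerate(paired):
        if it >= 5_000:
            break
        loss = (consistency_loss(generator, x_n, codes, teacher)    # normal CXRs
                + consistency_loss(generator, x_a, codes, teacher)) # pneumonia CXRs
        opt.zero_grad(); loss.backward(); opt.step()
```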
5. Experimental Evaluation and Comparative Performance
Empirical assessments utilize the following datasets:
- Labeled Normal Data: JSRT (178 train, 20 val, 49 test) with segmentation masks.
- Unlabeled Abnormal Data: RSNA (218), Cohen COVID-19 (640 train, 40 val/test).
- Additional Test Sets: NLM normal CXRs (80), BIMCV-13 COVID-19 (13 labeled), and large unlabeled BIMCV and BRIXIA COVID-19 repositories.
Metrics include Dice similarity on normal lungs and true-positive rate (TPR) for abnormal consolidation/ground-glass opacity (GGO) regions. Baselines consist of U-Net, CycleGAN+U-Net, StarGANv2+U-Net, XLSor, and lungVAE.
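For concreteness, the sketch below gives straightforward NumPy definitions of the two metrics; interpreting TPR as the fraction of annotated consolidation/GGO pixels covered by the predicted lung mask is an assumption consistent with the description above.

```python
# Illustrative metric definitions for binary masks (NumPy).
import numpy as np

def dice(pred_mask, gt_mask, eps=1e-7):
    """Dice similarity between a predicted and ground-truth lung mask."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def tpr(pred_mask, lesion_mask, eps=1e-7):
    """Fraction of consolidation/GGO pixels falling inside the predicted lung mask."""
    pred, lesion = pred_mask.astype(bool), lesion_mask.astype(bool)
    return (np.logical_and(pred, lesion).sum() + eps) / (lesion.sum() + eps)

# Toy example: a predicted mask covering 3 of 4 annotated opacity pixels.
pred = np.array([[1, 1], [1, 0]])
ggo = np.array([[1, 1], [1, 1]])
print(round(dice(pred, ggo), 3), round(tpr(pred, ggo), 3))   # 0.857 0.75
```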
Key findings:
- On normal CXRs, including those with synthetic intensity and noise perturbations, both the proposed method and its variant with the self-supervised loss maintain high Dice scores under harsh shifts, with only a small maximum drop, while all baselines deteriorate severely.
- For abnormal CXRs, U-Net and earlier semi-supervised approaches exhibit a sharp TPR decline to roughly $0.6$ under even mild shifts, whereas the proposed method maintains substantially higher TPR, outperforming all alternatives.
- Qualitative inspection reveals that the proposed approach successfully includes consolidation and GGO within the lung mask, while conventional networks often under-segment these pathologies.
6. Implications and Significance
The introduction of CxR Codes as AdaIN-driven, task-controlling latent variables allows a single shared convolutional network to multiplex supervised segmentation, unsupervised domain adaptation, and knowledge distillation by code swapping alone. This approach simplifies model design, leverages all available labeled and unlabeled data, and achieves superior robustness under domain shifts between normal and abnormal chest radiographs, as evidenced by quantitative and qualitative performance on COVID-19 and pneumonia datasets. It establishes a versatile framework, generalizable to scenarios where domain shifts and scarcity of target-domain segmentation labels are present.