
CxR Codes in Chest X-Ray Segmentation

Updated 16 November 2025
  • CxR Codes are task-specific latent codes that control AdaIN layers, enabling a single network to perform supervised segmentation, domain adaptation, and self-supervised knowledge distillation.
  • They modify normalization parameters on-the-fly via small MLPs, switching each block between identity and learned normalization so that a single network can handle both normal and abnormal chest X-rays.
  • The approach achieves high Dice scores and stable TPR metrics across diverse datasets, effectively mitigating performance drops under domain shifts and scarcity of labeled data.

CxR Codes, in the context of chest X-ray image segmentation, refer to a set of task-specific latent codes used to control Adaptive Instance Normalization (AdaIN) layers within a single encoder–decoder neural network. This approach enables unified treatment of supervised segmentation, unsupervised domain adaptation, and self-supervised knowledge distillation by modulating network behavior entirely through the selection of AdaIN codes, without modifying the underlying weights. The versatility and efficacy of this paradigm are demonstrated in the segmentation of both normal and abnormal (e.g., pneumonia, COVID-19) chest X-rays, where labeled data may be scarce for abnormal cases and domain shifts are substantial (Oh et al., 2021).

1. Architectural Overview and AdaIN Operator

The core architecture is a U-Net–style encoder–decoder generator, $G$, constructed from residual blocks. An initial $1 \times 1$ convolution expands the input single-channel image to 64 feature maps, followed by four downsampling residual blocks reducing spatial dimensions (to $16 \times 16$) and increasing channel depth (to 512), succeeded by two additional bottleneck residual blocks. The decoder symmetrically reverses this process via upsampling residual blocks, culminating in the output segmentation mask. Each residual block contains AdaIN layers whose affine parameters (scale and shift) are dynamically produced by small multilayer-perceptron (MLP) modules, $F_e$ for encoder blocks and $F_d$ for decoder blocks, rather than being learned statically.
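
The following minimal sketch (in PyTorch) traces the encoder's shape bookkeeping implied by this description. The intermediate channel widths and the 256×256 input resolution are assumptions chosen to be consistent with the stated 64-channel stem, 512-channel depth, and 16×16 bottleneck, and plain strided convolutions stand in for the residual blocks.

```python
# Shape-bookkeeping sketch only; not the authors' implementation.
import torch
import torch.nn as nn

def conv_block(c_in, c_out, stride=1):
    # Stand-in for a residual block; a plain conv keeps the sketch short.
    return nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1)

encoder = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=1),   # 1 -> 64 channels, 256x256 (assumed input size)
    conv_block(64, 128, stride=2),     # 128x128
    conv_block(128, 256, stride=2),    # 64x64
    conv_block(256, 512, stride=2),    # 32x32
    conv_block(512, 512, stride=2),    # 16x16
    conv_block(512, 512),              # bottleneck block 1
    conv_block(512, 512),              # bottleneck block 2
)

x = torch.randn(1, 1, 256, 256)        # single-channel CXR
print(encoder(x).shape)                # torch.Size([1, 512, 16, 16])
```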

The AdaIN operator normalizes an intermediate feature map $x \in \mathbb{R}^{C\times H\times W}$ and rescales it according to an AdaIN code $a$ that yields per-channel shift and scale:

$$\text{AdaIN}(x;\,a) = a_\sigma \left( \frac{x - \mu(x)}{\sigma(x)} \right) + a_\mu$$

with $\mu(x)$ and $\sigma(x)$ as the channel-wise mean and standard deviation, and $a_\mu, a_\sigma$ generated on-the-fly from $a$ via the code-generator networks. A style encoder $S$ is also introduced to extract AdaIN parameters from reference images for style guidance during domain adaptation scenarios.
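
A minimal sketch of this operator, assuming a PyTorch implementation, is given below; the code-generator MLP (named `code_mlp` here, a hypothetical stand-in for $F_e$ or $F_d$) maps an AdaIN code to per-channel shift and scale.

```python
# Illustrative AdaIN operator; names and dimensions are assumptions.
import torch
import torch.nn as nn

def adain(x, a_mu, a_sigma, eps=1e-5):
    """x: (B, C, H, W); a_mu, a_sigma: (B, C) per-channel shift and scale."""
    mu = x.mean(dim=(2, 3), keepdim=True)           # channel-wise mean
    sigma = x.std(dim=(2, 3), keepdim=True) + eps   # channel-wise std
    x_norm = (x - mu) / sigma
    return a_sigma[..., None, None] * x_norm + a_mu[..., None, None]

C, code_dim = 64, 8
code_mlp = nn.Linear(code_dim, 2 * C)               # produces (shift, scale) from a code

x = torch.randn(2, C, 32, 32)
a = torch.randn(2, code_dim)                        # an AdaIN code
a_mu, a_sigma = code_mlp(a).chunk(2, dim=1)
y = adain(x, a_mu, a_sigma)
print(y.shape)  # torch.Size([2, 64, 32, 32])
```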

2. Task Modes Enabled by AdaIN Codes

CxR Codes are latent vectors $a$ which, when routed through $F_e$ and $F_d$, induce one of three principal network modes:

  • Supervised Segmentation of Normal CXRs: The segmentation code $a_\text{seg}$ sets both $F_e(a_\text{seg}) = (0,1)$ and $F_d(a_\text{seg}) = (0,1)$, corresponding to identity normalization throughout the network. This regime is trained using paired images and lung masks from the normal domain ($P_X$).
  • Unsupervised Style-Based Domain Adaptation: For mapping between the normal ($X$) and abnormal ($Y$) domains, the codes $a_\text{da}^X$ and $a_\text{da}^Y$ are used with $F_e(a_\text{da}) = (0,1)$ and $F_d(a_\text{da})$ as a learned code, producing samples stylized as the target domain. During inference or reference-guided adaptation, the decoder's AdaIN parameters may be generated by the style encoder $S$ from a target-domain sample.
  • Self-Supervised Segmentation of Abnormal CXRs: The self-supervision code $a_\text{self}$ utilizes learned AdaIN parameters in the encoder ($F_e(a_\text{self})$) and identity normalization in the decoder ($F_d(a_\text{self}) = (0,1)$), effectively restyling the abnormal CXR features to align with the normal segmentation regime.

In total, five codes are implemented, differing in which segments of the network employ identity versus learned normalization parameters, as summarized in the following table:

| Task | Encoder code | Decoder code |
|------|--------------|--------------|
| Supervised segmentation, normal CXRs ($a_\text{seg}$) | Identity $(0,1)$ | Identity $(0,1)$ |
| Domain adaptation ($a_\text{da}^{X}$, $a_\text{da}^{Y}$) | Identity $(0,1)$ | Learned |
| Self-supervised segmentation ($a_\text{self}$) | Learned | Identity $(0,1)$ |
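
The routing summarized in this table can be sketched as follows. This is an illustrative reconstruction rather than the authors' implementation; `learned_mlp` stands in for the learned branch of $F_e$ or $F_d$.

```python
# Sketch of task-dependent selection of AdaIN parameters; names are hypothetical.
import torch

def identity_params(batch, channels):
    # (shift, scale) = (0, 1): AdaIN reduces to plain instance normalization.
    return torch.zeros(batch, channels), torch.ones(batch, channels)

def adain_params_for_task(task, side, batch, channels, learned_mlp, code):
    """Return (a_mu, a_sigma) for one residual block.

    task: 'seg' | 'da' | 'self'; side: 'encoder' | 'decoder'.
    """
    use_learned = (task == 'da' and side == 'decoder') or \
                  (task == 'self' and side == 'encoder')
    if use_learned:
        return learned_mlp(code).chunk(2, dim=1)   # learned shift/scale (F_e or F_d)
    return identity_params(batch, channels)

# Example: decoder parameters in domain-adaptation mode come from the learned MLP.
mlp = torch.nn.Linear(8, 2 * 64)
code = torch.randn(2, 8)
a_mu, a_sigma = adain_params_for_task('da', 'decoder', 2, 64, mlp, code)
```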

3. Loss Functions and Training Objectives

The training of the network involves three principal loss terms:

  • Supervised Segmentation Loss:

$$\ell_\text{seg}(G) = -\mathbb{E}_{x \sim P_X} \left[ \sum_i z_i \log p_i \left(G(x, a_\text{seg})\right) \right]$$

where $z$ is the ground-truth mask, and $p_i$ are the softmaxed segmentation outputs.

  • Domain Adaptation Loss: Inspired by StarGANv2, the domain adaptation loss decomposes into adversarial, cycle consistency, style, and diversity losses:

$$\ell_\text{da} = \ell_\text{adv} + \lambda_\text{cycle}\,\ell_\text{cycle} + \lambda_\text{style}\,\ell_\text{style} - \lambda_\text{div}\,\ell_\text{div}$$

    • $\ell_\text{adv}$ ensures generated images under $G(s, a_\text{da}^T)$ are indistinguishable from real target-domain images.
    • $\ell_\text{cycle}$ enforces invertibility of domain translation.
    • $\ell_\text{style}$ encourages the style encoder to match the injected code.
    • $\ell_\text{div}$ penalizes insufficient diversity among different code samples.
  • Self-Supervised Consistency Loss:

$$\ell_\text{self} = \lambda_\text{inter}\,\ell_\text{inter} + \lambda_\text{intra}\,\ell_\text{intra}$$

with

$$\ell_\text{inter} = \mathbb{E}_{y\sim P_Y} \left\| G'\!\left(G'(y, a_\text{da}^{T}), a_\text{seg}\right) - G(y, a_\text{self})\right\|_1$$

$$\ell_\text{intra} = \mathbb{E}_{x\sim P_X} \left\| G(x, a_\text{self}) - G'(x, a_\text{seg})\right\|_1$$

where $G'$ is a frozen copy of the generator, providing stable targets for the consistency constraints.

These loss functions collectively enforce correct segmentation on normal data, robust domain translation, and distillation of segmentation knowledge into abnormal domain representations.
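
As a concrete illustration of the self-supervised terms above, the sketch below assumes a generator callable `G(image, code)` that returns soft lung masks and a frozen copy `G_frozen`; the function and argument names are hypothetical placeholders, not the authors' code.

```python
# Sketch of the inter- and intra-domain consistency losses (illustration only).
import torch
import torch.nn.functional as F

def self_consistency_loss(G, G_frozen, x_normal, y_abnormal,
                          a_seg, a_da_T, a_self,
                          lam_inter=1.0, lam_intra=1.0):
    # inter: the frozen generator restyles an abnormal CXR toward the target
    # (normal) domain and segments it; the trainable generator must match
    # this target in a single pass with the self-supervision code.
    with torch.no_grad():
        target_inter = G_frozen(G_frozen(y_abnormal, a_da_T), a_seg)
    loss_inter = F.l1_loss(G(y_abnormal, a_self), target_inter)

    # intra: on normal CXRs, the self-supervision code must reproduce the
    # frozen supervised-segmentation output.
    with torch.no_grad():
        target_intra = G_frozen(x_normal, a_seg)
    loss_intra = F.l1_loss(G(x_normal, a_self), target_intra)

    return lam_inter * loss_inter + lam_intra * loss_intra
```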

4. Training Workflow and Regime

Training proceeds in sequential stages:

  • Initialization of $G$, $F_e$, $F_d$, $S$, and a discriminator $D$.
  • For approximately 20,000 iterations, supervised segmentation (with $a_\text{seg}$) and unsupervised domain adaptation (with $a_\text{da}^X$, $a_\text{da}^Y$) are jointly optimized on the normal (JSRT) and abnormal (RSNA, Cohen) datasets.
  • Once segmentation on held-out abnormal samples stabilizes, the self-supervision loss $\ell_\text{self}$ is activated for roughly 5,000 additional iterations. In this phase, both normal and pneumonia CXRs are processed under the self-supervision regime, using $a_\text{self}$ and the frozen copy $G'$.
  • Throughout training, only the AdaIN codes are switched; the core network weights remain shared, enabling all modes (segmentation, style transfer, knowledge distillation) to coexist within a single generator.
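
A schematic of this schedule is sketched below. Iteration counts follow the text; the data loaders, loss callables, and optimizer step are placeholders, and the exact composition of the stage-2 objective is an assumption made for illustration.

```python
# Hedged sketch of the staged training schedule; plumbing names are assumed.
def train(G, G_prime_factory, loaders, seg_loss, da_loss, self_loss, opt_step):
    # Stage 1 (~20k iterations): joint supervised segmentation on JSRT and
    # unsupervised domain adaptation toward the abnormal datasets.
    for _ in range(20_000):
        x, mask = next(loaders['normal_labeled'])   # JSRT image + lung mask
        y = next(loaders['abnormal_unlabeled'])     # RSNA / Cohen CXR
        opt_step(seg_loss(G, x, mask) + da_loss(G, x, y))

    # Stage 2 (~5k iterations): freeze a copy G' and activate the
    # self-supervision loss; keeping the stage-1 terms active here is an
    # assumption, not something stated explicitly in the text.
    G_prime = G_prime_factory(G)                    # frozen copy of the generator
    for _ in range(5_000):
        x, mask = next(loaders['normal_labeled'])
        y = next(loaders['abnormal_unlabeled'])
        opt_step(seg_loss(G, x, mask)
                 + da_loss(G, x, y)
                 + self_loss(G, G_prime, x, y))
```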

5. Experimental Evaluation and Comparative Performance

Empirical assessments utilize the following datasets:

  • Labeled Normal Data: JSRT (178 train, 20 val, 49 test) with segmentation masks.
  • Unlabeled Abnormal Data: RSNA (218), Cohen COVID-19 (640 train, 40 val/test).
  • Additional Test Sets: NLM normal CXRs (80), BIMCV-13 COVID-19 (13 labeled), and large unlabeled BIMCV and BRIXIA COVID-19 repositories.

Metrics include Dice similarity on normal lungs and true-positive rate (TPR) for abnormal consolidation/ground-glass opacity (GGO) regions. Baselines consist of U-Net, CycleGAN+U-Net, StarGANv2+U-Net, XLSor, and lungVAE.
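
For reference, the two metrics can be computed as in the following sketch; interpreting TPR as the fraction of annotated consolidation/GGO pixels covered by the predicted lung mask is an assumption based on the description above.

```python
# Minimal metric sketch, assuming binary masks as NumPy arrays.
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice similarity between a predicted and a ground-truth binary mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def tpr(pred_lung, lesion_gt, eps=1e-7):
    """Fraction of annotated consolidation/GGO pixels inside the predicted lung mask."""
    pred_lung, lesion_gt = pred_lung.astype(bool), lesion_gt.astype(bool)
    return (np.logical_and(pred_lung, lesion_gt).sum() + eps) / (lesion_gt.sum() + eps)
```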

Key findings:

  • On normal CXRs, including those with synthetic intensity/noise perturbations, both the proposed method and the variant with the self-supervised loss ($+\ell_\text{self}$) maintain Dice scores of $\approx 0.90$ under harsh shifts (a maximum drop of $0.02$), while all baselines deteriorate severely.
  • For abnormal CXRs, U-Net and previous semi-supervised approaches exhibit a sharp TPR decline (from $\approx 0.90$ to $\approx 0.6$) under mild shifts. In contrast, the proposed method achieves $0.90 \rightarrow 0.89 \rightarrow 0.89$ and the $+\ell_\text{self}$ variant $0.90 \rightarrow 0.89 \rightarrow 0.86$, substantially outperforming the alternatives.
  • Qualitative inspection reveals that the proposed approach successfully includes consolidation and GGO within the lung mask, while conventional networks often under-segment these pathologies.

6. Implications and Significance

The introduction of CxR Codes as AdaIN-driven, task-controlling latent variables allows a single shared convolutional network to multiplex supervised segmentation, unsupervised domain adaptation, and knowledge distillation by code swapping alone. This approach simplifies model design, leverages all available labeled and unlabeled data, and achieves superior robustness under domain shifts between normal and abnormal chest radiographs, as evidenced by quantitative and qualitative performance on COVID-19 and pneumonia datasets. It establishes a versatile framework that generalizes to settings where domain shift is substantial and target-domain segmentation labels are scarce.
