CxR Codes in Chest X-Ray Segmentation
- CxR Codes are task-specific latent codes that control AdaIN layers, enabling a single network to perform supervised segmentation, domain adaptation, and self-supervised knowledge distillation.
- They modify normalization parameters on-the-fly using small MLPs, allowing flexible conversion between identity and learned styles to handle normal and abnormal chest X-rays.
- The approach achieves high Dice scores and stable TPR metrics across diverse datasets, effectively mitigating performance drops under domain shifts and scarcity of labeled data.
CxR Codes, in the context of chest X-ray image segmentation, refer to a set of task-specific latent codes used to control Adaptive Instance Normalization (AdaIN) layers within a single encoder–decoder neural network. This approach enables unified treatment of supervised segmentation, unsupervised domain adaptation, and self-supervised knowledge distillation by modulating network behavior entirely through the selection of AdaIN codes, without modifying the underlying weights. The versatility and efficacy of this paradigm are demonstrated in the segmentation of both normal and abnormal (e.g., pneumonia, COVID-19) chest X-rays, where labeled data may be scarce for abnormal cases and domain shifts are substantial (Oh et al., 2021).
1. Architectural Overview and AdaIN Operator
The core architecture is a U-Net–style encoder–decoder generator constructed from residual blocks. An initial convolution expands the single-channel input image to 64 feature maps; four downsampling residual blocks then reduce the spatial resolution while increasing the channel depth to 512, followed by two additional bottleneck residual blocks. The decoder symmetrically reverses this process via upsampling residual blocks, culminating in the output segmentation mask. Each residual block contains AdaIN layers whose affine parameters (scale and shift) are produced dynamically by small multilayer-perceptron (MLP) code-generator modules, one set for the encoder blocks and one for the decoder blocks, rather than being learned as static weights.
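A minimal PyTorch sketch of this layout is given below, assuming the block counts and channel widths stated above; the class names are illustrative, and the U-Net skip connections and AdaIN layers (sketched in the next subsection) are omitted for brevity.

```python
# Illustrative encoder-decoder skeleton following the description above.
# Not the authors' implementation: skip connections and AdaIN layers omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Residual block with an optional resolution change."""
    def __init__(self, in_ch, out_ch, resample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)
        self.resample = resample  # "down", "up", or None

    def forward(self, x):
        if self.resample == "down":
            x = F.avg_pool2d(x, 2)
        elif self.resample == "up":
            x = F.interpolate(x, scale_factor=2)
        h = self.conv2(torch.relu(self.conv1(torch.relu(x))))
        return h + self.skip(x)

class Generator(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.stem = nn.Conv2d(1, 64, 3, padding=1)      # 1-channel CXR -> 64 maps
        self.encoder = nn.ModuleList([                   # 4 downsampling blocks -> 512 channels
            ResBlock(64, 128, "down"), ResBlock(128, 256, "down"),
            ResBlock(256, 512, "down"), ResBlock(512, 512, "down")])
        self.bottleneck = nn.ModuleList([ResBlock(512, 512), ResBlock(512, 512)])
        self.decoder = nn.ModuleList([                   # symmetric upsampling path
            ResBlock(512, 512, "up"), ResBlock(512, 256, "up"),
            ResBlock(256, 128, "up"), ResBlock(128, 64, "up")])
        self.head = nn.Conv2d(64, n_classes, 1)          # segmentation logits

    def forward(self, x):
        h = self.stem(x)
        for blk in [*self.encoder, *self.bottleneck, *self.decoder]:
            h = blk(h)
        return self.head(h)
```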
The AdaIN operator normalizes an intermediate feature map and rescales it according to an AdaIN code that yields per-channel shift and scale:

$$\mathrm{AdaIN}(x, c) = \gamma(c)\,\frac{x - \mu(x)}{\sigma(x)} + \beta(c),$$

with $\mu(x)$ and $\sigma(x)$ the channel-wise mean and standard deviation of the feature map $x$, and the scale $\gamma(c)$ and shift $\beta(c)$ generated on the fly from the code $c$ via the code-generator networks. A style encoder is also introduced to extract AdaIN parameters from reference images for style guidance in domain adaptation scenarios.
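A sketch of the AdaIN layer and one of the small MLP code generators is given below; the class and function names, the hidden width, and the note on the identity code are assumptions for illustration.

```python
# Illustrative AdaIN layer with an MLP code generator (PyTorch).
import torch
import torch.nn as nn

class CodeMLP(nn.Module):
    """Small MLP mapping a latent code to per-channel AdaIN scale and shift."""
    def __init__(self, code_dim, num_channels, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * num_channels))   # -> [gamma | beta]

    def forward(self, code):
        gamma, beta = self.net(code).chunk(2, dim=-1)
        return gamma, beta

def adain(x, gamma, beta, eps=1e-5):
    """AdaIN(x, c) = gamma(c) * (x - mu(x)) / sigma(x) + beta(c), per channel."""
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + eps
    gamma = gamma.view(*gamma.shape, 1, 1)          # (B, C) -> (B, C, 1, 1)
    beta = beta.view(*beta.shape, 1, 1)
    return gamma * (x - mu) / sigma + beta

# Usage on a hypothetical 512-channel feature map.
feat = torch.randn(2, 512, 16, 16)
mlp = CodeMLP(code_dim=64, num_channels=512)
code = torch.randn(2, 64)                            # a task code (illustrative)
gamma, beta = mlp(code)
out = adain(feat, gamma, beta)
# "Identity normalization" is taken here to mean the layer leaves features
# effectively unrestyled (e.g. gamma = sigma(x), beta = mu(x)); the paper's
# exact convention is an assumption.
```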
2. Task Modes Enabled by AdaIN Codes
CxR Codes are latent vectors which, when routed through the encoder and decoder code generators, put the network into one of three principal modes:
- Supervised Segmentation of Normal CXRs: The segmentation code sets both the encoder and decoder AdaIN parameters to identity, so no learned restyling is applied anywhere in the network. This regime is trained on paired images and lung masks from the normal domain.
- Unsupervised Style-Based Domain Adaptation: For mapping between the normal and abnormal domains, the encoder code is kept at identity while the decoder receives a learned code, producing samples stylized as the target domain. During inference or reference-guided adaptation, the decoder's AdaIN parameters may instead be generated by the style encoder from a target-domain sample.
- Self-Supervised Segmentation of Abnormal CXRs: The self-supervision code uses learned AdaIN parameters in the encoder and identity normalization in the decoder, effectively restyling abnormal CXR features so that they align with the normal segmentation regime.
In total, five codes are implemented, differing in which parts of the network use identity versus learned normalization parameters; the three principal modes are summarized in the following table (and illustrated in the code sketch that follows it):
| Task | Encoder Code | Decoder Code |
|---|---|---|
| Supervised segmentation (normal CXRs) | Identity | Identity |
| Domain adaptation (normal ↔ abnormal) | Identity | Learned |
| Self-supervised segmentation (abnormal CXRs) | Learned | Identity |
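The sketch below shows how such code routing could be organized in practice: each task selects identity or learned codes for the encoder and decoder halves of a shared generator. The task names and the use of `None` for identity normalization are illustrative conventions, not the paper's interface.

```python
# Illustrative routing of CxR Codes; identity normalization is encoded as None.
import torch
import torch.nn as nn

class TaskCodes(nn.Module):
    def __init__(self, code_dim=64):
        super().__init__()
        # Learned latent codes for the non-identity half of each task.
        self.decoder_da_code = nn.Parameter(torch.randn(code_dim))  # domain adaptation
        self.encoder_ss_code = nn.Parameter(torch.randn(code_dim))  # self-supervision

    def get(self, task):
        """Return (encoder_code, decoder_code); None means identity normalization."""
        if task == "segment_normal":      # supervised segmentation of normal CXRs
            return None, None
        if task == "domain_adaptation":   # normal <-> abnormal style translation
            return None, self.decoder_da_code
        if task == "segment_abnormal":    # self-supervised segmentation of abnormal CXRs
            return self.encoder_ss_code, None
        raise ValueError(f"unknown task: {task}")

codes = TaskCodes()
enc_code, dec_code = codes.get("segment_abnormal")
# A shared generator would apply learned AdaIN parameters wherever a code is
# given and fall back to identity normalization wherever it is None.
```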
3. Loss Functions and Training Objectives
The training of the network involves three principal loss terms:
- Supervised Segmentation Loss: a pixel-wise loss between the ground-truth lung mask and the softmaxed segmentation output produced under the segmentation code, computed on the labeled normal-domain images.
- Domain Adaptation Loss: Inspired by StarGANv2, the domain adaptation loss decomposes into adversarial, cycle consistency, style, and diversity losses:
  - The adversarial loss ensures that images generated under the domain-adaptation code are indistinguishable from real target-domain images.
  - The cycle-consistency loss enforces invertibility of the domain translation.
  - The style reconstruction loss encourages the style encoder's estimate from a translated image to match the injected code.
  - The diversity-sensitive loss penalizes insufficient diversity among outputs produced from different code samples.
- Self-Supervised Consistency Loss: the segmentation produced by the trainable generator under the self-supervision code is constrained to agree with pseudo-targets produced by a frozen copy of the generator, which provides stable targets for the consistency constraint.
These loss functions collectively enforce correct segmentation on normal data, robust domain translation, and distillation of segmentation knowledge into abnormal domain representations.
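As a concrete illustration of the distillation idea, the sketch below shows how a frozen copy of the generator could provide stable pseudo-targets for the consistency term. The KL-divergence form of the loss, the generator signature `generator(x, enc_code, dec_code)`, and the helper names are assumptions; the paper's exact formulation may differ.

```python
# Illustrative self-supervised consistency (knowledge distillation) term.
import copy
import torch
import torch.nn.functional as F

def self_supervised_consistency(generator, x, codes, teacher=None):
    """Push the trainable generator toward a frozen teacher's pseudo-targets."""
    if teacher is None:
        teacher = copy.deepcopy(generator)              # frozen copy of the generator
        for p in teacher.parameters():
            p.requires_grad_(False)
    enc_code, dec_code = codes.get("segment_abnormal")  # self-supervision routing
    with torch.no_grad():
        target = teacher(x, enc_code, dec_code).softmax(dim=1)   # stable pseudo-target
    pred = generator(x, enc_code, dec_code)                      # student logits
    return F.kl_div(pred.log_softmax(dim=1), target, reduction="batchmean")
```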
4. Training Workflow and Regime
Training proceeds in sequential stages:
- Initialization of the generator, the encoder and decoder code-generator MLPs, the style encoder, and a discriminator.
- For approximately 20,000 iterations, supervised segmentation (under the segmentation code) and unsupervised domain adaptation (under the domain-adaptation codes) are jointly optimized on the normal (JSRT) and abnormal (RSNA, Cohen) datasets.
- Upon stabilization of segmentation on held-out abnormal samples, the self-supervision loss is activated for roughly 5,000 additional iterations. In this phase, both normal and pneumonia CXRs are processed under the self-supervision regime, using the self-supervision code and a frozen copy of the generator.
- Throughout training, only the AdaIN codes are switched; the core network weights remain shared, enabling all modes (segmentation, style transfer, knowledge distillation) to co-exist within a single generator.
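A condensed sketch of this two-stage schedule is shown below. The iteration counts follow the text; the data loaders, loss helpers, single optimizer, and omitted discriminator updates are simplifying assumptions.

```python
# Illustrative two-stage training schedule; all helpers are assumed placeholders.
import copy
import itertools

def train(generator, codes, style_encoder, discriminator,
          normal_loader, abnormal_loader, opt,
          seg_loss, da_loss, consistency_loss):
    paired = zip(itertools.cycle(normal_loader), itertools.cycle(abnormal_loader))

    # Stage 1: joint supervised segmentation + unsupervised domain adaptation.
    for it, ((x_n, y_n), x_a) in enumerate(paired):
        if it >= 20_000:
            break
        loss = seg_loss(generator, codes, x_n, y_n)                 # labeled normal CXRs
        loss = loss + da_loss(generator, codes, style_encoder,
                              discriminator, x_n, x_a)              # normal <-> abnormal
        opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: activate the self-supervised consistency term with a frozen teacher.
    teacher = copy.deepcopy(generator)
    for p in teacher.parameters():
        p.requires_grad_(False)
    for it, ((x_n, _), x_a) in enumerate(paired):
        if it >= 5_000:
            break
        loss = (consistency_loss(generator, x_n, codes, teacher)    # normal CXRs
                + consistency_loss(generator, x_a, codes, teacher)) # pneumonia CXRs
        opt.zero_grad(); loss.backward(); opt.step()
```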
5. Experimental Evaluation and Comparative Performance
Empirical assessments utilize the following datasets:
- Labeled Normal Data: JSRT (178 train, 20 val, 49 test) with segmentation masks.
- Unlabeled Abnormal Data: RSNA (218), Cohen COVID-19 (640 train, 40 val/test).
- Additional Test Sets: NLM normal CXRs (80), BIMCV-13 COVID-19 (13 labeled), and large unlabeled BIMCV and BRIXIA COVID-19 repositories.
Metrics include Dice similarity on normal lungs and true-positive rate (TPR) for abnormal consolidation/ground-glass opacity (GGO) regions. Baselines consist of U-Net, CycleGAN+U-Net, StarGANv2+U-Net, XLSor, and lungVAE.
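For concreteness, the sketch below gives straightforward NumPy definitions of the two metrics; interpreting TPR as the fraction of annotated consolidation/GGO pixels covered by the predicted lung mask is an assumption consistent with the description above.

```python
# Illustrative metric definitions for binary masks (NumPy).
import numpy as np

def dice(pred_mask, gt_mask, eps=1e-7):
    """Dice similarity between a predicted and ground-truth lung mask."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def tpr(pred_mask, lesion_mask, eps=1e-7):
    """Fraction of consolidation/GGO pixels falling inside the predicted lung mask."""
    pred, lesion = pred_mask.astype(bool), lesion_mask.astype(bool)
    return (np.logical_and(pred, lesion).sum() + eps) / (lesion.sum() + eps)

# Toy example: a predicted mask covering 3 of 4 annotated opacity pixels.
pred = np.array([[1, 1], [1, 0]])
ggo = np.array([[1, 1], [1, 1]])
print(round(dice(pred, ggo), 3), round(tpr(pred, ggo), 3))   # 0.857 0.75
```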
Key findings:
- On normal CXRs, including those with synthetic intensity and noise perturbations, both the proposed method and its variant with the self-supervised loss maintain high Dice scores under harsh shifts, with only a small maximum drop, while all baselines deteriorate severely.
- For abnormal CXRs, U-Net and earlier semi-supervised approaches exhibit a sharp TPR decline to roughly $0.6$ under even mild shifts, whereas the proposed method maintains substantially higher TPR, outperforming all alternatives.
- Qualitative inspection reveals that the proposed approach successfully includes consolidation and GGO within the lung mask, while conventional networks often under-segment these pathologies.
6. Implications and Significance
The introduction of CxR Codes as AdaIN-driven, task-controlling latent variables allows a single shared convolutional network to multiplex supervised segmentation, unsupervised domain adaptation, and knowledge distillation by code swapping alone. This approach simplifies model design, leverages all available labeled and unlabeled data, and achieves superior robustness under domain shifts between normal and abnormal chest radiographs, as evidenced by quantitative and qualitative performance on COVID-19 and pneumonia datasets. It establishes a versatile framework, generalizable to scenarios where domain shifts and scarcity of target-domain segmentation labels are present.