- The paper introduces FedNorm and FedNorm+, two federated learning algorithms that employ modality-specific normalization to address heterogeneity in decentralized CT and MRI data.
- It details a strategy where normalization and non-normalization parameters are aggregated differently, enabling effective training for both single-modality and mixed-modality clients.
- Experimental results on diverse datasets show Dice scores up to 0.961 for CT and 0.941 for MRI, matching centralized performance while preserving data privacy.
Liver segmentation from medical images is a crucial task in diagnosing and treating liver diseases. While deep learning has shown success in this area, it heavily relies on large, annotated datasets. Obtaining such datasets centrally from multiple institutions is often hindered by privacy regulations like GDPR and the inherent heterogeneity of medical data (different modalities like CT and MRI, various scanners, protocols, and resolutions). Federated Learning (FL) offers a privacy-preserving approach by training a global model on decentralized data, but standard FL algorithms often struggle with this kind of data heterogeneity, particularly when dealing with multi-modal data.
The paper "FedNorm: Modality-Based Normalization in Federated Learning for Multi-Modal Liver Segmentation" (2205.11096) addresses this challenge by proposing two novel FL algorithms, FedNorm and FedNorm+, specifically designed for multi-modal liver segmentation on distributed CT and MRI data. The core idea is to leverage Modality Normalization (MN) (1810.05466), an extension of Batch Normalization (BN) (1502.03167) capable of handling data with multiple latent modes.
Problem Addressed:
The primary problem is training a single, high-performing deep learning model for liver segmentation using decentralized CT and MRI data from multiple institutions without pooling the data. Key challenges include:
- Data Privacy: Medical data cannot be shared or centralized easily.
- Multi-Modality: Significant differences in appearance and intensity distributions between CT and MRI scans (domain shift).
- Inter-Client Heterogeneity: Different clients may have different modalities (CT only, MRI only, or both) and varying data sizes and acquisition parameters.
Proposed Solution: FedNorm and FedNorm+
The paper proposes two FL algorithms built on Mode Normalization (MN) to handle the multi-modal heterogeneity:
- FedNorm:
- This algorithm separates model parameters into normalization parameters (from MN layers) and non-normalization parameters (weights, biases, etc.).
- It maintains separate sets of normalization parameters for each modality (CT and MRI) on the server.
- When a client participates, the server sends the global non-normalization parameters and the modality-specific normalization parameters corresponding to the client's data modality (the server needs to know if a client has only CT or only MRI).
- Local training occurs on the client's data.
- During aggregation, non-normalization parameters are averaged across selected clients similar to FedAvg (1602.05629).
- Normalization parameters for each modality are aggregated separately, incorporating an interpolation mechanism (β) between the old server parameters and the newly averaged client parameters, inspired by momentum.
- Limitation: FedNorm is designed for clients holding only a single modality (either CT or MRI exclusively). It cannot handle clients with data from both modalities.
The server-side aggregation for FedNorm can be summarized as:
```
// Aggregate non-normalization parameters (FedAvg-style weighted average)
theta_{t+1} = sum_{k in S_t} (N_k / N) * theta_{t+1}^{(k)}

// Aggregate normalization parameters separately for each modality
for modality in {CT, MRI}:
    theta_norm_{t+1}^{(modality)} = (1 - beta) * theta_norm_{t}^{(modality)}
        + beta * sum_{k in S_t with modality} (N_k / N) * theta_norm_{t+1}^{(k, modality)}
```
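For illustration, here is a minimal NumPy sketch of this server step. It assumes parameters travel as dicts of arrays; the function name, the tuple layout of client updates, and the renormalization of weights within each modality subset are assumptions of this sketch rather than details from the paper.

```python
import numpy as np

def fednorm_aggregate(server_params, server_norm, client_updates, beta):
    """FedNorm server aggregation (sketch, names are hypothetical).

    server_params:  dict name -> np.ndarray, global non-normalization parameters
    server_norm:    dict modality -> (dict name -> np.ndarray), per-modality MN parameters
    client_updates: list of (n_k, modality, params_k, norm_k) from the selected clients
    beta:           interpolation factor for the normalization parameters
    """
    n_total = sum(n_k for n_k, _, _, _ in client_updates)

    # FedAvg-style weighted average of the non-normalization parameters.
    new_params = {
        name: sum((n_k / n_total) * params_k[name]
                  for n_k, _, params_k, _ in client_updates)
        for name in server_params
    }

    # Per-modality aggregation of the MN parameters, interpolated with the
    # previous server state (momentum-like update).
    new_norm = {}
    for modality, old in server_norm.items():
        subset = [(n_k, norm_k) for n_k, m, _, norm_k in client_updates if m == modality]
        if not subset:  # no client of this modality was selected this round
            new_norm[modality] = old
            continue
        n_mod = sum(n_k for n_k, _ in subset)  # assumption: renormalize within modality
        avg = {name: sum((n_k / n_mod) * norm_k[name] for n_k, norm_k in subset)
               for name in old}
        new_norm[modality] = {name: (1 - beta) * old[name] + beta * avg[name]
                              for name in old}
    return new_params, new_norm
```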
- FedNorm+:
- This is an extension of FedNorm designed to handle clients with data from mixed modalities (both CT and MRI).
- It uses a single set of MN parameters for all clients.
- MN is configured with a fixed number of modes, specifically M=2, explicitly designated for CT and MRI, respectively.
- During local training, the modality information of each input slice (CT or MRI) is used to hard-code which of the two MN modes is applied.
- Aggregation averages all model parameters (both non-normalization and MN parameters/statistics) similar to FedAvg.
- An interpolation mechanism (β) is applied to all parameters during aggregation to stabilize training and avoid oscillations.
- This approach allows clients with mixed CT+MRI data to train on both modalities simultaneously and contribute to the global model.
The local MN operation in FedNorm+ selects the mode for each input slice x_n based on its known modality:
```
// In the MN layer, for each input slice x_n:
if modality(x_n) == CT:
    use mode 1 (CT) parameters: running mean, variance, gamma, beta
else:  // modality(x_n) == MRI
    use mode 2 (MRI) parameters: running mean, variance, gamma, beta
```
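Below is a minimal PyTorch sketch of this hard mode selection, modeling each of the two modes as its own BatchNorm2d so that running statistics and affine parameters stay separate; the class name and interface are hypothetical, not from the paper.

```python
import torch
import torch.nn as nn

class TwoModeNorm(nn.Module):
    """Mode Normalization with M = 2 hard-assigned modes (sketch)."""

    def __init__(self, num_channels):
        super().__init__()
        # modes[0] normalizes CT slices, modes[1] normalizes MRI slices
        self.modes = nn.ModuleList([nn.BatchNorm2d(num_channels) for _ in range(2)])

    def forward(self, x, modality):
        # x: (B, C, H, W); modality: (B,) long tensor with 0 = CT, 1 = MRI
        out = torch.empty_like(x)
        for m, bn in enumerate(self.modes):
            mask = modality == m
            if mask.any():
                out[mask] = bn(x[mask])  # each mode sees only its own slices
        return out

# Usage: a mixed batch of two CT and two MRI slices passes in one call
mn = TwoModeNorm(num_channels=16)
x = torch.randn(4, 16, 64, 64)
y = mn(x, modality=torch.tensor([0, 1, 0, 1]))
```

Dispatching on a per-slice modality tensor is what lets a mixed CT+MRI batch flow through the layer in a single forward pass, which is exactly the mixed-modality-client case FedNorm+ targets.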
The server-side aggregation for FedNorm+ can be summarized as:
```
// Aggregate all parameters (non-normalization and MN alike)
theta_{t+1} = (1 - beta) * theta_t + beta * sum_{k in S_t} (N_k / N) * theta_{t+1}^{(k)}
```
Note that in FedNorm+ the interpolation is applied to all parameters, unlike FedNorm, where it is applied only to the normalization parameters.
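A corresponding sketch of the FedNorm+ server step, reusing the dict-of-arrays convention from the FedNorm sketch above (the function and parameter names are again assumptions of this sketch):

```python
def fednormplus_aggregate(server_params, client_updates, beta):
    """FedNorm+ server aggregation (sketch): FedAvg over ALL parameters,
    interpolated with the previous global model to dampen oscillations."""
    n_total = sum(n_k for n_k, _ in client_updates)
    avg = {name: sum((n_k / n_total) * params_k[name]
                     for n_k, params_k in client_updates)
           for name in server_params}
    return {name: (1 - beta) * server_params[name] + beta * avg[name]
            for name in server_params}
```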
Implementation Details:
- A modified, shallower U-Net (1505.04597) architecture is used, reducing complexity to around 136K parameters. This helps keep the model size manageable for FL.
- Normalization layers (BN for baselines, MN for FedNorm/FedNorm+) are inserted after each convolution and before the ReLU activation.
- Input images (2D slices) are resized to 256x256 pixels.
- Intensity values are normalized per slice based on its mean and standard deviation and then clipped to [−3,3].
- The training objective is a combined loss function: a soft Dice loss computed on non-binary pseudo-probabilities (1606.04797) plus a binary cross-entropy loss [Bishop] (see the sketch after this list).
- Training uses the Adam optimizer (1412.6980).
- FL experiments involve 100 communication rounds, with 2 clients participating per round, each performing 1 local epoch of training.
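To make the preprocessing and the objective concrete, here is a hedged PyTorch sketch of both, assuming 2D float tensors for slices and float masks for targets; the function names and epsilon constants are choices of this sketch, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def preprocess_slice(img):
    """Resize a 2D slice to 256x256, standardize per slice, clip to [-3, 3] (sketch)."""
    img = F.interpolate(img[None, None], size=(256, 256),
                        mode="bilinear", align_corners=False)[0, 0]
    img = (img - img.mean()) / (img.std() + 1e-8)  # per-slice mean/std normalization
    return img.clamp(-3.0, 3.0)                    # clip outlier intensities

def dice_bce_loss(logits, target, eps=1e-6):
    """Combined soft Dice + binary cross-entropy loss (sketch)."""
    probs = torch.sigmoid(logits)                  # non-binary pseudo-probabilities
    inter = (probs * target).sum()
    dice = (2.0 * inter + eps) / (probs.sum() + target.sum() + eps)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return (1.0 - dice) + bce
```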
Experimental Validation:
The methods were validated on a large, diverse dataset combining five publicly available datasets (LiTS17, 3D-IRCADb, Multi-Atlas, SLIVER07, CHAOS19) with an in-house dataset (KORA), totaling 428 patients across the CT and MRI modalities.
Two FL settings were designed to simulate different levels of heterogeneity (both client layouts are sketched after the list):
- Non-IID 1: Clients strictly have data from a single modality (3 CT-only clients, 3 MRI-only clients). This setting is suitable for evaluating FedNorm.
- Non-IID 2: Includes clients with data from both CT and MRI modalities (2 CT-only, 2 MRI-only, 2 mixed CT+MRI clients). This setting requires methods like FedNorm+ that can handle mixed-modality clients.
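For concreteness, the two layouts can be written down as a small configuration; the client names and the dict structure below are illustrative, not from the paper.

```python
# Hypothetical client-to-modality layouts for the two federated settings.
NON_IID_1 = {  # single-modality clients only (FedNorm applies)
    "client_0": ["CT"],  "client_1": ["CT"],  "client_2": ["CT"],
    "client_3": ["MRI"], "client_4": ["MRI"], "client_5": ["MRI"],
}
NON_IID_2 = {  # includes mixed-modality clients (requires FedNorm+)
    "client_0": ["CT"],        "client_1": ["CT"],
    "client_2": ["MRI"],       "client_3": ["MRI"],
    "client_4": ["CT", "MRI"], "client_5": ["CT", "MRI"],
}
```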
The proposed methods were compared against local models (trained only on a single client's data), a centralized model (trained on pooled data from all clients), and state-of-the-art FL algorithms (FedAvg (1602.05629), FedAvgM (1909.06335), FedVC (2003.08082), SiloBN (2008.07424), FedBN (2102.07623)). Performance was measured using the Dice coefficient per patient on the test sets of each client.
Results and Findings:
- FedNorm and FedNorm+ consistently outperformed local models across various clients and modalities, demonstrating the benefit of federated training.
- The performance of FedNorm and FedNorm+ was generally on par with or slightly better than the centralized model, highlighting that data pooling is not necessary to achieve high performance for this task.
- FedNorm and FedNorm+ achieved Dice scores up to 0.961 for CT and 0.941 for MRI.
- FedNorm+ showed particularly strong results in the more challenging Non-IID 2 setting with mixed-modality clients, notably outperforming other methods (including the centralized model and other FL algorithms) on certain challenging MRI datasets.
- Visual results confirmed that FedNorm+ produced accurate segmentations even on image appearances where local models or other FL methods struggled.
Practical Implications:
- The FedNorm and FedNorm+ algorithms provide a practical approach to train accurate deep learning models for multi-modal medical image segmentation in a decentralized, privacy-preserving manner.
- FedNorm+ is more applicable to real-world hospital networks where clients might have mixed datasets from different modalities.
- The use of MN effectively handles the domain shift caused by different imaging modalities and scanners within the federated network.
- The ability to achieve competitive performance with a relatively small U-Net suggests that complex models might not be strictly necessary, potentially reducing computational and communication overhead in FL.
- The global model learned by these methods can be directly used for inference on new, unseen data without needing local adaptation steps, which is a practical advantage over methods like FedBN or SiloBN that keep normalization statistics local.
In conclusion, the paper successfully demonstrates the feasibility and effectiveness of federated learning for multi-modal liver segmentation by introducing modality-based normalization techniques. FedNorm+, in particular, provides a robust solution for realistic scenarios involving clients with diverse and mixed-modality data.