Privacy-Preserving Imputation via Federated Learning
- The paper introduces privacy-preserving imputation by leveraging federated learning to collaboratively train models without exposing raw data.
- It details techniques such as feature-level translation, zero-knowledge proofs, and secure aggregation to address multimodal, temporal, and adversarial challenges.
- Empirical results across healthcare, energy, and sensor networks demonstrate improved imputation accuracy, lower communication overhead, and robust privacy protection.
Privacy-preserving imputation via federated learning encompasses a family of methodologies for addressing missing data in distributed cohorts while minimizing raw data exposure. In federated learning (FL), multiple clients (often institutions or edge devices with heterogeneous datasets) jointly train imputation or downstream predictive models under the orchestration of a server, sharing only model parameters, feature representations, aggregated statistics, or encrypted data. The primary objective is dual: maximizing imputation and prediction performance while applying rigorous measures to prevent privacy leakage, even under adversarial or untrusted conditions.
1. Federated Imputation: Problem Setting and Motivation
Missing data is endemic in federated data ecosystems, particularly in healthcare, energy, and sensor networks. Factors such as variable acquisition protocols, cost/access disparities, retrospective cohorts, sensor faults, and privacy restrictions frequently result in some modalities, channels, or timepoints being absent for subsets of clients. Naïve imputation methods such as zero-filling or mean-imputation are suboptimal and may substantially degrade downstream task performance.
The federated learning paradigm enables clients to keep their raw data local, updating global model parameters collaboratively. This arrangement accommodates privacy and legal constraints, but necessitates novel methodologies for handling missingness. Three prominent scenarios are addressed:
- Multimodal federated imputation: Clients may have access to different subsets of modalities (e.g., image, text, genomics).
- Temporal imputation across distributed time series: Clients observe incomplete, irregularly sampled sequences.
- Fully distributed imputation with untrusted or malicious participants: Adversaries may attempt to reconstruct local data or corrupt model updates.
The common thread in these methodologies is the design of imputation mechanisms that avoid direct data sharing, leveraging compressed representations, statistical summaries, or cryptographically protected parameters.
2. Feature-Based Imputation in Multimodal Federated Learning
In the "Multimodal Federated Learning With Missing Modalities through Feature Imputation Network" (FIN) approach (Poudel et al., 26 May 2025), clients are modeled as holding private tuples $(x^{\mathrm{img}}, x^{\mathrm{txt}}, y)$, denoting image, text, and label respectively. The global model is orchestrated as the composition of two modality-specific encoders $E_{\mathrm{img}}$, $E_{\mathrm{txt}}$, a fusion operator $\oplus$ (concatenation), and a classifier head $h$:

$$\hat{y} = h\big(E_{\mathrm{img}}(x^{\mathrm{img}}) \oplus E_{\mathrm{txt}}(x^{\mathrm{txt}})\big).$$
To address missing modalities, FIN introduces feature-level translators

$$T_{i \to t}: z_{\mathrm{img}} \mapsto \hat{z}_{\mathrm{txt}}, \qquad T_{t \to i}: z_{\mathrm{txt}} \mapsto \hat{z}_{\mathrm{img}},$$

built as lightweight Transformer decoder stacks (6 layers, 4 attention heads, 1024-dimensional hidden size). Imputation is performed at the encoder bottleneck (latent) level, where $T_{i \to t}$ or $T_{t \to i}$ reconstructs the missing modality's features for unimodal samples.
Training optimizes a compound objective

$$\mathcal{L} = \mathcal{L}_{\mathrm{trans}} + \mathcal{L}_{\mathrm{cls}},$$

where $\mathcal{L}_{\mathrm{trans}}$ is the mean-squared error in feature space for multimodal clients, and $\mathcal{L}_{\mathrm{cls}}$ is a classification cross-entropy. Federated averaging (FedAvg) aggregates parameter updates.
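The feature-level translation idea can be sketched compactly. The following is a minimal numpy illustration, not FIN's implementation: it substitutes a single linear map (trained by gradient descent on the feature-space MSE) for the 6-layer Transformer decoder stack, and uses synthetic paired latents in place of real encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16                              # bottleneck dimensionality (illustrative)
Z_img = rng.normal(size=(32, d))    # image-encoder latents of a multimodal client
W_true = rng.normal(size=(d, d)) / np.sqrt(d)
Z_txt = Z_img @ W_true              # synthetic "paired" text latents

# Linear translator T_{i->t} trained on feature-space MSE, standing in
# for FIN's Transformer decoder stack.
W = np.zeros((d, d))
lr = 0.1
for _ in range(1000):
    pred = Z_img @ W
    W -= lr * Z_img.T @ (pred - Z_txt) / len(Z_img)   # gradient of the MSE

mse = float(np.mean((Z_img @ W - Z_txt) ** 2))
# A unimodal image client can now impute missing text features as Z_img @ W
# before fusion and classification; only the low-dimensional latents and
# translator weights are ever shared.
```

The key design point survives the simplification: translation happens in the low-dimensional bottleneck space, which is what keeps communication and computation an order of magnitude cheaper than input-level generation.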
Empirical results on MIMIC-CXR, NIH Open-I, and CheXpert datasets demonstrate that FIN achieves macro-AUC of 86.2% (homogeneous) and 77.9% (heterogeneous) for unimodal image clients, outperforming zero/uniform filling and federated generative report models. Notably, the feature translator method approaches the performance of public-data-based cross-modal augmentation without requiring access to real or synthetic external data. FIN's low-dimensional bottleneck representations also reduce computational and communication overhead by an order of magnitude relative to input-level generative models.
FIN requires that at least a minority of multimodal clients be available in each round to supervise the translators, and it currently lacks formal differential privacy mechanisms or cryptographic protocols. The method is extensible to additional modalities and alternate feature translators, with plausible application beyond medical imaging into domains such as sensor networks or recommender systems.
3. Secure Imputation with Verifiable Privacy and Trust-Aware Aggregation
In industrial and energy-sector FL, the ZTFed-MAS2S framework (Li et al., 24 Aug 2025) addresses missing wind power data using a multi-headed attention-based sequence-to-sequence (MAS2S) model. ZTFed-MAS2S is distinguished by its zero-trust architecture, combining verifiable differential privacy (DP) via non-interactive zero-knowledge proofs (NIZK), and dynamic trust-aware aggregation (DTAA).
The system enforces differential privacy by clipping client model parameters and adding Gaussian noise calibrated to an $(\epsilon, \delta)$ budget:

$$\tilde{\theta}_k = \mathrm{clip}(\theta_k, C) + \mathcal{N}(0, \sigma^2 I),$$

where $\sigma$ is governed by the Gaussian mechanism. Each client additionally generates a Schnorr-style NIZK to prove correct noise addition.
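The clip-then-noise step can be made concrete. This is a generic sketch rather than ZTFed-MAS2S's exact scheme (and it omits the NIZK proof entirely), using the classical Gaussian-mechanism calibration $\sigma = \sqrt{2\ln(1.25/\delta)}\,C/\epsilon$, which is valid for $\epsilon < 1$:

```python
import numpy as np

def gaussian_sigma(epsilon, delta, sensitivity):
    # Classical Gaussian-mechanism calibration (valid for epsilon < 1):
    # sigma = sqrt(2 ln(1.25/delta)) * sensitivity / epsilon.
    return np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon

def dp_perturb(theta, clip_norm, epsilon, delta, rng):
    # Clip the update in L2 norm to bound its sensitivity, then add noise.
    norm = np.linalg.norm(theta)
    clipped = theta * min(1.0, clip_norm / norm)
    sigma = gaussian_sigma(epsilon, delta, clip_norm)
    return clipped + rng.normal(scale=sigma, size=theta.shape)

rng = np.random.default_rng(1)
theta = rng.normal(size=1000) * 5.0
noisy = dp_perturb(theta, clip_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng)
```

Clipping is what makes the sensitivity (and hence $\sigma$) independent of any single client's raw update magnitude.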
DTAA calculates pairwise cosine similarities between perturbed updates to construct a trust graph, propagating trust scores and filtering out anomalous clients via median absolute deviation. Final global model aggregation is then conducted over the trusted set.
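A minimal version of this trust-filtering idea is sketched below. The names, the mean-similarity trust score, and the 3×MAD threshold are all illustrative simplifications of DTAA's trust-graph propagation, not the paper's algorithm:

```python
import numpy as np

def trusted_set(updates, k=3.0):
    # Pairwise cosine similarities between (perturbed) client updates.
    U = np.stack(updates)
    Un = U / np.linalg.norm(U, axis=1, keepdims=True)
    S = Un @ Un.T
    # Simplified trust score: mean similarity to all other clients.
    scores = (S.sum(axis=1) - 1.0) / (len(U) - 1)
    # Filter anomalous clients via median absolute deviation (MAD).
    med = np.median(scores)
    mad = np.median(np.abs(scores - med)) + 1e-12
    return [i for i, s in enumerate(scores) if abs(s - med) / mad <= k]

rng = np.random.default_rng(2)
honest = [rng.normal(loc=1.0, scale=0.1, size=50) for _ in range(8)]
flipped = [-honest[0]]            # sign-flipped (adversarial) update
idx = trusted_set(honest + flipped)   # adversary (index 8) is filtered out
```

Cosine similarity makes a sign-flipped update maximally anomalous, which is why this family of defenses is effective against the sign-flipping attacks evaluated in the paper.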
The MAS2S imputation model itself is a BiLSTM encoder–decoder with multi-head attention, optimized locally via Adam for mean absolute error between true and reconstructed sequences.
Communication overhead is addressed with sparsity-driven and quantization-based compression, plus AES-CBC encryption and HMAC for confidentiality and integrity.
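The compression pipeline (minus the AES-CBC encryption and HMAC) can be sketched as top-$k$ sparsification followed by uniform quantization of the kept values; the keep fraction and level count below are illustrative, not the paper's settings:

```python
import numpy as np

def compress(update, keep_frac=0.1, levels=256):
    # Top-k sparsification: keep only the largest-magnitude entries.
    k = max(1, int(keep_frac * update.size))
    idx = np.argsort(np.abs(update))[-k:]
    vals = update[idx]
    # Uniform quantization of the kept values into `levels` buckets.
    lo, hi = vals.min(), vals.max()
    step = (hi - lo) / (levels - 1) or 1.0    # guard against all-equal values
    q = np.round((vals - lo) / step).astype(np.uint8)
    return idx, q, lo, step

def decompress(idx, q, lo, step, size):
    out = np.zeros(size)
    out[idx] = lo + q * step
    return out

rng = np.random.default_rng(3)
u = rng.normal(size=1000)
rec = decompress(*compress(u), size=u.size)
```

Only the indices, 8-bit codes, and two floats need to be transmitted (and, in ZTFed-MAS2S, encrypted), which is the source of the reported communication savings.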
Empirical validation on the NREL wind farm dataset (with up to 90% missing data) shows that ZTFed-MAS2S achieves RMSE as low as 0.0411 at extreme missingness, outperforming baselines by substantial margins and maintaining robustness under adversarial sign-flipping of updates. The DP-NIZK+CIV machinery enables verifiable privacy preservation while reducing communication costs by more than 50% compared with FHE/TSS-based alternatives. Tradeoff curves show that, at the reported privacy budget $\epsilon$, the method achieves a membership inference attack success rate of 59.2% with 82.4% utility.
A notable contribution is the integration of verifiability and trust scoring, which is critical in open, zero-trust industrial environments. ZTFed-MAS2S is effective for privacy-preserving imputation in practical, large-scale, and adversarial settings where traditional federated protocols are susceptible to privacy attacks and untrustworthy aggregation.
4. Markovian Temporal Imputation via Federated Aggregation
For time-series data with irregular sampling, as in multi-centric ICU environments, the Federated Markov Imputation (FMI) strategy (Düsing et al., 25 Sep 2025) leverages global transition models constructed via secure aggregation of local Markov transition statistics.
Concretely, each scalar feature is discretized into $B$ bins; each ICU $c$ computes local transition counts $C^{(c)}_{jk}$ between bins over its observed data. Through secure aggregation, the server constructs global aggregated counts $C_{jk} = \sum_c C^{(c)}_{jk}$ and normalizes to obtain transition probabilities $P_{jk} = C_{jk} / \sum_{k'} C_{jk'}$. Missing bins are imputed via Markov inference, either using maximum-likelihood single-step completion

$$\hat{b}_t = \arg\max_{k} P_{b_{t-1},\,k},$$

or using Viterbi-style dynamic programming for contiguous missing segments. The protocol avoids direct data exposure, since no raw time series ever leave the clients.
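The FMI pipeline admits a compact illustration. The bin sequences, bin count, and client count below are toy values, and secure aggregation is stood in for by a plain sum (the point in FMI is that the server only ever observes this aggregate, never per-client counts):

```python
import numpy as np

B = 4  # number of discretization bins (toy value)

def local_counts(seq, B):
    # Per-client transition counts over observed consecutive bin pairs;
    # None marks a missing timepoint, which contributes no transitions.
    C = np.zeros((B, B))
    for a, b in zip(seq[:-1], seq[1:]):
        if a is not None and b is not None:
            C[a, b] += 1
    return C

# Two "ICUs" with discretized, partially missing sequences.
seq1 = [0, 1, 2, None, 2, 3, 3]
seq2 = [1, 2, 2, 3, None, 0, 1]

# Plain sum standing in for secure aggregation of the count matrices.
C_global = local_counts(seq1, B) + local_counts(seq2, B)
P = C_global / np.maximum(C_global.sum(axis=1, keepdims=True), 1)

def impute_step(prev_bin, P):
    # Maximum-likelihood single-step completion: most probable successor bin.
    return int(np.argmax(P[prev_bin]))
```

Contiguous gaps would instead be filled by a Viterbi-style pass over $P$, maximizing the joint transition probability across the whole missing segment.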
In a two-phase protocol, this imputation is followed by downstream outcome prediction via FL using a 3-layer LSTM + MLP trained on the imputed data, with FedAvg aggregation.
Evaluation on the MIMIC-IV ICU dataset, under both regular and irregular temporal sampling, reveals that FMI attains AUC of 0.8878 in regular settings and 0.8629 in irregular settings, outperforming local mean- and local Markov-based imputation by 0.02–0.06 AUC. In irregular sampling regimes, FMI is operational while local Markov imputation fails on coarser grids. Transition sharing via secure aggregation enhances temporal modeling, but FMI's current limitations include its first-order Markov assumption, discretization’s loss of fine-grained clinical variation, and the current absence of formal differential privacy on transition matrices.
5. Loss Functions, Training Algorithms, and Optimization Objectives
All frameworks described above are grounded in composite loss objectives that govern both imputation fidelity and downstream task performance.
- Feature-based approaches (e.g., FIN): optimize mean-squared reconstruction error over paired bottleneck features and cross-entropy loss for task prediction. At unimodal clients, only task loss is active; multimodal clients train both translation and prediction jointly.
- Sequence-to-sequence models (e.g., MAS2S): minimize time-averaged mean absolute error across full imputation outputs; local client optimization uses Adam, with scheduled rounds for uploading perturbed parameters for federated aggregation.
- Markovian approaches (e.g., FMI): imputation uses maximum-likelihood inference based on global transition probabilities, with privacy arising from secure aggregation; downstream tasks use standard predictive loss functions (e.g., cross-entropy for LSTM classifiers).
All methods employ the FedAvg aggregation protocol unless replaced by more robust variants (e.g., DTAA).
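For reference, FedAvg itself is simply a data-size-weighted average of client parameters; a minimal sketch:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    # FedAvg: average client parameter vectors weighted by local data size.
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * p for w, p in zip(weights, np.stack(client_params)))

params = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
avg = fedavg(params, client_sizes=[1, 3])   # -> [2.5, 3.5]
```

Robust variants such as DTAA replace this unweighted trust in all clients with aggregation over a filtered, trust-scored subset.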
6. Privacy Mechanisms and Security Considerations
Privacy preservation is fundamental to these federated imputation systems. Comparative privacy mechanisms are summarized below:
| Method | Main Privacy Mechanism | Exposure Mitigated |
|---|---|---|
| FIN (Feature) | Bottleneck feature sharing only | Raw data, full features |
| ZTFed-MAS2S | DP + NIZK + encryption/DTAA | Raw data, parameter inversion, manipulation |
| FMI (Markov) | Secure aggregation on counts | Raw time series, local stat leaks |
FIN's approach inherently reduces information leakage via low-dimensional shared features, but does not currently employ formal cryptographic or DP methods. ZTFed-MAS2S augments DP with stringent verifiability (NIZK) and robust aggregation to ensure that no trusted party is required. FMI deploys secure aggregation for Markov counts; formal DP on summary statistics is suggested as a future enhancement.
Limitations include situations with zero multimodal clients (FIN) or the absence of formal DP on summary statistics (FMI). For all methods, adversarial model inversion attacks remain a research concern, with DP, secure aggregation, or encryption as candidate mitigations.
7. Comparative Performance and Generality
All presented methods report strong task- and domain-specific empirical improvements compared to local or naïve imputation. High-level benchmarks in their respective evaluations are:
- FIN: Macro-AUC for unimodal clients (hetero setting) = 77.9% vs. 67.3% (R2Gen), 72.8% (zero-filling).
- ZTFed-MAS2S: At 90% missing, RMSE = 0.0411 vs. next-best 0.0712; up to 28.2% RMSE reduction under adversarial updates with DTAA vs. alternative aggregates.
- FMI: AUC = 0.8629 (irregular sampling) vs. 0.7961 (local mean).
These approaches are broadly extensible. FIN may generalize to multimodal learning tasks outside healthcare. ZTFed-MAS2S’s trust/privacy architecture suggests applicability to other critical-infrastructure FL settings. FMI’s secure Markov-chain aggregation is lightweight and conceptually applicable to other time series domains with similar privacy constraints and temporal heterogeneity.
Plausible implications are that future work will integrate additional modalities, cryptographic protocols, or formal DP, and pursue joint end-to-end imputation and predictive learning—potentially increasing both utility and privacy guarantees.