VFLAIR: A Research Library and Benchmark for Vertical Federated Learning (2310.09827v2)

Published 15 Oct 2023 in cs.LG

Abstract: Vertical Federated Learning (VFL) has emerged as a collaborative training paradigm that allows participants with different features of the same group of users to accomplish cooperative training without exposing their raw data or model parameters. VFL has gained significant attention for its research potential and real-world applications in recent years, but still faces substantial challenges, such as in defending against various kinds of data inference and backdoor attacks. Moreover, most existing VFL projects are industry-facing and not easily used for keeping track of current research progress. To address this need, we present an extensible and lightweight VFL framework, VFLAIR (available at https://github.com/FLAIR-THU/VFLAIR), which supports VFL training with a variety of models, datasets and protocols, along with standardized modules for comprehensive evaluations of attack and defense strategies. We also benchmark the performance of 11 attacks and 8 defenses under different communication and model partition settings and draw concrete insights and recommendations on the choice of defense strategies for different practical VFL deployment scenarios.

Authors (6)

Tianyuan Zou
Zixuan Gu
Yu He
Hideaki Takahashi
Yang Liu
Ya-Qin Zhang

Summary

The paper introduces VFLAIR, a unified library and benchmark framework that advances vertical federated learning through comprehensive attack and defense evaluations.
The paper shows that splitVFL configurations significantly enhance model resilience against data inference attacks while optimizing communication efficiency.
The paper proposes the Defense Capability Score (DCS) as a novel metric to balance attack mitigation with maintaining high main task performance.

Overview of VFLAIR: A Research Library and Benchmark for Vertical Federated Learning

The paper introduces VFLAIR, a framework for research in Vertical Federated Learning (VFL). Whereas Horizontal FL (HFL), the more commonly studied form of Federated Learning (FL), partitions data by samples, VFL partitions data by features: different organizations hold different attributes of the same users and train a model collaboratively without sharing their sensitive local data. This privacy-preserving property makes VFL attractive in domains such as finance and advertising. However, VFL still faces significant challenges, particularly in defending against data inference and backdoor attacks, and the field has lacked benchmarks suited to tracking research progress. VFLAIR addresses both gaps by offering a library for VFL model training and a benchmark for evaluating attacks against VFL and the defenses that counter them.
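The difference between feature-wise and sample-wise partitioning can be illustrated with a toy table. This is a minimal sketch, not code from VFLAIR: the party names, feature names, and values are all made up for illustration.

```python
# Toy table: one row per user, one column per feature (all values are invented).
records = {
    "u1": {"age": 34, "income": 52000, "clicks": 12, "label": 1},
    "u2": {"age": 27, "income": 38000, "clicks": 3, "label": 0},
    "u3": {"age": 45, "income": 71000, "clicks": 8, "label": 1},
}


def vertical_split(table, feature_groups):
    """VFL-style split: every party sees all users but only its own columns."""
    return {
        party: {uid: {f: row[f] for f in feats} for uid, row in table.items()}
        for party, feats in feature_groups.items()
    }


def horizontal_split(table, user_groups):
    """HFL-style split: every party sees all columns but only its own users."""
    return {
        party: {uid: dict(table[uid]) for uid in uids}
        for party, uids in user_groups.items()
    }


# Hypothetical parties: a bank holds demographics, an advertiser holds
# behavioural data and the label.
vertical = vertical_split(
    records,
    {"bank": ["age", "income"], "advertiser": ["clicks", "label"]},
)
horizontal = horizontal_split(
    records,
    {"site_a": ["u1", "u2"], "site_b": ["u3"]},
)
```

In the vertical case both parties hold rows for the same user IDs, which is why VFL training has to combine per-party outputs for each sample rather than average whole models.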

Framework Features and Capabilities

VFLAIR is built as a lightweight and extensible framework that supports a wide variety of models, datasets, and communication protocols. It provides standardized modules for evaluating both attack and defense strategies in VFL systems. The framework facilitates the exploration of different VFL settings and model architectures (aggVFL and splitVFL), which have traditionally been difficult to benchmark for lack of a standardized environment. The authors implement 11 attack strategies and 8 defenses, covering a broad range of the threats that VFL systems face. The benchmark also supports 13 datasets from diverse industrial applications, giving the evaluation breadth as well as depth. These include datasets such as MNIST, CIFAR, and NUS-WIDE, which are widely used in traditional machine learning benchmarks and therefore allow for comparability.
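The two architectures differ in how the per-party outputs are combined. The sketch below is an illustration under stated assumptions, not VFLAIR's implementation: the aggVFL head is taken to be a fixed element-wise sum, the splitVFL head is taken to be a single trainable linear layer over the concatenated outputs, and the names `agg_vfl_head` and `SplitVFLHead` are hypothetical.

```python
def agg_vfl_head(local_outputs):
    """aggVFL-style head (assumed): parameter-free element-wise sum of party outputs."""
    return [sum(vals) for vals in zip(*local_outputs)]


class SplitVFLHead:
    """splitVFL-style head (assumed): a trainable linear layer on concatenated outputs."""

    def __init__(self, weights, bias):
        self.weights = weights  # list of rows, shape [out_dim][total_in_dim]
        self.bias = bias        # list of length out_dim

    def __call__(self, local_outputs):
        # Concatenate the parties' outputs into one input vector.
        x = [v for out in local_outputs for v in out]
        return [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(self.weights, self.bias)
        ]


# Two parties, each producing a 2-dimensional output for one sample.
party_outputs = [[1.0, 2.0], [0.5, -1.0]]

agg_result = agg_vfl_head(party_outputs)

# With these particular weights the linear head reproduces the sum; during
# training the weights would be updated, which the fixed sum cannot do.
head = SplitVFLHead(weights=[[1, 0, 1, 0], [0, 1, 0, 1]], bias=[0.0, 0.0])
split_result = head(party_outputs)
```

The point of the contrast is that the splitVFL head has its own parameters sitting between the party outputs and the prediction, while the aggVFL head does not.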

Evaluation and Results

In its benchmark experiments, the paper evaluates VFL systems using metrics such as main task performance (MP), communication efficiency, and attack performance (AP). For instance, the FedBCD and CELU-VFL protocols were found to improve communication efficiency significantly without impairing MP, whereas compression mechanisms such as Quantize and Top-k could adversely affect the number of communication rounds. The findings also suggest that splitVFL configurations, which employ trainable aggregation models, are more robust against data inference attacks than aggVFL setups. This insight matters for designing secure VFL systems, because it suggests that the choice of aggregation strategy can directly influence a model's resilience against attacks.
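To make the compression mechanisms concrete, the following sketch shows generic top-k sparsification and uniform quantization applied to a vector that a party might transmit. These are textbook versions written for illustration; the function names and details are assumptions and may differ from how VFLAIR implements Top-k and Quantize.

```python
def top_k_sparsify(values, k):
    """Keep the k largest-magnitude entries and replace the rest with zero."""
    if k <= 0:
        return [0.0] * len(values)
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def quantize_uniform(values, num_bits):
    """Snap each entry to one of 2**num_bits evenly spaced levels in [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # constant vector: nothing to quantize
    step = (hi - lo) / (2 ** num_bits - 1)
    return [lo + round((v - lo) / step) * step for v in values]


message = [0.1, -2.0, 0.5, 3.0]
sparse = top_k_sparsify(message, k=2)                 # only 2 non-zero entries remain
coarse = quantize_uniform([0.0, 0.4, 1.0], num_bits=1)  # only 2 distinct levels
```

Both transforms shrink what is sent per round at the cost of discarding information, which is consistent with the observation that compression can affect how many rounds training needs.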

Defense Strategies: Insights from DCS Metrics

One of the paper's pivotal contributions is the Defense Capability Score (DCS), which measures the trade-off between reducing attack performance and preserving main task utility. MID (Mutual Information Regularization Defense) emerged as an especially effective defense across a variety of attacks, which indicates the promise of information-theoretic approaches for defending against adversarial inference in VFL. Two derived scores, T-DCS (Type-level Defense Capability Score) and C-DCS (Comprehensive Defense Capability Score), summarize how well a defense performs against each type of attack and overall, providing practical guidance for choosing defenses according to the threat model relevant to a particular dataset or application.
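The summary above does not give the formula for DCS, so the sketch below shows only one plausible way to build such a trade-off score: measure the weighted distance between a defense's (AP, MP) point and an ideal point (no attack success, undiminished main task performance), then map smaller distances to higher scores. The function names, the weighting parameter `beta`, the choice of ideal point, and the plain average used for a type-level score are all assumptions made for illustration; consult the paper for the exact definitions.

```python
import math


def defense_capability_score(ap, mp, ideal_ap, ideal_mp, beta=0.5):
    """Distance-to-ideal trade-off score in (0, 1]; 1.0 means the ideal point.

    beta weights the main-task term, (1 - beta) weights the attack term.
    This is an illustrative formula, not necessarily the paper's definition.
    """
    distance = math.sqrt(
        (1 - beta) * (ap - ideal_ap) ** 2 + beta * (mp - ideal_mp) ** 2
    )
    return 1.0 / (1.0 + distance)


def mean_score(scores):
    """Plain average, used here as an assumed stand-in for a type-level score."""
    return sum(scores) / len(scores)


# Invented numbers: the undefended model reaches MP = 0.90, and the ideal
# defense drives AP to 0.0 without losing any MP.
ideal = defense_capability_score(ap=0.0, mp=0.90, ideal_ap=0.0, ideal_mp=0.90)
strong = defense_capability_score(ap=0.10, mp=0.88, ideal_ap=0.0, ideal_mp=0.90)
weak = defense_capability_score(ap=0.80, mp=0.90, ideal_ap=0.0, ideal_mp=0.90)
type_level = mean_score([strong, weak])
```

Under this sketch a defense that keeps MP high but barely lowers AP scores worse than one that gives up a little MP to suppress the attack, which is the kind of trade-off a single score is meant to expose.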

Theoretical and Practical Implications

Practically, VFLAIR offers researchers a unified, open-source benchmark that helps them develop more robust VFL systems. From a theoretical perspective, new evaluation metrics such as the DCS provide a more nuanced understanding of the trade-offs involved in deploying privacy-preserving and secure learning techniques. The framework sets the stage for future work on more advanced federated learning systems, possibly combining cryptographic and non-cryptographic protocols to guard against emerging threats.

Speculations for Future Developments

Given the rapid evolution of AI and federated learning, VFLAIR could pave the way for more resilient and adaptive VFL paradigms. Future research could extend VFLAIR by integrating cryptographic techniques such as homomorphic encryption, exploring adaptive federated protocols, and examining how VFL interacts with domain-specific models designed around privacy constraints. As federated learning matures, frameworks like VFLAIR can help ensure that models are not only effective but also compliant with ethical and legal data protection standards.
