Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption (1711.10677v1)

Published 29 Nov 2017 in cs.LG

Abstract: Consider two data providers, each maintaining private records of different feature sets about common entities. They aim to learn a linear model jointly in a federated setting, namely, data is local and a shared model is trained from locally computed updates. In contrast with most work on distributed learning, in this scenario (i) data is split vertically, i.e. by features, (ii) only one data provider knows the target variable and (iii) entities are not linked across the data providers. Hence, to the challenge of private learning, we add the potentially negative consequences of mistakes in entity resolution. Our contribution is twofold. First, we describe a three-party end-to-end solution in two phases (privacy-preserving entity resolution, then federated logistic regression over messages encrypted with an additively homomorphic scheme), secure against an honest-but-curious adversary. The system allows learning without either exposing data in the clear or sharing which entities the data providers have in common. Our implementation is as accurate as a naive non-private solution that brings all data in one place, and scales to problems with millions of entities and hundreds of features. Second, we provide what is to our knowledge the first formal analysis of the impact of entity resolution mistakes on learning, with results on how optimal classifiers, empirical losses, margins and generalisation abilities are affected. Our results bring clear and strong support for federated learning: under reasonable assumptions on the number and magnitude of entity resolution mistakes, it can be extremely beneficial to carry out federated learning in the setting where each peer's data provides a significant uplift to the other.

Authors (7)
  1. Stephen Hardy (6 papers)
  2. Wilko Henecka (3 papers)
  3. Hamish Ivey-Law (4 papers)
  4. Richard Nock (72 papers)
  5. Giorgio Patrini (12 papers)
  6. Guillaume Smith (2 papers)
  7. Brian Thorne (2 papers)
Citations (506)

Summary

  • The paper introduces a secure protocol for federated logistic regression, using additively homomorphic encryption and privacy-preserving entity resolution.
  • The research analyzes the impact of entity resolution errors on classifier performance, empirical loss, margins, and generalization, demonstrating robustness under realistic assumptions.
  • Numerical results show that the protocol scales effectively to millions of entities and hundreds of features, matching the accuracy of centralized, non-private methods.

Private Federated Learning on Vertically Partitioned Data

The paper discusses a method for private federated learning where data is vertically partitioned across multiple parties. The focus is on learning a linear model without compromising the privacy of the data involved. This scenario presents unique challenges due to three main constraints: data is split by features, only one data provider has access to the target variable, and entities are not linked across providers.
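
To make the setting concrete, the following purely hypothetical sketch shows what the two providers might hold; all identifiers, names, and values are invented for illustration:

```python
# Hypothetical illustration of the problem setup: two providers hold
# disjoint feature columns about overlapping entities, under unrelated
# local identifiers, and only provider A holds the target variable.
import pandas as pd

# Provider A: demographic features plus the label.
provider_a = pd.DataFrame({
    "local_id":  ["a1", "a2", "a3"],
    "name":      ["Jane Doe", "John Roe", "Ann Poe"],
    "age":       [34, 51, 29],
    "defaulted": [0, 1, 0],   # target variable: only A knows it
})

# Provider B: financial features for (mostly) the same people,
# keyed by its own identifiers -- there is no shared join key.
provider_b = pd.DataFrame({
    "local_id": ["b7", "b2", "b9"],
    "name":     ["J. Doe", "Jon Roe", "Sam Coe"],
    "income":   [58_000, 72_000, 41_000],
    "n_loans":  [1, 3, 0],
})

# A naive, non-private baseline would link on quasi-identifiers such as
# "name" and concatenate the columns in one place; the paper's protocol
# trains the same kind of model without either party seeing the other's
# rows, or even learning which entities they have in common.
```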

Key Contributions

  1. End-to-End Secure Protocol: The authors propose a two-phase protocol for secure federated logistic regression: a privacy-preserving entity resolution phase that links records across providers without revealing them, followed by model training over messages encrypted with an additively homomorphic scheme. The protocol is secure against honest-but-curious adversaries and keeps data confidential even while entities are being linked between datasets (minimal sketches of both phases follow this list).
  2. Impact of Entity Resolution Errors: The paper offers a formal analysis of how errors in entity resolution affect learning. It explores the consequences on optimal classifiers, empirical losses, margins, and generalization capabilities. The analysis supports the use of federated learning, as it demonstrates that learning can still be advantageous despite entity resolution errors, under reasonable assumptions.
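
For the entity-resolution phase, records must be matched on noisy quasi-identifiers (names, dates of birth) without revealing them. A common instantiation of privacy-preserving record linkage in this line of work is the cryptographic longterm key (CLK): each record's fields are split into n-grams and hashed into a Bloom filter under a key shared only by the data providers, and the resulting filters are compared by Dice similarity so that close-but-not-identical records still match. The sketch below is a minimal, illustrative version of that idea, not the paper's exact implementation; the parameters and helper functions are assumptions:

```python
# Minimal sketch of Bloom-filter-based private record linkage
# (cryptographic longterm keys); illustrative, not the paper's code.
import hashlib
import hmac

FILTER_BITS = 1024   # assumed Bloom filter size
NUM_HASHES = 20      # assumed number of hash functions per n-gram

def bigrams(text: str):
    """Split a string into overlapping 2-grams, e.g. 'jane' -> ja, an, ne."""
    s = text.lower().replace(" ", "")
    return [s[i:i + 2] for i in range(len(s) - 1)]

def clk(record_fields, secret: bytes) -> set:
    """Encode a record as the set of Bloom-filter bit positions set by
    HMAC-hashing every n-gram of every field with the shared secret."""
    bits = set()
    for field in record_fields:
        for gram in bigrams(field):
            for k in range(NUM_HASHES):
                digest = hmac.new(secret, f"{k}|{gram}".encode(),
                                  hashlib.sha256).digest()
                bits.add(int.from_bytes(digest[:4], "big") % FILTER_BITS)
    return bits

def dice(x: set, y: set) -> float:
    """Dice similarity of two bit sets; close to 1.0 for likely matches."""
    return 2 * len(x & y) / (len(x) + len(y))

secret = b"shared-by-data-providers-only"   # hypothetical shared key
a = clk(["Jane Doe", "1984-03-01"], secret)
b = clk(["J. Doe", "1984-03-01"], secret)
print(f"similarity: {dice(a, b):.2f}")      # high despite name variation
```

In the paper's three-party design, the encoded filters are compared by a coordinator, so neither data provider learns which of its entities the other holds.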

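For the learning phase, an additively homomorphic scheme such as Paillier lets parties add encrypted numbers, and scale them by plaintext constants, without decrypting. Since the logistic function itself is not linear, the paper works with a Taylor approximation of the logistic loss, which makes gradient updates linear in the encrypted quantities: roughly, ∇θ log(1 + exp(−y θᵀx)) ≈ (¼ θᵀx − ½ y) x for labels y ∈ {−1, +1}. The following minimal sketch uses the open-source phe (python-paillier) library to show only the homomorphic aggregation mechanics, not the paper's full three-party protocol; the gradient values are invented:

```python
# Minimal sketch of additively homomorphic aggregation with Paillier,
# using the open-source `phe` library (pip install phe). Illustrates
# the mechanism only, not the paper's full three-party protocol.
import numpy as np
from phe import paillier

# The coordinator generates the keypair; the data providers get only
# the public key, so they can encrypt and add but never decrypt.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each provider computes its share of a gradient locally ...
grad_a = np.array([0.12, -0.40, 0.05])   # hypothetical local values
grad_b = np.array([-0.07, 0.22, 0.31])

# ... encrypts it, and the shares are combined without decryption.
# float() casts avoid passing numpy scalar types to the library.
enc_a = [public_key.encrypt(float(g)) for g in grad_a]
enc_b = [public_key.encrypt(float(g)) for g in grad_b]

# Additive homomorphism: E(x) + E(y) = E(x + y); scaling by a plaintext
# constant (here a learning rate) is also supported.
lr = 0.1
enc_update = [(ea + eb) * lr for ea, eb in zip(enc_a, enc_b)]

# Only the key holder (the coordinator) can recover the aggregate;
# individual contributions are never revealed in the clear.
update = [private_key.decrypt(c) for c in enc_update]
print(update)   # ~ [0.005, -0.018, 0.036]
```

Because only the coordinator holds the private key, the two data providers exchange and combine ciphertexts throughout training and never see each other's intermediate values in the clear.
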
Numerical Results and Theoretical Implications

The implementation was tested for scalability and accuracy. The protocol achieved accuracy comparable to a non-private, centralized solution while handling datasets with millions of entities and hundreds of features. On the theoretical side, the analysis shows that federated learning remains robust in the presence of entity resolution errors, provided those errors are limited in number and magnitude. The resulting bounds on classifier deviation and generalization error are not specific to this protocol, which suggests potential for broader application.

Future Directions and Impact

This research paves the way for more secure collaborative modeling without sacrificing predictive power. The analysis of entity resolution errors opens possibilities for optimizing these methods specifically for learning tasks. Future developments could focus on enhancing the efficiency of the protocol and expanding its applicability to other models and learning contexts.

Conclusion

The paper successfully presents a structured approach to private federated learning on vertically partitioned data. It balances security with computational efficiency and provides a compelling case for federated systems where privacy-preserving data collaboration yields substantial benefits. This work advances both practical implementations and theoretical understanding in the field of secure, distributed machine learning.
