- The paper introduces a TFHE-based method for secure inference on decision trees, random forests, and gradient-boosted models, evaluating the trees directly on encrypted data.
- It leverages data quantization and TFHE’s programmable bootstrapping to maintain near-clear model accuracy with minimal precision loss at 5–6 bits.
- Experimental results demonstrate competitive accuracy and practical performance for privacy-sensitive applications in sectors like healthcare and finance despite significant computational overhead.
Privacy-Preserving Tree-Based Inference with TFHE
Overview
The paper presents an implementation of privacy-preserving decision tree evaluation using the TFHE scheme (Fully Homomorphic Encryption over the Torus). The proposed method focuses on secure inference with decision trees, random forests, and gradient-boosted trees, preserving prediction accuracy under encryption. By applying fully homomorphic encryption, the research addresses privacy concerns inherent in machine learning applications that involve sensitive data.
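To make the workflow concrete, the following minimal sketch assumes the Concrete ML library (the paper's reference implementation) and its scikit-learn-style interface; the class name, the `n_bits` parameter, and the `fhe="execute"` flag reflect that library's documented API rather than code quoted from the paper.

```python
# Minimal sketch of encrypted tree inference, assuming the Concrete ML
# scikit-learn-style API; names are illustrative of that library,
# not quoted from the paper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train on clear data; features and thresholds are quantized to 6 bits.
model = DecisionTreeClassifier(n_bits=6)
model.fit(X_train, y_train)

# Compile the quantized model to an FHE circuit, then run encrypted
# inference: inputs are encrypted, evaluated server-side, and decrypted.
model.compile(X_train)
y_pred = model.predict(X_test[:3], fhe="execute")
```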
Contribution and Methodology
The work leverages fully homomorphic encryption (FHE) to perform complex computations on ciphertexts, allowing untrusted servers to execute operations on private data without learning anything about the original inputs. The method targets secure inference only; privacy-preserving training approaches such as differential privacy or federated learning are out of scope. The technique applies across tree-based models, using encrypted integer representations and TFHE's programmable bootstrapping (PBS) mechanism, which evaluates an arbitrary univariate function on an encrypted value via a table lookup while refreshing ciphertext noise.
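Conceptually, a tree is evaluated obliviously: every comparison is computed regardless of the path an input would take, and the reached leaf is selected arithmetically. The sketch below shows the cleartext analogue in NumPy for a toy two-level tree; under encryption, each comparison would be realized as a PBS table lookup. The tree and the variable names are illustrative, not taken from the paper.

```python
import numpy as np

# Cleartext analogue of oblivious tree evaluation (illustrative names and
# a toy two-level tree). Under FHE, the comparisons would each be a PBS
# table lookup on an encrypted quantized feature.
thresholds = np.array([3, 5])       # threshold tested at each internal node
feature_idx = np.array([0, 1])      # feature index tested at each node
leaf_values = np.array([0, 1, 1])   # prediction stored at each leaf

def evaluate(x: np.ndarray) -> int:
    # Step 1: evaluate every node's comparison, independent of the path.
    bits = (x[feature_idx] <= thresholds).astype(int)
    # Step 2: build a one-hot leaf indicator from products of decision bits.
    reach = np.array([
        bits[0] * bits[1],        # left subtree, then left leaf
        bits[0] * (1 - bits[1]),  # left subtree, then right leaf
        1 - bits[0],              # right leaf
    ])
    # Step 3: select the reached leaf's value with an inner product.
    return int(reach @ leaf_values)

print(evaluate(np.array([2, 7])))   # x0 <= 3 and x1 > 5 -> second leaf -> 1
```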
The key contributions include:
- Ciphertexts carry multi-bit integers, with cryptosystem parameters chosen so that accumulated values remain within the supported message space.
- Quantization of individual data features bounds the message space of ciphertexts, enabling efficient computation through programmable bootstrapping (a quantization sketch follows this list).
- Cryptosystem parameters are selected via optimization, aiming to maximize inference speed while preserving accuracy.
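As a rough illustration of the quantization step, the hypothetical helper below maps a floating-point feature onto n-bit unsigned integers; it is a generic uniform quantizer under assumed conventions, not the paper's exact routine.

```python
import numpy as np

# Hypothetical uniform quantizer mapping one feature to n-bit unsigned
# integers; this bounds the TFHE message space but is not the paper's
# exact routine.
def quantize_feature(values: np.ndarray, n_bits: int = 6) -> np.ndarray:
    vmin, vmax = values.min(), values.max()
    scale = (vmax - vmin) / (2**n_bits - 1)   # width of one quantization bin
    return np.round((values - vmin) / scale).astype(np.int64)

x = np.array([0.1, 0.5, 0.9, 1.3])
print(quantize_feature(x, n_bits=6))   # -> [ 0 21 42 63], integers in [0, 63]
```

With 6 bits, each feature takes one of 64 levels, which matches the 5-6 bit range the paper identifies as sufficient for near-clear accuracy.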
The authors demonstrate their approach on popular datasets, showing that accuracy on encrypted data closely matches accuracy on clear data and that latency is competitive with state-of-the-art methods.
Results
The experiments highlight several findings:
- Quantized models perform comparably to floating-point models in terms of accuracy, with minimal loss observed at 5–6 bits of precision.
- Latency increases with quantization bit width, growing exponentially beyond 4 bits (see the sweep sketch after this list).
- Execution times for FHE models range from roughly 10,000x to 20,000x those of clear models for ensemble methods, indicating substantial overhead due to encryption.
- The experimental results show that the precision and the probability of PBS (programmable bootstrapping) error significantly affect latency.
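A sweep of this kind could be reproduced with a short script; the sketch below again assumes the Concrete ML interface used earlier and measures single-sample encrypted latency alongside quantized accuracy, so the absolute numbers are purely illustrative.

```python
import time
from sklearn.datasets import make_classification
from concrete.ml.sklearn import DecisionTreeClassifier  # assumed API, as above

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

for n_bits in (2, 3, 4, 5, 6):
    model = DecisionTreeClassifier(n_bits=n_bits)
    model.fit(X, y)
    model.compile(X)                        # build the FHE circuit
    start = time.perf_counter()
    model.predict(X[:1], fhe="execute")     # one encrypted inference
    latency = time.perf_counter() - start
    acc = model.score(X, y)                 # quantized accuracy on clear data
    print(f"n_bits={n_bits}  acc={acc:.3f}  fhe_latency={latency:.2f}s")
```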
In comparisons against prior work based on the BFV scheme, the TFHE-based approach achieves similar or lower latency under certain configurations.
Implications and Future Work
The research has practical implications for deploying machine learning models in sensitive domains such as healthcare and finance, where data privacy is paramount. Integrating FHE into tree-based models preserves privacy without significant compromises in accuracy, promoting wider adoption of data-driven algorithms in privacy-conscious environments.
Theoretical implications include advancements in privacy-preserving machine learning, particularly in optimizing cryptosystem parameters for improved performance. Future work may explore enhancements in TFHE implementation and broader application across diverse model architectures or secure training methodologies. Further investigation could also address the scalability of these methods to larger datasets and more complex models, refining the trade-off between accuracy, privacy, and computational efficiency.
Overall, this paper contributes to the ongoing discourse on privacy-preserving inference, illustrating the potential of FHE in enabling secure machine learning applications.