- The paper introduces a TFHE-based method for secure inference on decision trees, random forests, and gradient-boosted models, evaluating the trees directly on encrypted data.
- It leverages data quantization and TFHE’s programmable bootstrapping to maintain near-clear model accuracy with minimal precision loss at 5–6 bits.
- Experimental results demonstrate competitive accuracy and practical performance for privacy-sensitive applications in sectors like healthcare and finance despite significant computational overhead.
Privacy-Preserving Tree-Based Inference with TFHE
Overview
The paper presents an implementation of privacy-preserving decision tree evaluation using the TFHE scheme (Fully Homomorphic Encryption over the Torus). The proposed method focuses on secure inference with decision trees, random forests, and gradient-boosted trees, preserving prediction accuracy under encryption. By applying fully homomorphic encryption, the research addresses privacy concerns inherent in machine learning applications that involve sensitive data.
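To make the workflow concrete, the following minimal sketch assumes the Concrete ML library (the paper's reference implementation) and its scikit-learn-style interface; the class name, the `n_bits` parameter, and the `fhe="execute"` flag reflect that library's documented API rather than code quoted from the paper.

```python
# Minimal sketch of encrypted tree inference, assuming the Concrete ML
# scikit-learn-style API; names are illustrative of that library,
# not quoted from the paper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train on clear data; features and thresholds are quantized to 6 bits.
model = DecisionTreeClassifier(n_bits=6)
model.fit(X_train, y_train)

# Compile the quantized model to an FHE circuit, then run encrypted
# inference: inputs are encrypted, evaluated server-side, and decrypted.
model.compile(X_train)
y_pred = model.predict(X_test[:3], fhe="execute")
```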
Contribution and Methodology
The work leverages fully homomorphic encryption (FHE) to perform complex computations on ciphertexts, allowing untrusted servers to execute operations on private data without learning anything about the original inputs. The method targets secure inference only; privacy-preserving training approaches such as differential privacy or federated learning are out of scope. The technique applies across tree-based models, using encrypted integer representations and TFHE's programmable bootstrapping (PBS) mechanism, which evaluates an arbitrary univariate function on an encrypted value via a table lookup while refreshing ciphertext noise.
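Conceptually, a tree is evaluated obliviously: every comparison is computed regardless of the path an input would take, and the reached leaf is selected arithmetically. The sketch below shows the cleartext analogue in NumPy for a toy two-level tree; under encryption, each comparison would be realized as a PBS table lookup. The tree and the variable names are illustrative, not taken from the paper.

```python
import numpy as np

# Cleartext analogue of oblivious tree evaluation (illustrative names and
# a toy two-level tree). Under FHE, the comparisons would each be a PBS
# table lookup on an encrypted quantized feature.
thresholds = np.array([3, 5])       # threshold tested at each internal node
feature_idx = np.array([0, 1])      # feature index tested at each node
leaf_values = np.array([0, 1, 1])   # prediction stored at each leaf

def evaluate(x: np.ndarray) -> int:
    # Step 1: evaluate every node's comparison, independent of the path.
    bits = (x[feature_idx] <= thresholds).astype(int)
    # Step 2: build a one-hot leaf indicator from products of decision bits.
    reach = np.array([
        bits[0] * bits[1],        # left subtree, then left leaf
        bits[0] * (1 - bits[1]),  # left subtree, then right leaf
        1 - bits[0],              # right leaf
    ])
    # Step 3: select the reached leaf's value with an inner product.
    return int(reach @ leaf_values)

print(evaluate(np.array([2, 7])))   # x0 <= 3 and x1 > 5 -> second leaf -> 1
```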
The key contributions include:
- Ciphertexts carry multi-bit integers, with cryptosystem parameters chosen so that accumulated values remain within the supported message space.
- Quantization of individual data features bounds the message space of ciphertexts, enabling efficient computation through programmable bootstrapping (a quantization sketch follows this list).
- Cryptosystem parameters are selected via optimization, aiming to maximize inference speed while preserving accuracy.
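As a rough illustration of the quantization step, the hypothetical helper below maps a floating-point feature onto n-bit unsigned integers; it is a generic uniform quantizer under assumed conventions, not the paper's exact routine.

```python
import numpy as np

# Hypothetical uniform quantizer mapping one feature to n-bit unsigned
# integers; this bounds the TFHE message space but is not the paper's
# exact routine.
def quantize_feature(values: np.ndarray, n_bits: int = 6) -> np.ndarray:
    vmin, vmax = values.min(), values.max()
    scale = (vmax - vmin) / (2**n_bits - 1)   # width of one quantization bin
    return np.round((values - vmin) / scale).astype(np.int64)

x = np.array([0.1, 0.5, 0.9, 1.3])
print(quantize_feature(x, n_bits=6))   # -> [ 0 21 42 63], integers in [0, 63]
```

With 6 bits, each feature takes one of 64 levels, which matches the 5-6 bit range the paper identifies as sufficient for near-clear accuracy.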
The authors demonstrate their approach on popular datasets, showing that accuracy on encrypted data closely matches accuracy on clear data and that latency is competitive with state-of-the-art methods.
Results
The experiments highlight several findings:
- Quantized models perform comparably to floating-point models in terms of accuracy, with minimal loss observed at 5–6 bits of precision.
- Latency increases with quantization bit width, growing exponentially beyond 4 bits (see the sweep sketch after this list).
- Execution times for FHE models range from roughly 10,000x to 20,000x those of clear models for ensemble methods, indicating substantial overhead due to encryption.
- The experimental results show that the precision and the probability of PBS (programmable bootstrapping) error significantly affect latency.
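A sweep of this kind could be reproduced with a short script; the sketch below again assumes the Concrete ML interface used earlier and measures single-sample encrypted latency alongside quantized accuracy, so the absolute numbers are purely illustrative.

```python
import time
from sklearn.datasets import make_classification
from concrete.ml.sklearn import DecisionTreeClassifier  # assumed API, as above

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

for n_bits in (2, 3, 4, 5, 6):
    model = DecisionTreeClassifier(n_bits=n_bits)
    model.fit(X, y)
    model.compile(X)                        # build the FHE circuit
    start = time.perf_counter()
    model.predict(X[:1], fhe="execute")     # one encrypted inference
    latency = time.perf_counter() - start
    acc = model.score(X, y)                 # quantized accuracy on clear data
    print(f"n_bits={n_bits}  acc={acc:.3f}  fhe_latency={latency:.2f}s")
```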
In comparisons against prior work based on the BFV scheme, the TFHE-based approach achieves similar or lower latency under certain configurations.
Implications and Future Work
The research has practical implications for deploying machine learning models in sensitive domains such as healthcare and finance, where data privacy is paramount. Integrating FHE into tree-based models preserves privacy without significant compromises in accuracy, promoting wider adoption of data-driven algorithms in privacy-conscious environments.
Theoretical implications include advancements in privacy-preserving machine learning, particularly in optimizing cryptosystem parameters for improved performance. Future work may explore enhancements in TFHE implementation and broader application across diverse model architectures or secure training methodologies. Further investigation could also address the scalability of these methods to larger datasets and more complex models, refining the trade-off between accuracy, privacy, and computational efficiency.
Overall, this paper contributes to the ongoing discourse on privacy-preserving inference, illustrating the potential of FHE in enabling secure machine learning applications.