L2FE-Hash: Secure Fuzzy Extractor for ML Embeddings
- L2FE-Hash is a cryptographic fuzzy extractor that secures high-dimensional machine learning embeddings by tolerating Euclidean variations during authentication.
- It integrates lattice-based error correction and LWE cryptography to provide provable security even under full-leakage scenarios.
- Empirical evaluations demonstrate robust resistance to model inversion attacks while maintaining practical authentication performance in face recognition systems.
L2FE-Hash is a cryptographic fuzzy extractor construction designed to protect ML embeddings, particularly those produced by face recognition systems, against model inversion attacks while preserving authentication utility under Euclidean ($\ell_2$) distance. Unlike prior schemes, L2FE-Hash provides provable computational security even in the full-leakage threat model: adversaries cannot invert the stored data to recover biometric information even after all server-side secrets are compromised. The construction combines learning-with-errors (LWE) cryptography, lattice-based error correction, and seeded randomness extractors. It is the first such construction to support high-dimensional $\ell_2$ distance comparison in practical ML-authentication applications (Prabhakar et al., 29 Oct 2025).
1. Background and Motivation
Fuzzy extractors enable high-entropy key extraction from noisy data. A key derived from an enrolled input $w$ can be reliably reproduced from any later input $w'$ that is sufficiently close to $w$ under a well-defined metric. Early fuzzy extractors targeted discrete biometric modalities using Hamming distance. ML-based face authentication, however, produces real-valued, high-dimensional vectors ($w \in \mathbb{R}^m$), for which Euclidean ($\ell_2$) closeness is the relevant criterion. Standard cryptographic hash functions do not tolerate input perturbations, so they are unsuitable for such settings.
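The following minimal sketch shows why an exact hash cannot serve as the comparator. It uses a hypothetical 4-dimensional "embedding" and the SHA-256 digest of its decimal encoding. Two vectors that are very close in Euclidean distance produce completely unrelated digests, so exact-match verification rejects the genuine user.

```python
import hashlib
import math


def digest(vec):
    """SHA-256 of a vector's decimal encoding: any change alters the whole output."""
    return hashlib.sha256(",".join(f"{x:.4f}" for x in vec).encode()).hexdigest()


enrolled = [0.12, -0.48, 0.33, 0.91]  # hypothetical enrolled embedding
probe = [0.12, -0.48, 0.33, 0.92]     # same user, slight drift in one coordinate

print(math.dist(enrolled, probe))         # small l2 distance (about 0.01)
print(digest(enrolled) == digest(probe))  # False: exact hashing rejects the match
```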
Existing post-processing defenses such as Facial-FE and Multispace Random Projection (MRP) have been shown to be vulnerable to adaptive model inversion attacks. These include PIPE, a new attack that attains attack success rates (ASRs) exceeding 89%. The vulnerabilities remain exploitable under a full server breach. This motivates provably secure, metric-tolerant primitives designed specifically for $\ell_2$ distance on high-dimensional ML embeddings (Prabhakar et al., 29 Oct 2025).
2. Security Framework and Definitions
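Let $\mathcal{X} \subseteq \mathbb{R}^m$ denote the space of ML embeddings and $d(\cdot, \cdot)$ the Euclidean distance function. A $(\mathcal{X}, \mathcal{X}, \mathcal{Y}, t, \lambda)$-ideal primitive consists of two algorithms $(\mathrm{Gen}, \mathrm{Rep})$, where $\mathrm{Gen}(w)$ outputs a secret $r$ and public helper data $p$. The algorithms must satisfy three principal security goals:
- Noise tolerance (correctness): For all $w, w' \in \mathcal{X}$ with $d(w, w') \le t$, $\mathrm{Rep}(w', p) = r$, where $(r, p) \leftarrow \mathrm{Gen}(w)$.
- Fuzzy one-wayness (privacy): No PPT adversary, given $p$, can find $w'$ with $d(w, w') \le t$ with more than negligible probability.
- Utility (entropy sufficiency): The output $r$ has high HILL computational entropy conditioned on $p$.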
These criteria formalize the requirements for a secure, practically useful extractor in the face authentication domain, particularly under full-leakage scenarios (Prabhakar et al., 29 Oct 2025).
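The noise-tolerance property can be checked empirically, and the check also shows why it is not sufficient on its own. The sketch below is a hypothetical test harness. It samples probes at a fixed Euclidean distance from $w$ and measures how often Rep reproduces Gen's key. The harness is run against a deliberately insecure baseline (`naive_gen`/`naive_rep`, invented for illustration) whose helper data stores $w$ in the clear.

```python
import math
import random
import secrets


def sample_at_distance(center, radius, rng):
    """Point at Euclidean distance `radius` from `center`, in a uniformly random direction."""
    g = [rng.gauss(0.0, 1.0) for _ in center]
    norm = math.sqrt(sum(x * x for x in g))
    return [c + radius * x / norm for c, x in zip(center, g)]


def noise_tolerance_rate(gen, rep, w, radius, trials=200, seed=0):
    """Empirical fraction of probes at `radius` from w for which Rep reproduces r."""
    rng = random.Random(seed)
    r, p = gen(w)
    hits = sum(rep(sample_at_distance(w, radius, rng), p) == r for _ in range(trials))
    return hits / trials


# Hypothetical, deliberately insecure baseline: the helper data contains w (and r) in the clear.
T = 0.5


def naive_gen(w):
    r = secrets.token_hex(16)
    return r, (list(w), r)


def naive_rep(w_prime, p):
    stored_w, r = p
    return r if math.dist(stored_w, w_prime) <= T else None


w = [0.1, -0.3, 0.7, 0.2]
print(noise_tolerance_rate(naive_gen, naive_rep, w, 0.4))  # within t: always accepted
print(noise_tolerance_rate(naive_gen, naive_rep, w, 0.6))  # beyond t: always rejected
```

The baseline meets the noise-tolerance requirement but fails fuzzy one-wayness outright, because anyone who reads $p$ learns $w$. The LWE-based construction is meant to close exactly this gap.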
3. L2FE-Hash Construction
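L2FE-Hash uses a $q$-ary lattice embedding. It relies on the fact that the public data $(A, b)$, with $b = As + \tilde{w} \bmod q$ and with $A \in \mathbb{Z}_q^{m \times n}$ and $s \in \mathbb{Z}_q^n$ chosen uniformly at random, forms an LWE instance. The error term of that instance is the quantized embedding $\tilde{w}$, which has bounded support: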
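- Enrollment (Gen):
  - Quantize $w$ to an integer vector $\tilde{w} \in \mathbb{Z}^m$.
  - Sample $A \leftarrow \mathbb{Z}_q^{m \times n}$ and $s \leftarrow \mathbb{Z}_q^n$.
  - Compute $b = As + \tilde{w} \bmod q$.
  - Sample a cryptographic hash key $k$.
  - Set the helper $p = (A, b, k)$ and the secret $r = H_k(s)$.
- Authentication (Rep):
  - Given a probe $w'$ and $p = (A, b, k)$:
  - Compute $c = b - \tilde{w}' \bmod q$, where $\tilde{w}'$ is the quantized probe. Then $c = As + (\tilde{w} - \tilde{w}') \bmod q$.
  - Run Babai's nearest-plane decoding on $c$, using a basis of the lattice $\Lambda_q(A)$, to recover $s$.
  - Output $r = H_k(s)$.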
Critical parameters include the modulus $q$ (a large prime), the lattice dimension $n$ (which controls security), and the randomly sampled matrix $A$, which determines the lattice basis. Decoding is guaranteed for input perturbations up to a radius fixed by the geometry of the lattice (Prabhakar et al., 29 Oct 2025).
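The enrollment and authentication steps can be sketched end to end as a toy in Python. This is not the authors' implementation, and it rests on several assumptions of ours:

- Tiny hypothetical parameters ($q = 10007$, $n = 2$, $m = 8$).
- $A$ in normal form $[I_n; A']$, so that $s$ can be read off the decoded lattice point.
- A hypothetical scale-100 scalar quantizer.
- Keyed BLAKE2b standing in for the seeded extractor $H_k$.
- A textbook LLL reduction, which gives Babai's nearest-plane algorithm a basis good enough to decode.

```python
import hashlib
import secrets
from fractions import Fraction

# Toy parameters (hypothetical; real deployments need far larger n and q).
Q = 10007  # prime modulus q
N = 2      # LWE secret dimension n
M = 8      # embedding dimension m


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Exact Gram-Schmidt orthogonalisation over the rationals."""
    ortho = []
    for b in basis:
        v = [Fraction(x) for x in b]
        for o in ortho:
            mu = dot(b, o) / dot(o, o)
            v = [vi - mu * oi for vi, oi in zip(v, o)]
        ortho.append(v)
    return ortho


def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL reduction with exact arithmetic (adequate for tiny dimensions)."""
    b = [list(row) for row in basis]
    k = 1
    while k < len(b):
        ortho = gram_schmidt(b)
        for j in range(k - 1, -1, -1):  # size-reduce b_k; Gram-Schmidt vectors are unchanged
            mu = dot(b[k], ortho[j]) / dot(ortho[j], ortho[j])
            if abs(mu) > Fraction(1, 2):
                c = round(mu)
                b[k] = [x - c * y for x, y in zip(b[k], b[j])]
        mu = dot(b[k], ortho[k - 1]) / dot(ortho[k - 1], ortho[k - 1])
        if dot(ortho[k], ortho[k]) >= (delta - mu * mu) * dot(ortho[k - 1], ortho[k - 1]):
            k += 1  # Lovasz condition holds: advance
        else:
            b[k], b[k - 1] = b[k - 1], b[k]  # swap and step back
            k = max(k - 1, 1)
    return b


def qary_basis(a_prime):
    """Basis of Lambda_q(A) = {A x + q z} when A = [I_N; A'] (normal form, an assumption)."""
    basis = [[int(r == i) for r in range(N)] + [row[i] for row in a_prime] for i in range(N)]
    basis += [[0] * N + [Q * int(r == j) for r in range(M - N)] for j in range(M - N)]
    return basis


def babai_nearest_plane(basis, ortho, target):
    """Lattice vector chosen by Babai's nearest-plane algorithm for `target`."""
    residual = [Fraction(x) for x in target]
    for b, o in zip(reversed(basis), reversed(ortho)):
        c = round(dot(residual, o) / dot(o, o))
        residual = [x - c * y for x, y in zip(residual, b)]
    return [int(t - r) for t, r in zip(target, residual)]


def quantize(embedding, scale=100):
    return [round(x * scale) for x in embedding]  # hypothetical scalar quantizer


def extract(key, s):
    # Keyed BLAKE2b stands in for the seeded extractor H_k (assumption).
    data = ",".join(map(str, s)).encode()
    return hashlib.blake2b(data, key=key, digest_size=32).hexdigest()


def gen(embedding):
    w = quantize(embedding)
    a_prime = [[secrets.randbelow(Q) for _ in range(N)] for _ in range(M - N)]
    s = [secrets.randbelow(Q) for _ in range(N)]
    a = [[int(r == i) for i in range(N)] for r in range(N)] + a_prime  # A = [I; A']
    b = [(dot(row, s) + wi) % Q for row, wi in zip(a, w)]  # b = A s + w mod q
    k = secrets.token_bytes(32)
    basis = lll(qary_basis(a_prime))  # public (derivable from A), reduced once at enrollment
    return extract(k, s), (basis, b, k)


def rep(embedding, helper):
    basis, b, k = helper
    c = [(bi - wi) % Q for bi, wi in zip(b, quantize(embedding))]  # A s + (w - w') mod q
    v = babai_nearest_plane(basis, gram_schmidt(basis), c)
    s = [x % Q for x in v[:N]]  # first N coordinates of any lattice point are x mod q
    return extract(k, s)
```

With these toy parameters, a probe that drifts slightly from the enrolled embedding reproduces the same key. An unrelated probe yields a different key with overwhelming probability. LLL reduction is a choice made for this sketch so that Babai decoding has a usable radius; it is not taken from the construction as described.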
4. Security Analysis
Correctness
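Let $A$ be random, and let the embedding noise $e = \tilde{w} - \tilde{w}'$ (the difference between the quantized enrolled and probe embeddings) have a norm $\|e\|_2$ within the decoding radius. Then Babai's algorithm returns the correct $s$ with high probability over the choice of $A$. This ensures authentication correctness at the noise levels typical of face embeddings.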
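A standard sufficient condition underlies this kind of correctness claim. Babai's nearest-plane algorithm always returns the lattice point $v$ when the target is $v + e$ and $\|e\|_2$ is below half the length of the shortest Gram–Schmidt vector of the basis. The minimal sketch below computes that guaranteed radius for a small hypothetical 2-D basis. It also shows that the radius depends on the basis and its ordering, which is why basis quality matters.

```python
import math
from fractions import Fraction


def gram_schmidt(basis):
    """Exact Gram-Schmidt orthogonalisation of the basis vectors, in order."""
    ortho = []
    for b in basis:
        v = [Fraction(x) for x in b]
        for o in ortho:
            mu = sum(Fraction(x) * y for x, y in zip(b, o)) / sum(y * y for y in o)
            v = [vi - mu * oi for vi, oi in zip(v, o)]
        ortho.append(v)
    return ortho


def babai_guaranteed_radius(basis):
    """Half the shortest Gram-Schmidt norm: any error of smaller l2 norm decodes correctly."""
    return min(math.sqrt(sum(y * y for y in o)) for o in gram_schmidt(basis)) / 2


print(babai_guaranteed_radius([[3, 0], [1, 2]]))  # 1.0
print(babai_guaranteed_radius([[1, 2], [3, 0]]))  # same lattice, other order: about 1.118
```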
Fuzzy One-Wayness
Given $p = (A, b, k)$, recovering $w$, or any $w'$ close to $w$, is at least as hard as solving LWE with bounded error. Under standard LWE hardness, $s$ and $w$ remain hidden even when the helper data is fully leaked. Furthermore, the keyed hash $H_k$ acts as a seeded randomness extractor, so the derived key $r$ is unpredictable even if the adversary knows all public parameters (Prabhakar et al., 29 Oct 2025).
Formal Guarantees
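- Theorem 1 (Informal): Under the LWE assumption and the extraction guarantee of $H_k$, L2FE-Hash is a computational fuzzy extractor for Euclidean distance.
- Theorem 2: Any fuzzy extractor satisfying these properties yields a $(\mathcal{X}, \mathcal{X}, \mathcal{Y}, t, \lambda)$-ideal primitive with correspondingly bounded adversarial advantage.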
5. Empirical Evaluation and Comparative Results
Experiments used the CelebA, LFW, and CASIA-WebFace datasets and the FaceNet and ArcFace embedding models. The genuine-match threshold was calibrated separately for each model to a target TPR at a fixed FPR.
Authentication Performance
With Babai decoding in Rep, single-sample authentication produces the following outcomes (percent of trials):
| Pair | Rep = "match" (%) | Rep = "no match" (%) |
|---|---|---|
| Same identity | 65 | 35 |
| Different identity | 4 | 96 |
This corresponds to a TPR of 65% and an FPR of 4% for single samples. Majority voting over multiple samples raises the TPR to at least 95% (Prabhakar et al., 29 Oct 2025).
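The effect of majority voting can be illustrated with a binomial calculation. The sketch assumes, unlike real captures, that each attempt accepts independently. Under that idealization it uses the single-sample rates above (65% genuine, 4% impostor) to compute the probability that more than half of $k$ attempts accept. Correlated samples would make the real gain smaller, and the number of samples is left as a free parameter here.

```python
from math import comb


def majority_accept_rate(p, k):
    """P(more than half of k independent attempts accept), each accepting with probability p."""
    if k % 2 == 0:
        raise ValueError("use an odd number of attempts to avoid ties")
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(k // 2 + 1, k + 1))


for k in (1, 5, 15, 51):
    tpr = majority_accept_rate(0.65, k)  # genuine pairs
    fpr = majority_accept_rate(0.04, k)  # impostor pairs
    print(k, round(tpr, 3), round(fpr, 6))
```

Under independence, voting raises the genuine acceptance rate and drives the impostor acceptance rate toward zero at the same time.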
Inversion Resistance
Under full leakage, cross-dataset attack success rates (ASRs) for the PIPE, Bob, GMI, and KED-MI attacks are at or below those of random guessing. For FaceNet, ASRs are 0.4–0.6%. For ArcFace, they range from 1.7% to 7.0%. All values are within one standard deviation of the random-guessing baseline (Prabhakar et al., 29 Oct 2025):
| Dataset | Model | PIPE ASR (%) | Bob ASR (%) | Random ASR (%) |
|---|---|---|---|---|
| CelebA | FaceNet | 0.6 | 0.6 | |
| LFW | FaceNet | 0.5 | 0.5 | |
| CASIA | FaceNet | 0.4 | 0.4 | |
| CelebA | ArcFace | 7.0 | 7.0 | |
| LFW | ArcFace | 1.7 | 1.7 | |
| CASIA | ArcFace | 3.3 | 3.3 | |
PIPE's reconstructed images are perceptually dissimilar to the originals (high LPIPS distance) and have high FID (Bob's reconstructions score 105). This indicates that L2FE-Hash provides meaningful privacy even against adaptive attacks (Prabhakar et al., 29 Oct 2025).
6. Design Considerations and Deployment
L2FE-Hash is strictly a post-processing primitive—no re-training of embedding models is required. Enrollment (Gen) is run once per user; reproduction (Rep) must support real-time authentication, with Babai’s nearest-plane decoding offering polynomial-time complexity.
Several implementation trade-offs are highlighted:
- Lattice parameters: The dimension $n$ and modulus $q$ control both the security guarantees and the decoding complexity. $A$ must be sampled so that the resulting lattice has the “goodness” needed for accurate decoding with high probability.
- Quantization: Embeddings must be quantized to integer vectors in $\mathbb{Z}^m$, which entails an accuracy–robustness trade-off.
- Hardware: SIMD or GPU acceleration is practical for matrix operations.
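The quantization trade-off can be made concrete under an assumption of ours: a uniform scalar quantizer $Q(w) = \mathrm{round}(\text{scale} \cdot w)$, which the source does not specify. Each rounding error is at most $1/2$. By the triangle inequality, $\|Q(w) - Q(w')\|_2 \le \text{scale} \cdot \|w - w'\|_2 + \sqrt{m}$. A finer grid therefore preserves more resolution, but the integer-domain noise grows roughly linearly with the scale and must stay inside the lattice's decoding radius. The sketch checks this bound on a hypothetical 512-dimensional embedding.

```python
import math
import random


def quantize(embedding, scale):
    """Hypothetical uniform scalar quantizer: round each coordinate of scale * w."""
    return [round(x * scale) for x in embedding]


def integer_noise_bound(l2_distance, scale, dim):
    """Upper bound on ||Q(w) - Q(w')||_2 given ||w - w'||_2 <= l2_distance.

    Q(w) = scale * w + rho with |rho_i| <= 1/2, so the difference equals
    scale * (w - w') plus a vector with entries in [-1, 1] (norm <= sqrt(dim)).
    """
    return scale * l2_distance + math.sqrt(dim)


rng = random.Random(1)
dim = 512
w = [rng.gauss(0, 1) for _ in range(dim)]
w_noisy = [x + rng.gauss(0, 0.01) for x in w]  # simulated capture noise
d = math.dist(w, w_noisy)
for scale in (10, 100, 1000):
    observed = math.dist(quantize(w, scale), quantize(w_noisy, scale))
    print(scale, round(observed, 1), round(integer_noise_bound(d, scale, dim), 1))
```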
This suggests that L2FE-Hash is suitable for immediate deployment in systems where post-processing wrappers are preferred over retraining or architectural changes (Prabhakar et al., 29 Oct 2025).
7. Extensions and Broader Relevance
The L2FE-Hash paradigm generalizes to other biometric modalities that produce real-valued embeddings compared under $\ell_2$ distance, such as voice or fingerprint minutiae. Potential directions include:

- integrating the construction with device-resident key management for multi-factor authentication;
- supporting alternative metrics such as cosine distance through appropriate lattice or secure-sketch design;
- combining it with hardware security modules to strengthen confidentiality and integrity guarantees.
This construction establishes a new baseline for privacy-preserving authentication in ML-driven biometric systems. It rigorously addresses the limitations of previous distance-tolerant extractors and lays a foundation for future research in robust, attack-agnostic biometric security (Prabhakar et al., 29 Oct 2025).