CAPRISE: Conditional Distance Encryption

Updated 25 January 2026
  • The paper introduces CAPRISE, a symmetric encryption scheme that preserves conditional distance ordering for secure, efficient top‑k retrieval in outsourced settings.
  • CAPRISE uses separate encryption methods for queries and database embeddings with controlled randomization to protect sensitive geometric information.
  • Experimental evaluations show CAPRISE achieves up to 9× throughput improvements and strong differential privacy guarantees compared to partially homomorphic encryption approaches.

Conditional Approximate Distance-Comparison-Preserving Symmetric Encryption (CAPRISE) is a symmetric cryptographic scheme designed to allow efficient and privacy-preserving similarity search over embedding vectors, particularly in outsourced retrieval-augmented generation (RAG) frameworks. CAPRISE enables cloud providers to determine the relative similarity between an encrypted query and encrypted database embeddings—revealing only the required comparison ordering for top-k retrieval—while concealing all other geometric or pairwise information. It achieves this using two distinct encryption algorithms for queries and database items, employing controlled randomization to protect both content and structural relationships. CAPRISE can be composed with differentially private query perturbation, enabling strong theoretical confidentiality guarantees, high retrieval accuracy, and significantly reduced computational overhead compared to partially homomorphic encryption approaches (Ye et al., 18 Jan 2026).

1. Formal Construction of CAPRISE

CAPRISE operates over an embedding space $\mathcal{M} \subseteq \mathbb{R}^d$, parameterized by a distance gap $\beta > 0$ and a key space $\mathcal{S}$ for the scalar multiplier $s$. It uses a pseudorandom function (PRF) $F_K$ and randomness sources for vector-valued and scalar sampling. Encryption uses one procedure for database embedding vectors ($\mathrm{Enc}_{\mathrm{DB}}$) and a separate one for query embeddings ($\mathrm{Enc}_{\mathrm{Q}}$).

  • Key Generation:

$\mathrm{KeyGen}()$: Sample $s \leftarrow \mathcal{S}$ and $K \leftarrow \{0,1\}^k$. Return the secret key $(s, K)$.

  • Database Embedding Encryption:

β>0\beta > 01 1. β>0\beta > 02 (random nonce). 2. β>0\beta > 03. 3. β>0\beta > 04 (d-dimensional standard Gaussian). 4. β>0\beta > 05 (seed=β>0\beta > 06). 5. β>0\beta > 07. 6. β>0\beta > 08. 7. Return β>0\beta > 09.

  • Query Embedding Encryption:

S\mathcal{S}0 Identical steps, except the offset is S\mathcal{S}1, and S\mathcal{S}2.

  • Database Decryption:

For completeness, decryption recovers the plaintext embedding from a stored ciphertext by relying on the determinism of the PRF:

S\mathcal{S}6

This construction is non-deterministic (due to the random nonce and the randomized offsets) and uses separate randomization scales for database and query ciphertexts.
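The construction can be illustrated with a small sketch. It is not the paper's exact algorithm: it assumes a generic scale-and-perturb form (ciphertext equals $s \cdot m$ plus an offset drawn from a ball), uses HMAC-SHA256 as a stand-in for the PRF, and leaves the offset radius as a caller-supplied parameter, because the radii CAPRISE uses for database and query ciphertexts are not reproduced here. All function names and default ranges are hypothetical.

```python
import hashlib
import hmac
import math
import os
import random


def keygen(s_low=2.0, s_high=1024.0):
    """Sample a scaling factor s and a PRF key K (ranges are illustrative only)."""
    s = random.SystemRandom().uniform(s_low, s_high)
    prf_key = os.urandom(32)
    return s, prf_key


def derive_offset(prf_key, nonce, dim, radius):
    """Deterministically derive an offset inside a ball of the given radius."""
    # HMAC-SHA256 stands in for the PRF F_K; its output seeds a local RNG.
    # random.Random is not cryptographically secure; it is used for illustration.
    seed = hmac.new(prf_key, nonce, hashlib.sha256).digest()
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in direction))
    # Scaling by U^(1/dim) spreads points uniformly through the ball's volume.
    length = radius * rng.random() ** (1.0 / dim)
    return [length * v / norm for v in direction]


def encrypt(secret_key, embedding, radius):
    """Return (scaled-and-perturbed vector, nonce)."""
    s, prf_key = secret_key
    nonce = os.urandom(16)
    offset = derive_offset(prf_key, nonce, len(embedding), radius)
    cipher = [s * x + o for x, o in zip(embedding, offset)]
    return cipher, nonce


def decrypt(secret_key, ciphertext, radius):
    """Recompute the offset from the nonce, subtract it, and undo the scaling."""
    s, prf_key = secret_key
    cipher, nonce = ciphertext
    offset = derive_offset(prf_key, nonce, len(cipher), radius)
    return [(c - o) / s for c, o in zip(cipher, offset)]
```

In this sketch, database and query encryption would call `encrypt` with different `radius` values; which values are appropriate is exactly the part that depends on the paper's parameters.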

2. Conditional Distance-Comparison-Preserving Guarantees

CAPRISE preserves only the conditional ordering of query-to-database distances, which is what top-$k$ nearest-neighbor retrieval requires. Consider a query, database vectors, and their ciphertexts as defined above.

  • If the query is closer to one database vector than to another, the same comparison also holds between the corresponding ciphertexts.
  • For database-to-database distances, however, a plaintext ordering is not guaranteed to imply the same ordering between ciphertexts.

Mathematically, the core property is:

ss5

but

ss6

This separation is enforced by bounding the noise terms, with query offsets strictly smaller than database offsets. As a result, only the required query-to-database distance orderings are accessible to the server, and all other spatial relationships are obfuscated.
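To see why bounded offsets of different sizes can produce this asymmetry, the following derivation may help. It is a sketch under assumptions that are not taken from the source: Euclidean distance, ciphertexts of the form $c = s\,m + \lambda$, and hypothetical bounds $r_Q$ and $r_{DB}$ on the query and database offset norms, with $q$ a query and $x, y$ database vectors. The source's exact statement and constants may differ.

```latex
% Assumed model (illustrative notation):
%   c_q = s q + \lambda_q,  c_x = s x + \lambda_x,  c_y = s y + \lambda_y,
%   \|\lambda_q\| \le r_Q,  \|\lambda_x\|, \|\lambda_y\| \le r_{DB}.
\begin{aligned}
\|c_q - c_x\| &\le s\,\|q - x\| + r_Q + r_{DB}, \\
\|c_q - c_y\| &\ge s\,\|q - y\| - (r_Q + r_{DB}), \\
\|q - x\| < \|q - y\| - \beta \ \text{ and } \ \beta \ge \tfrac{2\,(r_Q + r_{DB})}{s}
  &\;\Longrightarrow\; \|c_q - c_x\| < \|c_q - c_y\|.
\end{aligned}
```

Both bounds follow from the triangle inequality. Repeating the argument for two database ciphertexts puts an error of up to $2\,r_{DB}$ on each side, so it needs a gap of at least $4\,r_{DB}/s$, which exceeds $2\,(r_Q + r_{DB})/s$ whenever $r_Q < r_{DB}$. Under this simplified model, a gap that suffices for query-to-database comparisons therefore does not yield the same guarantee for database-to-database comparisons.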

3. Operation, Conditional Logic, and Server Behavior

The distinct noise scales used for database and query ciphertexts render all inter-database distances unusable, while keeping the ordering reliable for sufficiently separated query-to-database distances (gap at least $\beta$), amplified by the scaling parameter $s$.

For top-$k$ retrieval, the cloud server, given the encrypted query and the stored encrypted database, performs:

  • For each encrypted database vector, compute its distance to the encrypted query.
  • Return the indices corresponding to the $k$ smallest distances.

The server never learns plaintext distances or the order among the encrypted database vectors themselves. This approach supports secure non-interactive ranking, with privacy bound tightly to the inability to relate database embeddings to one another.
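The server-side ranking step can be sketched in a few lines. This is an illustration only: it assumes ciphertexts are plain lists of floats compared by Euclidean distance, and the function name `top_k_indices` is hypothetical.

```python
import heapq
import math


def top_k_indices(enc_query, enc_db, k):
    """Return indices of the k encrypted database vectors closest to the encrypted query."""
    # The server works only on ciphertext vectors; it never sees plaintext embeddings.
    distances = ((math.dist(enc_query, c), i) for i, c in enumerate(enc_db))
    return [i for _, i in heapq.nsmallest(k, distances)]
```

Using `heapq.nsmallest` avoids sorting the whole database when `k` is much smaller than the number of stored vectors.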

4. Security and Privacy Analysis

The security analysis considers an honest-but-curious adversarial cloud. Embedding encryption security rests on the PRF; given FKF_K9, the noise is computationally indistinguishable from random, and reconstructing EncDB\mathrm{Enc}_{\mathrm{DB}}0 is as hard as the underlying PRF. CAPRISE's leakage function EncDB\mathrm{Enc}_{\mathrm{DB}}1 exposes only the top-EncDB\mathrm{Enc}_{\mathrm{DB}}2 comparison outcome:

  • For a given encrypted query, the server learns only whether each encrypted database entry is among the $k$ closest.

Privacy Theorem (as stated):

Under standard PRF assumptions, CAPRISE is EncDB\mathrm{Enc}_{\mathrm{DB}}6-indistinguishable: only the relative order of encrypted query–database distances is exposed; absolute distances and all inter-database relations remain hidden.

To mitigate query analysis, CAPRISE composes with differential privacy (DP) via the DistanceDP mechanism: before encryption, the query embedding is perturbed by Gaussian noise calibrated to the target DP guarantee. The resulting perturbed query is then encrypted as above. As a result, even repeated queries do not expose sensitive user information.

This privacy composition extends to theoretical proofs showing (Theorem, as claimed) that the mechanism satisfies EncQ\mathrm{Enc}_{\mathrm{Q}}1-differential privacy over the embedding space.
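As a rough illustration of perturbing a query before encryption, the sketch below uses the textbook $(\varepsilon, \delta)$ Gaussian mechanism. This is an assumption standing in for DistanceDP, whose actual calibration is not reproduced here; the parameter names `epsilon`, `delta`, and `sensitivity` and both function names are hypothetical.

```python
import math
import random


def gaussian_sigma(epsilon, delta, sensitivity):
    """Classic Gaussian-mechanism noise scale (stated for 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def perturb_query(query, sigma, rng=None):
    """Add independent Gaussian noise to each coordinate of the query embedding."""
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, sigma) for x in query]
```

The perturbed vector would then be passed to query encryption in place of the original embedding.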

5. Computational and Communication Characteristics

CAPRISE is designed for practical throughput and low resource requirements. For each vector, encryption involves $O(d)$ cost for sampling the random noise and for the linear transform and offset addition.

Empirical performance on an NVIDIA A100 GPU is:

  • CAPRISE: EncQ\mathrm{Enc}_{\mathrm{Q}}4 embeddings/sec (for EncQ\mathrm{Enc}_{\mathrm{Q}}5)
  • PHE-based RemoteRAG: EncQ\mathrm{Enc}_{\mathrm{Q}}6–EncQ\mathrm{Enc}_{\mathrm{Q}}7 embeddings/sec

This constitutes a throughput improvement of up to roughly 9× relative to partially homomorphic encryption, with encryption overhead remaining below 19% of the underlying embedding computation.

Ciphertext size is near-minimal: each entry is the embedding plus a small random string (EncQ\mathrm{Enc}_{\mathrm{Q}}9, KeyGen()\mathrm{KeyGen}()0).

Embedding size    CAPRISE throughput (embeddings/sec)
768               2,339
1536              1,800
3072              1,200

Each database query and retrieval round thus scales efficiently for practical RAG deployments.
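For a sense of scale, the throughput figures in the table above (read as 2,339, 1,800, and 1,200 embeddings/sec) can be turned into a back-of-the-envelope encryption time. The helper name and the corpus size below are illustrative assumptions, and the estimate ignores embedding computation and I/O.

```python
# Throughput in embeddings/sec per embedding size, as read from the table above.
THROUGHPUT = {768: 2339, 1536: 1800, 3072: 1200}


def encryption_seconds(num_embeddings, dim):
    """Estimate wall-clock seconds to encrypt a corpus at the tabulated rate."""
    return num_embeddings / THROUGHPUT[dim]


if __name__ == "__main__":
    for dim in sorted(THROUGHPUT):
        print(dim, round(encryption_seconds(1_000_000, dim), 1))
```

At these rates, a corpus of one million 768-dimensional embeddings would take on the order of seven minutes to encrypt.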

6. Differential Privacy Integration and Retrieval Adjustments

To enforce differential privacy for query protection, in particular against inference from repeated or correlated searches, DistanceDP perturbation is applied to each query embedding before encryption. The noise is Gaussian or von Mises–Fisher, with its parameter set according to the desired privacy guarantee.

There is an explicit relationship between distortion and candidate expansion: to recover the actual top-$k$ neighbors under a given angular distortion, the top-$k$ search at the cloud must be expanded to a larger candidate set, as formalized by:

sSs \leftarrow \mathcal{S}1

with sSs \leftarrow \mathcal{S}2 database size, sSs \leftarrow \mathcal{S}3 surface area on sSs \leftarrow \mathcal{S}4-sphere. Thus, increasing privacy via larger sSs \leftarrow \mathcal{S}5 entails a modest computational expansion.

Empirically, despite the added DP noise, local re-ranking after the expanded retrieval over the encrypted database recovers the true top-$k$ with high recall (reported around the 95% level), and RAG quality is nearly indistinguishable from the plaintext baseline.
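The local re-ranking step and the recall measurement can be sketched as follows. The sketch assumes the client can obtain plaintext embeddings for the returned candidates (for example by decrypting them) and re-ranks against the original, unperturbed query. Both function names are hypothetical.

```python
import heapq
import math


def rerank(query, candidates, k):
    """Re-rank candidates locally and keep the k closest.

    candidates maps a database index to its plaintext embedding.
    """
    return heapq.nsmallest(k, candidates, key=lambda i: math.dist(query, candidates[i]))


def recall_at_k(retrieved, truth):
    """Fraction of the true top-k indices present in the retrieved list."""
    truth = set(truth)
    return len(truth.intersection(retrieved)) / len(truth)
```

Recall here compares the re-ranked result with the top-$k$ a plaintext search would have returned for the same query.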

7. Experimental Evaluation and Application to Secure RAG

A detailed case study was performed using the MS MARCO passage-retrieval dataset, with gtr-t5-base embeddings (sSs \leftarrow \mathcal{S}9). Privacy resistance to vector inversion was measured by Vec2Text BLEU and F1 scores: plaintext BLEU K{0,1}kK \leftarrow \{0,1\}^k0, F1 K{0,1}kK \leftarrow \{0,1\}^k1; CAPRISE-encrypted BLEU K{0,1}kK \leftarrow \{0,1\}^k2, F1 K{0,1}kK \leftarrow \{0,1\}^k3. This indicates strong resistance to plaintext recovery.

Against database vector-analysis (attacker success rate, ASR), CAPRISE outperformed ADCPE for all embedding sizes tested, indicating superior robustness.

Query DP trade-offs were quantified, showing how the expanded candidate count grows as a function of the noise setting and database size. Example table (each ratio is the expanded count divided by $k$, rounded):

top-k    expanded count (first setting)    ratio    expanded count (second setting)    ratio
5        258                               52       58                                 12
10       571                               57       108                                11
20       928                               46       203                                10
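The table's columns can be cross-checked with simple arithmetic. The snippet below assumes that the third and fifth columns are the second and fourth columns divided by the top-$k$ value in the first column and rounded; that reading is an inference from the numbers, not something the source states.

```python
# (k, expanded count in the first setting, expanded count in the second setting)
rows = [(5, 258, 58), (10, 571, 108), (20, 928, 203)]

# Ratio of expanded candidate count to k, rounded to the nearest integer.
ratios = [(round(first / k), round(second / k)) for k, first, second in rows]

if __name__ == "__main__":
    print(ratios)  # [(52, 12), (57, 11), (46, 10)]
```

Under this reading, the expansion factor stays roughly between 46 and 57 in the first setting and between 10 and 12 in the second.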

End-to-end retrieval with CAPRISE and DP-perturbed queries reliably recovers the plaintext top-$k$ with high recall, combining strong privacy with practical RAG deployment (Ye et al., 18 Jan 2026).

In summary, CAPRISE provides a symmetric-key embedding encryption scheme for secure, efficient, and privacy-preserving retrieval in cloud-based settings, ensuring that only query-to-database ordinal information is ever revealed, and together with DistanceDP, supports provable privacy guarantees without significant computational or communication penalty.
