Scalable Ciphertext Data-Packing Scheme
- Scalable ciphertext data-packing is a method that encodes numerous plaintext elements into a single ciphertext, facilitating parallel homomorphic operations.
- It leverages diverse strategies—such as row-order, block-oriented, and bit-interleaved packing—tailored to algebraic structures like RLWE and LWE.
- This approach reduces computational overhead and communication costs in applications like FHE-based deep learning, privacy-preserving machine learning, and post-quantum key establishment.
A scalable ciphertext data-packing scheme is a cryptographic method for multiplexing multiple plaintext elements—such as values, vectors, signals, or messages—into a single ciphertext, with the goals of maximizing parallelism, minimizing ciphertext expansion, and enabling efficient homomorphic operations under computational or physical resource constraints. Such schemes are pivotal for applications in fully homomorphic encryption (FHE), privacy-preserving machine learning, lattice-based cryptography, and large-scale query or aggregation on encrypted data. Distinct packing strategies are optimized according to algebraic structure (RLWE, LWE, NTRU, Kyber), supported operations (SIMD, matrix-vector products, multi-recipient encapsulation), and application requirements (latency, bandwidth, noise budget, multiplicative depth).
1. Theoretical Foundations and Definitions
Ciphertext packing, as used in state-of-the-art cryptosystems, enables Single Instruction Multiple Data (SIMD) operations on encrypted data. In RLWE-based schemes such as CKKS and BFV, the polynomial structure admits packing plaintext elements into the slots of a degree-$N$ polynomial, allowing slot-wise homomorphic processing. In KEM schemes such as Kyber, vertical (multi-layer) packing permits multi-recipient encryption by vertically appending payloads.
Packing mechanisms must preserve correctness under ciphertext operations—specifically, they must (i) guarantee that packed slots or bit-fields remain isolated under addition and multiplication, (ii) ensure that decryption does not introduce cross-slot (or cross-field) carry or overflow, and (iii) account for noise accumulation and for quantization or modular reduction across packed layers.
Precise security reductions verify that extended packed variants (e.g., P-Kyber) remain indistinguishable under chosen-ciphertext attacks, with IND-CCA security reducible to M-LWE (Liu et al., 24 Apr 2025).
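The slot-isolation property in (i) and (ii) can be demonstrated in miniature with the integer Chinese Remainder Theorem, which is the same algebraic mechanism behind RLWE plaintext slots. The moduli and packed values below are arbitrary choices for a sketch, not parameters of any real scheme:

```python
# Illustrative sketch (not a real FHE library): SIMD "slots" via the CRT.
# The plaintext ring of an RLWE scheme factors into independent slots; the
# integer CRT over pairwise-coprime moduli gives the same isolation.
from math import prod

MODULI = [17, 19]          # pairwise-coprime "slot" moduli (arbitrary)
M = prod(MODULI)           # packed plaintext modulus

def pack(slots):
    """CRT-combine one value per slot into a single residue mod M."""
    x = 0
    for m, v in zip(MODULI, slots):
        Mi = M // m
        x = (x + v * Mi * pow(Mi, -1, m)) % M
    return x

def unpack(x):
    """Recover the per-slot values by reducing mod each slot modulus."""
    return [x % m for m in MODULI]

a, b = pack([3, 5]), pack([4, 6])
assert unpack((a + b) % M) == [7, 11]   # addition acts slot-wise
assert unpack((a * b) % M) == [12, 11]  # multiplication acts slot-wise
```

One operation on the packed residue updates every slot independently, which is exactly the parallelism that packing is designed to expose.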
2. Methodologies: Packing Layouts and Algorithms
2.1. Slot-based Packing in RLWE/CKKS/BFV
RLWE-based FHE platforms exploit the ability to encode up to $N$ elements (for CKKS, $N/2$ complex values) into the slots of a degree-$N$ ring polynomial. Key layout approaches include:
- Row-order, column-order, and diagonal packing (PackVFL): optimal for large matrix–vector or matrix–matrix multiplication. Diagonal packing eliminates all costly ciphertext rotations (O3), substituting hoisted rotations (O4) and plaintext hoisting (Yang et al., 1 May 2024).
- Block-oriented packing (FastFHE): for convolution on CNN tensors, blocks subdivide the slots into spatial super-squares, and each block holds all channels at one spatial location. Packing is expressed as a slot-index mapping from spatial block coordinates and channel index to slot position (Song et al., 27 Nov 2025).
- Bit-interleaved packing (FedBit): for integer-weight aggregation, bit-fields within each polynomial coefficient encode multiple weights, with the per-coefficient packing capacity determined by the quantization width and carry margin (Meng et al., 27 Sep 2025).
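A minimal sketch of bit-interleaved packing, in the spirit of FedBit but not its actual code (the widths `B` and `CLIENTS` below are assumed for illustration): each weight gets a field of `B` bits plus a carry margin, so that summing the packed integers of many parties never lets one field overflow into its neighbor.

```python
# Sketch: b-bit quantized weights share one plaintext coefficient, each
# padded with ceil(log2(m)) carry bits so that summing across m parties
# cannot overflow into the neighboring field. Parameters are illustrative.
import math

B = 8                                        # bits per weight (assumed)
CLIENTS = 16                                 # summands to aggregate (assumed)
FIELD = B + math.ceil(math.log2(CLIENTS))    # field width incl. carry margin

def pack(weights):
    """Interleave B-bit weights into one integer, one field per weight."""
    coeff = 0
    for i, w in enumerate(weights):
        assert 0 <= w < 2**B
        coeff |= w << (i * FIELD)
    return coeff

def unpack(coeff, n):
    """Mask-and-shift each field back out after (homomorphic) summation."""
    mask = (1 << FIELD) - 1
    return [(coeff >> (i * FIELD)) & mask for i in range(n)]

total = pack([200, 3, 77]) + pack([55, 250, 1])   # addition acts field-wise
assert unpack(total, 3) == [255, 253, 78]
```

In the real scheme the addition happens homomorphically on BFV ciphertexts; the mask-and-shift step corresponds to the post-decryption unpacking in the table of Section 4.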
2.2. Multi-layer and Lattice Packing
- Vertical (multi-layer) packing (P-Kyber): extends standard Kyber PKE to pack multiple plaintexts into a single ciphertext. The secret key expands from a vector to a matrix, and the public key and error polynomials are widened to match (Liu et al., 24 Apr 2025).
- Cross-layer lattice packing: message vectors in P-Kyber are further encoded via high-dimensional lattice codes (e.g., Barnes–Wall or Leech lattices), allowing an entire hypercube of payloads to be embedded per ring slot and attaining a decryption failure rate (DFR) close to the sphere-packing optimum.
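The underlying encode/decode principle can be sketched in one dimension with the scale-and-round message coding used per coefficient in Kyber-style schemes; real P-Kyber replaces this code with the denser lattice codes above. `Q` is Kyber's modulus, but the noise magnitude below is an arbitrary illustrative value:

```python
# Minimal per-coefficient encode/decode in the Kyber style: a bit maps to
# 0 or round(q/2); decoding rounds the noisy coefficient back to the
# nearest codeword. Decoding succeeds while |noise| stays below q/4.
Q = 3329                                   # Kyber's modulus

def encode_bit(b):
    return (b * (Q + 1) // 2) % Q          # 0 -> 0, 1 -> round(q/2)

def decode_bit(c):
    """Decide which codeword {0, q/2} is closest to the noisy coefficient."""
    d = min(c, Q - c)                      # distance to 0 (mod q)
    return 1 if d > Q // 4 else 0

for bit in (0, 1):
    noisy = (encode_bit(bit) + 700) % Q    # noise 700 < q/4 ~= 832
    assert decode_bit(noisy) == bit
```

Denser lattice codes improve on this one-dimensional code by tolerating more noise per packed bit, which is what drives the near-optimal DFR claim.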
2.3. Bulk-Packed Storage for Preselection
- Template-aggregation packing (PFIP): in privacy-preserving biometric databases, multiple feature vectors (templates) are packed per ciphertext bin. The enrollment and query phases leverage SIMD to compute many inner products in one operation, with a binary tree of rotations cumulatively summing slot contributions (Xin et al., 3 Jul 2025).
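The rotate-and-sum tree can be sketched on plain lists; in a real SIMD FHE scheme each line of the loop below would be one homomorphic operation on packed ciphertexts (the function name is ours, not PFIP's):

```python
# Sketch: after one slot-wise product, a binary tree of rotations
# accumulates all slot contributions in O(log n) rounds, so every slot
# ends up holding the full inner product.
def rotate(v, k):
    return v[k:] + v[:k]

def inner_product_simd(x, y):
    n = len(x)                                # n assumed a power of two
    acc = [a * b for a, b in zip(x, y)]       # one slot-wise multiplication
    shift = n // 2
    while shift >= 1:                         # log2(n) rotate-and-add rounds
        acc = [a + b for a, b in zip(acc, rotate(acc, shift))]
        shift //= 2
    return acc[0]                             # any slot now holds the sum

assert inner_product_simd([1, 2, 3, 4], [5, 6, 7, 8]) == 70
```

The log-depth tree is what keeps the rotation count low even when many templates share one ciphertext bin.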
3. Scalability: Parallelism, Noise, and Overhead
Packing density is chiefly limited by the algebraic degree, quantization, error/carry growth, and operational constraints such as the SIMD slot count, modulus width, and security requirements. Representative schemes achieve substantial resource savings:
- FastFHE reduces ciphertext count and total rotations by up to a factor of the channel count relative to SISO packing, yielding a 4.3× speedup on ResNet-20 convolution and reducing kernel vector count by factors of 7.2–7.9 (Song et al., 27 Nov 2025).
- FedBit attains communication reductions of $60.7\times$ and beyond by maximizing bit-packing subject to carry isolation, with packing capacity scaling in the per-coefficient bit budget and the slot count (Meng et al., 27 Sep 2025).
- PackVFL achieves substantial end-to-end speedups over Paillier-based protocols and over the prior diagonal GALA packing. As batch size or matrix dimensionality grows, the advantage grows linearly in total problem size thanks to zero or constant ciphertext-rotation costs (Yang et al., 1 May 2024).
- P-Kyber and its lattice-coded variant markedly reduce the ciphertext expansion rate (CER) and DFR, enabling scalable multi-key delivery to numerous recipients in KEM/PKE contexts (Liu et al., 24 Apr 2025).
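The rotation savings claimed for diagonal packing can be made concrete with a Halevi–Shoup-style matrix–vector sketch, the layout PackVFL builds on: $Mv$ is computed as a sum over the generalized diagonals of $M$, each multiplied slot-wise by a rotation of the packed vector, so only rotations of $v$ (which can be hoisted and shared) are ever needed:

```python
# Sketch of diagonal (Halevi-Shoup-style) matrix-vector packing: M @ v as
# a sum over generalized diagonals of M, each paired with one rotation of
# the packed input vector. Rotation is simulated on plain lists.
def rotate(v, k):
    return v[k:] + v[:k]

def matvec_diagonal(M, v):
    n = len(v)                                  # square n x n matrix assumed
    acc = [0] * n
    for d in range(n):
        diag = [M[i][(i + d) % n] for i in range(n)]   # d-th diagonal, packed
        rot = rotate(v, d)                             # one hoisted rotation
        acc = [a + di * ri for a, di, ri in zip(acc, diag, rot)]
    return acc

M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
v = [1, 0, 2]
assert matvec_diagonal(M, v) == [7, 16, 25]     # equals M @ v
```

Because the diagonals are plaintext, each loop iteration costs one rotation and one plaintext multiplication in the encrypted setting, with no rotations of intermediate ciphertext results.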
4. Algorithmic Workflows and Implementation Details
The core packing and unpacking procedures are shaped by the algebraic substrate:
| Scheme | Packing Algorithm Features | Unpacking/Decryption |
|---|---|---|
| FastFHE block packing | Block subdivision, channel-parallel slot mapping, minimal rotations | Slot-indexed reassembly by block/channel, channel-parallel decoding (Song et al., 27 Nov 2025) |
| FedBit bit-interleaved | Per-coefficient bitfield packing, enforce carry and modulus bounds | Bitmasking and shifting per slot after BFV decryption (Meng et al., 27 Sep 2025) |
| PackVFL diagonal | Input partitioning, diagonal slot distribution, lazy sum post decryption | Clear-sum of packed slots after decryption (Yang et al., 1 May 2024) |
| P-Kyber lattice | Vertical matrix packing for multiple payloads, optional cross-layer lattice encoding | Slot-level lattice decoding via CVP, entrywise parity compression (Liu et al., 24 Apr 2025) |
| PFIP preselection | Bin allocation index, padded/replicated feature vectors, SIMD-product/tree-sum | Single-slot extraction and subsequent candidate matching (Xin et al., 3 Jul 2025) |
Pseudocode for packing and unpacking typical tensor, array, or weight inputs is provided in the referenced papers. Correctness constraints—especially for bitfield packing and for multi-user FHE aggregation—reduce to simple inequalities on the quantization size, carry margin, and plaintext modulus.
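One such inequality can be checked mechanically. The helper below is an illustrative capacity check of our own (parameters and names are assumed, not drawn from any one paper): with a coefficient budget of `t_bits` bits, `b`-bit values, and `m` additive contributions, each field needs `b + ceil(log2 m)` bits, bounding the number of carry-safe fields per coefficient:

```python
# Illustrative carry-safety capacity check for bitfield packing.
import math

def fields_per_coefficient(t_bits, b, m):
    """How many carry-safe b-bit fields fit in one t_bits-bit coefficient
    when m values will be summed into each field."""
    field_bits = b + math.ceil(math.log2(m))   # payload + carry margin
    if field_bits > t_bits:
        raise ValueError("one field already exceeds the coefficient budget")
    return t_bits // field_bits

assert fields_per_coefficient(t_bits=60, b=8, m=16) == 5   # 60 // (8 + 4)
```

Violating the inequality lets sums spill into neighboring fields, which is exactly the cross-field carry that condition (ii) of Section 1 rules out.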
5. Application Domains and Empirical Benchmarks
Scalable ciphertext data-packing schemes have been incorporated into:
- FHE-based deep learning inference: FastFHE block-oriented packing on encrypted ResNet-20 achieves a 2.4× end-to-end speedup while keeping the ciphertext–plaintext accuracy gap small (Song et al., 27 Nov 2025).
- Privacy-preserving federated and vertical learning: PackVFL enables linear scaling to very large batch and feature dimensions. FedBit enables per-client end-to-end time reductions from $21.31$s to $9.26$s with an FPGA accelerator (Yang et al., 1 May 2024, Meng et al., 27 Sep 2025).
- Post-quantum key establishment: P-Kyber delivers up to $24$ independent keys per encapsulation with a sub-$5$ ciphertext expansion rate (CER) (Liu et al., 24 Apr 2025).
- Encrypted biometric search and retrieval: PFIP achieves retrieval over packed ciphertext templates in $0.33$–$0.66$ s (a 30–50× speedup over exhaustive search), with retrieval time scaling linearly in the number of templates per ciphertext bin (Xin et al., 3 Jul 2025).
6. Trade-offs, Limitations, and Design Guidelines
Key trade-offs include:
- Noise and precision budget: Tight packing increases noise and may constrain multiplicative depth. Mitigations include immediate rescaling, compact polynomial approximations, or bootstrapping at fixed circuit depth (Song et al., 27 Nov 2025).
- Flexibility vs. operation support: FedBit's bit-interleaved packing is optimized for summation but less general for slot-wise multiplication; CKKS slot packing admits arbitrary vector operations but complicates fixed-point range management (Meng et al., 27 Sep 2025).
- Packing granularity vs. key material: increasing the slot count or block dimension may require additional rotation or permutation keys. In practice, carefully chosen slot count, block dimension, and key set balance storage against throughput.
- Parameterization: All schemes require careful selection of modulus, block size, and encoding parameters to guarantee carry/overflow safety and to maximize performance for target workloads.
General design guidelines emphasize:
- Striving for zero ciphertext rotations in SIMD layouts unless absolutely necessary.
- Maximizing slot coverage per ciphertext.
- Deferring as much summation or merging as possible to the cleartext domain after trusted decryption.
- Partitioning data adaptively to the hardware and algebraic capabilities, e.g., input-packing for small matrices, block partitioning for large ones (Yang et al., 1 May 2024).
- Selecting block and lattice codes (for P-Kyber) with best sphere-packing properties in the chosen message dimension (Liu et al., 24 Apr 2025).
7. Extensions and Implications
Scalable packing is a fundamental enabler for privacy-preserving computation at scale, with wide applicability beyond canonical FHE vector operations. Notably:
- Cross-layer lattice coding in multi-user encryption unlocks both efficiency and strong error resilience.
- Matrix-oriented diagonal/slot packing provides a generic primitive for high-throughput secure ML.
- Bit-interleaved and hybrid block techniques offer direct translation to FPGA/ASIC acceleration as in FedBit (Meng et al., 27 Sep 2025).
- Any high-dimensional matching, search, or aggregation protocol that can be reduced to repeated SIMD operations benefits directly from these packing strategies, as illustrated by extensions to biometric identification, VFL/FL, and generic encrypted query frameworks (Xin et al., 3 Jul 2025, Yang et al., 1 May 2024).
Emerging schemes increasingly integrate hardware and parameter co-design with algorithmic packing, leading to the closure of throughput and scalability gaps in real-world, privacy-sensitive deployments.