CKKS Scheme in Homomorphic Encryption
- CKKS scheme is a leveled, approximate homomorphic encryption protocol that enables SIMD-like operations on vectors of real or complex numbers while preserving privacy.
- It employs ciphertext rescaling and bootstrapping to manage noise growth during homomorphic additions and multiplications, balancing precision with computational efficiency.
- Performance enhancements via RNS, NTT/FFT optimizations, and hardware acceleration make CKKS suitable for large-scale secure machine learning and scientific computing applications.
The Cheon–Kim–Kim–Song (CKKS) scheme is a leveled, approximate homomorphic encryption (HE) protocol supporting SIMD-like, structure-preserving computation on vectors of complex or real numbers. It is based on the Ring Learning With Errors (RLWE) problem and specifically engineered for high-throughput, privacy-preserving analytics and machine learning on encrypted floating-point or fixed-point data. CKKS supports vectorized approximate addition, multiplication, and rotation, with ciphertext-level rescaling to control noise and precision. Its use now spans scientific computing, MLaaS, confidential control synthesis, and privacy-resilient outsourced inference, with ecosystem-scale deployments via OpenFHE, SEAL, TenSEAL, nGraph-HE2, and GPU-specialized libraries.
1. Algebraic Setting, Encoding, and Basic Algorithms
The CKKS scheme is constructed over the cyclotomic ring $R = \mathbb{Z}[X]/(X^N + 1)$, where $N$ is a power of two. The plaintext space consists of $N/2$ packed complex entries embedded via an approximate canonical embedding; plaintexts are encoded as polynomials $m(X) \in R$ such that numerical vectors $z \in \mathbb{C}^{N/2}$ are mapped to $R$ by coefficient embedding and an inverse DFT, followed by multiplication with a global scaling factor $\Delta$ and coefficient rounding. For a modulus $q$ (or RNS modulus chain $q_0, q_1, \dots, q_L$), the ciphertext ring is $R_q = R/qR$.
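The encode/decode round trip can be sketched in plain Python as a toy version of the canonical embedding for $\mathbb{Z}[X]/(X^N+1)$: evaluate at one primitive $2N$-th root of unity per conjugate pair, scale by $\Delta$, and round. The function names and the tiny parameters are illustrative, not any library's API:

```python
import cmath

def encode(z, N, delta):
    """Map N/2 complex slots to N integer coefficients (toy CKKS encoding)."""
    assert len(z) == N // 2
    zeta = cmath.exp(1j * cmath.pi / N)      # primitive 2N-th root of unity
    coeffs = []
    for k in range(N):
        # inverse canonical embedding, restricted to one root per conjugate pair
        c = (2.0 / N) * sum(z[j] * zeta ** (-(2 * j + 1) * k)
                            for j in range(N // 2)).real
        coeffs.append(round(delta * c))      # scale by delta and round
    return coeffs

def decode(coeffs, N, delta):
    """Evaluate the coefficient polynomial at the chosen roots and unscale."""
    zeta = cmath.exp(1j * cmath.pi / N)
    return [sum(c * zeta ** ((2 * j + 1) * k) for k, c in enumerate(coeffs)) / delta
            for j in range(N // 2)]
```

The only loss in the round trip is the coefficient rounding, whose decoded effect is bounded by roughly $N/(2\Delta)$, which is the sense in which CKKS is "approximate" from the very first step.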
Key Generation and Encryption
- Secret key: $s \in R$ is sampled from a small (ternary or discrete-Gaussian) distribution.
- Public key: $pk = (b, a) = (-a \cdot s + e, \, a) \in R_q^2$, with $a$ uniform in $R_q$ and $e$ sampled from the error distribution.
- Encryption: Given a scaled and rounded plaintext $m = \lfloor \Delta \cdot z \rceil$, sample ephemeral $v$ and errors $e_0, e_1$, and form the ciphertext as $ct = v \cdot pk + (m + e_0, e_1) \bmod q$, or, in two-component form, $ct = (c_0, c_1) = (v \cdot b + m + e_0, \; v \cdot a + e_1)$.
- Decryption (with secret $s$): Compute $c_0 + c_1 \cdot s \bmod q \approx m$ and divide by $\Delta$, then decode to vector space.
Homomorphic Operations
- Addition: $ct_{\mathrm{add}} = (c_0 + c_0', \, c_1 + c_1')$ (noise increases additively).
- Multiplication: After the component-wise product expands the ciphertext to three elements $(d_0, d_1, d_2)$, relinearization is performed via an evaluation key, and the resulting ciphertext is rescaled: coefficients and noise are each divided by (approximately) $\Delta$ and the modulus is reduced from $q_\ell$ to $q_{\ell-1}$.
- Rotation: Cyclic slot rotation is supported via Galois/evaluation keys; requires additional keyswitching.
- Rescaling: After multiplication, rescale divides the scale by $\Delta$ (or by the dropped prime $q_\ell$ in RNS). This both reduces the modulus and trims precision, and controls the growth of ciphertext magnitude.
Each of these operations preserves the vectorized structure, enabling "SIMD-like" parallel encrypted computation on large batches of values (Kholod et al., 29 Oct 2024, Pathak, 2022).
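The key generation, encryption, decryption, and homomorphic-addition steps above can be sketched as a toy, wholly insecure single-modulus RLWE scheme. All names and parameters here are illustrative; real libraries use NTT-based arithmetic, RNS chains, and proper discrete-Gaussian sampling:

```python
import random

N, Q = 8, 1 << 40   # toy ring dimension and single modulus -- NOT secure

def poly_mul(a, b):
    """Multiply in Z_Q[X]/(X^N + 1) (negacyclic convolution, schoolbook)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] = (res[i + j] + ai * bj) % Q
            else:                      # X^N = -1: wrap around with a sign flip
                res[i + j - N] = (res[i + j - N] - ai * bj) % Q
    return res

def poly_add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]

def small():
    """Ternary polynomial standing in for secret/error sampling."""
    return [random.choice((-1, 0, 1)) for _ in range(N)]

def keygen():
    s = small()
    a = [random.randrange(Q) for _ in range(N)]
    b = poly_add(poly_mul([(-x) % Q for x in a], s), small())  # b = -a*s + e
    return s, (b, a)

def encrypt(pk, m):
    b, a = pk
    v = small()
    return (poly_add(poly_add(poly_mul(b, v), small()), m),   # c0 = b*v + e0 + m
            poly_add(poly_mul(a, v), small()))                # c1 = a*v + e1

def decrypt(s, ct):
    c0, c1 = ct
    m = poly_add(c0, poly_mul(c1, s))          # c0 + c1*s = m + small noise
    return [x - Q if x > Q // 2 else x for x in m]  # centered representatives

def add_ct(ct1, ct2):
    """Homomorphic addition: component-wise sum of ciphertexts."""
    return poly_add(ct1[0], ct2[0]), poly_add(ct1[1], ct2[1])
```

Decrypting `add_ct(encrypt(pk, m1), encrypt(pk, m2))` recovers `m1 + m2` up to a small additive noise term $e \cdot v + e_0 + e_1 \cdot s$ per ciphertext, mirroring the noise-additivity claim above.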
2. Noise Growth, Security, and Parameterization
Security is based on the decisional RLWE problem for the chosen ring and modulus parameters. Key parameters include:
- Ring dimension $N$: Security grows with $N$ (at fixed modulus size), which must meet the 128-bit RLWE security threshold; $N$ also sets the packing size of $N/2$ slots.
- Modulus chain $\{q_0, q_1, \dots, q_L\}$: Controls the multiplicative depth $L$. A longer chain allows more multiplications/levels but increases key and ciphertext size, and the error/noise budget must be managed relative to the total modulus (Xu et al., 23 Nov 2025).
- Scaling factor $\Delta$: Governs fixed-point precision; a larger $\Delta$ improves relative error but consumes more modulus bits on each operation.
Noise behaves as follows:
- Homomorphic addition increases error additively; multiplication increases error multiplicatively (plus rescale truncation).
- After $k$ multiplications, noise grows approximately like $B^k$ for a per-level noise factor $B$, requiring parameter tuning so that after $L$ levels, the final noise is much less than the scale $\Delta$ (Pathak, 2022).
- Rescaling cuts both signal and noise by $\Delta$ (or by the dropped RNS prime $q_\ell \approx \Delta$).
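The scale bookkeeping behind rescaling can be seen with plain integer arithmetic, no encryption involved (the scaling factor here is illustrative):

```python
# Two values encoded at scale delta; their integer product carries scale
# delta**2, and rescaling divides by delta to return to scale delta.
delta = 2 ** 20
a, b = 1.5, 2.25
ea, eb = round(a * delta), round(b * delta)   # encode at scale delta
prod = ea * eb                                # scale is now delta**2
rescaled = round(prod / delta)                # rescale: back to scale delta
result = rescaled / delta                     # decode
```

In the encrypted setting the same division is applied to the ciphertext modulo the chain, which is why each multiplication consumes one modulus level.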
Empirical studies show that, for tuned parameters, standard computations (matrix multiplications, ML inference) yield very small mean squared errors, with negligible impact on application-level accuracy (Khan et al., 2023).
3. Performance Engineering, SIMD Packing, and High-Throughput Implementations
CKKS naturally supports SIMD packing, encoding $N/2$ complex numbers into a single ciphertext. These slots are operated on simultaneously via slot-wise arithmetic, supporting ultra-high-throughput batch analytics:
- Specialized packing schemes are used for convolution, matrix-matrix, and vector-matrix operations, e.g., "im2col" for convolutional layers, diagonal and repeated/expanded formats for FHE-aware DNNs (Duc et al., 14 Jul 2025, Pirillo et al., 24 Jun 2025).
- Implementations exploit RNS and NTT/FFT representations for efficient polynomial arithmetic (Kholod et al., 29 Oct 2024, Agulló-Domingo et al., 7 Jul 2025).
- On CPUs, graph-level/engineered optimizations include batch-axis packing, constant-vs-vector encodings, scalar/plaintext kernel paths, and "lazy" rescaling. These allow nGraph-HE2, for instance, to achieve 1,998 images/s on CryptoNets (MNIST) (Boemer et al., 2019).
- Hardware acceleration on systolic arrays (e.g., Cornami’s FracTLcore) or GPUs (FIDESlib, OpenFHE+HEXL) yields large, often order-of-magnitude speedups for key operations, including bootstrapping and core HE primitives (Ovichinnikov et al., 15 Oct 2025, Agulló-Domingo et al., 7 Jul 2025).
- FIDESlib demonstrates end-to-end bootstrapping times of 73.5 ms for 64 slots (80× faster than AVX-optimized OpenFHE) and scales efficiently to 32k-slot settings.
Trade-offs in packing and parallelism must balance latency, ciphertext expansion, and the bottleneck on slot-wise multiplications and rotations. Highly parallel workloads can routinely reach close to GPU memory-bandwidth limits (Agulló-Domingo et al., 7 Jul 2025, Ovichinnikov et al., 15 Oct 2025).
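One widely used packing pattern, the Halevi–Shoup diagonal method for matrix–vector products, can be sketched with plain lists standing in for ciphertext slots; the rotation and slot-wise product model the homomorphic rotation and multiplication primitives:

```python
def rotate(v, k):
    """Cyclic left rotation: the plaintext analogue of a Galois-key rotation."""
    return v[k:] + v[:k]

def matvec_diagonal(A, x):
    """Halevi-Shoup diagonal method: y = sum_i diag_i(A) * rot(x, i),
    costing n slot-wise multiplications and n - 1 rotations on ciphertexts."""
    n = len(x)
    y = [0] * n
    for i in range(n):
        d = [A[j][(j + i) % n] for j in range(n)]   # i-th generalized diagonal
        r = rotate(x, i)
        y = [yj + dj * rj for yj, dj, rj in zip(y, d, r)]
    return y
```

In the encrypted version the diagonals are plaintexts (cheap plaintext–ciphertext products), so the cost is dominated by the $n - 1$ rotations, which is exactly the bottleneck noted above.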
4. Approximate Bootstrapping and Deep Circuits
To support circuits deeper than the modulus chain allows, CKKS uses approximate bootstrapping. The canonical Cheon–Kim–Kim–Song bootstrapping protocol proceeds in the following stages:
- CoeffToSlot (C2S): Homomorphic DFT transforms coefficient encoding to slot encoding.
- Approximate modulus reduction: Homomorphically evaluate the modular reduction (sawtooth or sine/cosine function) in each slot, using Chebyshev polynomial approximation and rescale after each step.
- SlotToCoeff (S2C): Inverse DFT to return to coefficient form.
- Key-switch and modulus refresh: Output is a ciphertext at refreshed modulus and reestablished noise budget, but with the approximate nature of the scheme preserved (i.e., noise floor cannot be reduced below initial encoding error).
Typical bootstrapping depth is 16–25 levels, constrained by the polynomial approximation of modular reduction; application pipelines (e.g., fully encrypted deep-learning training in ReBoot (Pirillo et al., 24 Jun 2025)) interleave bootstrapping between circuit blocks to manage cumulative noise and maintain precision. ReBoot demonstrates that this approach allows fully encrypted training of deep MLPs with competitive accuracy for image recognition and tabular ML, reducing training latency by up to 8.83× over prior frameworks.
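The approximate-modular-reduction step can be illustrated in the clear: fit a Chebyshev series to the scaled sine surrogate $\sin(2\pi t)/(2\pi)$, which agrees with the centered value of $t \bmod 1$ near integers. The degree and tolerances here are illustrative:

```python
import math

def cheb_fit(f, degree):
    """Chebyshev interpolation coefficients of f on [-1, 1] (Chebyshev nodes)."""
    n = degree + 1
    fv = [f(math.cos(math.pi * (k + 0.5) / n)) for k in range(n)]
    c = [(2.0 / n) * sum(fv[k] * math.cos(math.pi * j * (k + 0.5) / n)
                         for k in range(n)) for j in range(n)]
    c[0] /= 2.0
    return c

def cheb_eval(c, x):
    """Clenshaw recurrence for sum_j c_j * T_j(x)."""
    b1 = b2 = 0.0
    for cj in reversed(c[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + cj, b1
    return c[0] + x * b1 - b2

# Scaled-sine surrogate for modular reduction: near integers, sin(2*pi*t)/(2*pi)
# closely tracks the centered remainder of t mod 1.
mod_approx = cheb_fit(lambda t: math.sin(2 * math.pi * t) / (2 * math.pi), 31)
```

Homomorphically, CKKS evaluates such a polynomial with multiplications and additions only (typically via baby-step/giant-step scheduling), rescaling after each multiplicative layer; the residual gap between the sine surrogate and the true modular reduction is one source of bootstrapping's irreducible noise floor.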
5. Fault Tolerance, Error Sources, and Reliability Engineering
CKKS’s approximate design yields native resilience to low-order bit-flips in plaintexts, but performance-oriented variants utilizing RNS and NTT are highly sensitive to even a single coefficient-level or NTT residue error:
- In "vanilla" CKKS (HEAAN, OpenFHE without RNS), LSB flips in plaintexts yield a decoded error on the order of $\Delta^{-1}$; higher-order flips can catastrophically corrupt the decrypted output (Mazzanti et al., 28 Jul 2025).
- Enabling RNS or NTT optimizations amplifies error magnitude, causing decryption failure even for LSB flips; an error in a single NTT residue reconstructs as a high-magnitude error across the full plaintext.
- Practical error-resilient design advises large scaling factors (to move error explosion threshold), sparsified slot allocation (for partial immunity), and, for ultra-reliable workloads, custom error-detecting codes at the plaintext or ciphertext level (Mazzanti et al., 28 Jul 2025).
- Binary CKKS variants introduce BCH codes and operate over binary polynomial rings, achieving deterministic, bit-exact decryption with negligible failure probabilities, and replace rescaling with explicit bootstrapping (Refresh), simplifying complexity at modest speed cost for multiplications (Chen et al., 4 Aug 2025).
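A quick numeric check of the bit-flip sensitivity described above, in the plain coefficient encoding with no RNS/NTT (the scaling factor and bit positions are illustrative):

```python
# A single bit flip in a scaled plaintext coefficient decodes to an error of
# 2**j / delta for bit position j: negligible for low bits, large for high bits.
delta = 2 ** 40                                        # illustrative scale
coeff = round(delta * 3.14159)                         # one encoded coefficient
lsb_err = abs((coeff ^ 1) - coeff) / delta             # flip bit 0
high_err = abs((coeff ^ (1 << 35)) - coeff) / delta    # flip bit 35
```

This is the mechanism behind the recommendation of large scaling factors: growing $\Delta$ pushes more bit positions below the application's precision threshold.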
6. Parameter Configuration Automation and Application-Specific Tuning
CKKS parameter selection (ring dimension $N$, modulus chain $\{q_\ell\}$, scaling factor $\Delta$, and packing layout) is a high-dimensional, tightly coupled optimization:
- A larger $N$ improves security but increases operation cost; deeper chains support larger circuits but further increase the computational burden (Xu et al., 23 Nov 2025).
- The FHE-Agent framework employs an LLM-guided, multi-fidelity search process that combines static analysis, simulated profiling, and encrypted benchmarking to tune and validate configurations for given ML workloads. It automatically achieves precision and latency trade-offs unattainable with prior heuristics and is able to recover 128-bit RLWE-secure CKKS setups for complex models (e.g., AlexNet), where baseline compilers fail (Xu et al., 23 Nov 2025).
Typical application-tuned parameters use a power-of-two ring dimension meeting the 128-bit security level, chain lengths of up to $8$, modulus primes of 45–60 bits, and a correspondingly sized scaling factor, as dictated by circuit depth and accuracy targets.
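A rule-of-thumb sanity check for a candidate configuration might look like the following sketch. The `MAX_LOGQ_128` values follow the commonly cited HomomorphicEncryption.org standard tables for ternary secrets, and the level-count rule assumes the common SEAL/OpenFHE chain layout (large first prime, one scale-sized prime per level, special last prime); both are heuristics, not a substitute for a real estimator:

```python
# Approximate maximum total modulus bits at 128-bit classical security
# (ternary secret), per the HomomorphicEncryption.org standard tables.
MAX_LOGQ_128 = {1024: 27, 2048: 54, 4096: 109, 8192: 218, 16384: 438, 32768: 881}

def check_params(n, prime_bits, depth):
    """prime_bits: bit sizes of the modulus chain, e.g. [60, 40, 40, 60].
    Checks the security budget and that enough primes exist for `depth`
    rescales plus the first (precision) and last (key-switching) primes."""
    secure = sum(prime_bits) <= MAX_LOGQ_128.get(n, 0)
    enough_levels = len(prime_bits) >= depth + 2
    return secure and enough_levels
```

For example, `check_params(8192, [60, 40, 40, 60], 2)` passes, while adding one more 40-bit level at the same ring dimension exceeds the ~218-bit budget and forces a jump to $N = 16384$, illustrating the coupled cost cliff that automated tuners navigate.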
7. Research Applications, Limitations, and Future Directions
CKKS has been deployed for:
- Secure cloud-based control synthesis (Model-based RL, value iteration, SARSA, Z-learning) with provable bounds on noise-induced error in limit points (Suh et al., 2021).
- Large-scale privacy-preserving machine learning (encrypted inference and training, face detection on UAV images, neural network training with bootstrapping), with less than 1% accuracy loss versus plaintext (Pirillo et al., 24 Jun 2025, Duc et al., 14 Jul 2025).
- Fast encrypted scientific computation (finite-difference PDEs, matrix multiplications) with negligible MSE under tuned settings (Kholod et al., 29 Oct 2024, Khan et al., 2023).
- Efficient privacy-protecting ranking, order statistics, and SIMD-parallel sorting, exploiting slot-wise comparison logic (Mazzone et al., 19 Dec 2024).
Outstanding limitations include:
- Bootstrapping cost, despite being reduced by GPU/FPGA acceleration and blockwise circuit design, remains a key bottleneck.
- All applications depend critically on precise error budgeting (balancing scale, chain, and packed slots) and automated configuration is an ongoing field of research.
- Division and non-polynomial non-linearities require careful function fitting, as CKKS natively supports only addition and multiplication.
- Resistance to hardware faults (e.g., bitflips in RNS/NTT) is not guaranteed without redundancy.
Continued developments focus on fault tolerance, low-overhead bootstrapping, multi-GPU scaling, and integration of hybrid HE-MPC and circuit privacy protocols.
For a detailed technical treatment and implementation guidance, see (Kholod et al., 29 Oct 2024, Agulló-Domingo et al., 7 Jul 2025, Pathak, 2022, Xu et al., 23 Nov 2025, Pirillo et al., 24 Jun 2025, Mazzanti et al., 28 Jul 2025, Mazzone et al., 19 Dec 2024, Khan et al., 2023, Suh et al., 2021, Duc et al., 14 Jul 2025, Boemer et al., 2019, Chen et al., 4 Aug 2025), and (Ovichinnikov et al., 15 Oct 2025).