Crypto-Near-Cache (CNC) Architecture

Updated 4 October 2025

Crypto-Near-Cache (CNC) is an architectural paradigm that integrates compute capabilities within cache slices to accelerate cryptographic workloads, especially for post-quantum algorithms.
It employs bit-parallel processing and specialized ISA extensions to minimize data movement and energy overhead, optimizing operations like NTT and modular multiplication.
CNC design emphasizes hardware-software co-design with rigorous leakage analysis and countermeasures to mitigate cache-based side-channel vulnerabilities.

Crypto-Near-Cache (CNC) refers to an architectural and methodological paradigm that integrates compute capabilities directly with, or in close physical proximity to, cache structures to accelerate cryptographic workloads, with particular emphasis on post-quantum cryptographic algorithms. CNC frameworks are engineered to address the growing bottleneck of cache bandwidth and data movement that arises as cryptographic kernels and key sizes increase in the post-quantum context. At the architectural level, CNC instantiates near-cache-slice computing, deploying compute-enabled SRAM arrays with bitline processing within or near each cache slice, exposed to the processor via customized ISA extensions. At the same time, CNC must address the heightened risk of microarchitectural side channels, especially cache-based information leakage, that results from the close coupling of computation and data storage. This dual agenda motivates a comprehensive treatment encompassing hardware design, software tooling, static and dynamic leakage analysis, and countermeasure integration.

1. Architectural Design and Integration of CNC

CNC leverages a near-cache-slice computing model in which compute-enabled SRAM arrays are placed adjacent to each cache slice. Each slice integrates:

A bitline-capable SRAM array supporting bit-parallel arithmetic/logic operations,
A command array for algorithm-specific control (such as precomputed NTT instructions), and
A control module managing read, write, and computation, incorporating logic for virtual-to-physical address translation and cache line alignment (Zhang et al., 27 Sep 2025).

These CNC compute arrays are typically partitioned into “Computing Blocks” (CBs), dynamically reconfigurable to accommodate variable operand widths (e.g., variable precision for lattice-based cryptography). The array organization formula $p = \lfloor 512/m \rfloor$ (where $m$ is the column count per CB) exemplifies resource partitioning for compute granularity.
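As a small worked example of the partitioning formula, the sketch below evaluates $p = \lfloor 512/m \rfloor$ for a few column counts. It reads $p$ as the number of CBs that fit into a 512-column array, which is an interpretation of the formula rather than something the source states outright; the function name and the sample values of $m$ are illustrative only.

```python
ARRAY_COLUMNS = 512  # constant taken from the formula p = floor(512 / m)


def computing_blocks(m: int) -> int:
    """Return p = floor(512 / m) for a CB that spans m columns."""
    if not 1 <= m <= ARRAY_COLUMNS:
        raise ValueError("m must be between 1 and 512")
    return ARRAY_COLUMNS // m


for m in (8, 16, 24, 32, 64):  # illustrative column counts, not from the source
    p = computing_blocks(m)
    unused = ARRAY_COLUMNS - p * m
    print(f"m={m:3d} columns per CB -> p={p:3d}, {unused} columns left over")
```

When $m$ does not divide 512, the floor leaves some columns unassigned, which is the granularity cost of a wider or irregular CB.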

To expose CNC capabilities to software layers, ISA extensions (e.g., SW_CNC for specialized store, RD_D2CNC for direct block read, LD_CMD for command loading, ALG_CNC for compute invocation) are introduced. These instructions operate on virtual addresses, with physical placement managed by a destination hash module and TLB, enabling seamless software integration with minimal datapath modification.
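To make the instruction flow concrete, the following toy functional model strings the four named instructions together in the order store, load commands, compute, read back. Only the instruction names come from the text above; the operand formats, the command tuple layout, the addresses, and the `ToyCNCSlice` class are all hypothetical, and the model ignores timing, address translation, and hashing entirely.

```python
class ToyCNCSlice:
    """Toy functional model of one compute-enabled slice (hypothetical semantics)."""

    def __init__(self):
        self.array = {}     # address -> list of integer words (one stored block)
        self.commands = []  # command sequence held in the command array

    def sw_cnc(self, addr, words):
        """Stand-in for SW_CNC: store a block of operands into the array."""
        self.array[addr] = list(words)

    def ld_cmd(self, commands):
        """Stand-in for LD_CMD: load (op, dst, src_a, src_b) tuples."""
        self.commands = list(commands)

    def alg_cnc(self):
        """Stand-in for ALG_CNC: run the loaded commands element-wise."""
        ops = {
            "and": lambda x, y: x & y,
            "or": lambda x, y: x | y,
            "xor": lambda x, y: x ^ y,
        }
        for op, dst, src_a, src_b in self.commands:
            a, b = self.array[src_a], self.array[src_b]
            self.array[dst] = [ops[op](x, y) for x, y in zip(a, b)]

    def rd_d2cnc(self, addr):
        """Stand-in for RD_D2CNC: read a result block back."""
        return list(self.array[addr])


cnc = ToyCNCSlice()
cnc.sw_cnc(0x000, [0b1100, 0b1010])
cnc.sw_cnc(0x040, [0b1010, 0b0110])
cnc.ld_cmd([("xor", 0x080, 0x000, 0x040), ("and", 0x0C0, 0x000, 0x040)])
cnc.alg_cnc()
result_xor = cnc.rd_d2cnc(0x080)
result_and = cnc.rd_d2cnc(0x0C0)
print(result_xor, result_and)
```

The point of the sketch is the division of labor: operands and commands are staged once, after which the computation runs without further traffic between core and cache.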

2. Performance and Efficiency Benefits

CNC minimizes data movement by collocating compute and cache, addressing the critical limitation of cache bandwidth identified in analysis of post-quantum cryptographic workloads (Zhang et al., 27 Sep 2025). Key performance gains include:

High internal bandwidth: Entire cache lines are transferred and processed in-place in a few cycles, bypassing bottlenecks of conventional cache–core round-trip data paths.
Bit-parallelism: SRAM bitlines support concurrent processing of multiple bits/coefficients, ideally suited for vectorized cryptographic operations such as number theoretic transforms (NTT) and Montgomery multiplication used in lattice-based schemes.
Near-zero-cost shift operations: Combining bit-parallel logic and optimized shifting reduces the critical path for common cryptographic kernels.
Energy efficiency: Compute-in-memory eliminates the energy overhead of cache–core shuttling, reduces the need for frequent DRAM traffic, and leverages parallelism for lower per-operation energy.

This approach directly targets the performance and energy overheads imposed by post-quantum cryptography, where public keys and signatures are 3–9× longer than those of conventional primitives.
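Montgomery multiplication, mentioned above, shows why cheap shifts and masks matter: it replaces division by the modulus with a mask and a right shift by $k$ bits. The sketch below is the textbook algorithm in plain Python, not the CNC hardware datapath; the modulus 3329 and $k = 16$ are example parameters chosen for illustration, and the final conditional subtraction is written as an ordinary branch, so this version is not constant-time.

```python
def montgomery_setup(q: int, k: int):
    """Precompute constants for R = 2**k; q must be odd and smaller than R."""
    R = 1 << k
    if q % 2 == 0 or q >= R:
        raise ValueError("q must be odd and less than 2**k")
    q_neg_inv = (-pow(q, -1, R)) % R  # -q^{-1} mod R (needs Python 3.8+)
    r2 = (R * R) % q                  # R^2 mod q, used to enter Montgomery form
    return q_neg_inv, r2


def redc(t: int, q: int, k: int, q_neg_inv: int) -> int:
    """Montgomery reduction: return t * R^{-1} mod q for 0 <= t < q * 2**k."""
    mask = (1 << k) - 1
    m = ((t & mask) * q_neg_inv) & mask  # "mod R" is a mask
    u = (t + m * q) >> k                 # "div R" is a shift
    return u - q if u >= q else u


def mont_mul(a: int, b: int, q: int, k: int) -> int:
    """Compute a * b mod q via Montgomery form, for 0 <= a, b < q."""
    q_neg_inv, r2 = montgomery_setup(q, k)
    a_m = redc(a * r2, q, k, q_neg_inv)     # a * R mod q
    b_m = redc(b * r2, q, k, q_neg_inv)     # b * R mod q
    p_m = redc(a_m * b_m, q, k, q_neg_inv)  # a * b * R mod q
    return redc(p_m, q, k, q_neg_inv)       # leave Montgomery form


print(mont_mul(1234, 2345, 3329, 16) == (1234 * 2345) % 3329)
```

In a real kernel the operands stay in Montgomery form across many multiplications, so the conversions in `mont_mul` are amortized rather than paid per product.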

3. Implementation Details and Customization

ISA extensions play a central role in making CNC accessible at the software/compiler interface. Instructions operate on virtual addresses, with the CNC controller ensuring correct alignment and physical mapping for bitline operations (Zhang et al., 27 Sep 2025). The processor pipeline is modified at the decode, execute, and memory stages:

Decode triggers CNC-specific control logic,
Execute resolves destination hash and control commands,
Memory stage bypasses standard L1 accesses, routing data directly to the computing array.

Fine-grained control logic, multiplexers, and sense amplifiers enable programmable AND/OR/XOR, shift, and block movement. The command array and computing blocks are parametrized for runtime reconfiguration, enabling support for a range of cryptographic algorithm operand widths.
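To illustrate how addition can be composed from the AND, XOR, and shift primitives named above, the sketch below packs several fixed-width lanes into one wide row and adds all lanes at once. It is a software analogy under stated assumptions: a Python integer stands in for an SRAM row, each lane stands in for one computing block, and the lane width `m` and helper names are hypothetical. The actual CNC carry handling is not described here and may differ.

```python
def pack(values, m):
    """Pack integers into one row, m bits per lane, lane 0 in the low bits."""
    row = 0
    for i, v in enumerate(values):
        row |= (v & ((1 << m) - 1)) << (i * m)
    return row


def unpack(row, m, count):
    """Split a row back into count lanes of m bits each."""
    mask = (1 << m) - 1
    return [(row >> (i * m)) & mask for i in range(count)]


def lane_add(row_a, row_b, m, count):
    """Add count m-bit lanes modulo 2**m using only AND, XOR, and shift."""
    lane_lsb = sum(1 << (i * m) for i in range(count))  # lowest bit of each lane
    full = (1 << (m * count)) - 1
    keep = full ^ lane_lsb  # XOR with all ones acts as NOT
    while row_b:
        carry = row_a & row_b          # positions that generate a carry
        row_a ^= row_b                 # sum without carries
        row_b = (carry << 1) & keep    # move carries up; drop any that cross a lane
    return row_a


a = [3, 200, 255, 17]
b = [5, 100, 1, 240]
total = unpack(lane_add(pack(a, 8), pack(b, 8), 8, len(a)), 8, len(a))
print(total)
```

Changing `m` changes the lane boundaries without touching the adder loop, which mirrors the idea of reconfiguring block width for different operand sizes.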

4. Cryptographic Workloads and Acceleration

CNC is targeted at workloads requiring heavy modular arithmetic, vector operations, and NTTs—core building blocks for both classical and post-quantum cryptosystems. For instance:

Lattice-based primitives (e.g., CRYSTALS-Kyber, CRYSTALS-Dilithium, Falcon): CNC maps NTTs ($O(n\log n)$ complexity) to concurrent SRAM bitline operations, reducing cycle counts.
Modular multiplication: CNC employs hardware optimizations such as bit-parallel execution and efficient carry management, which are critical for wide operand computations in PQC.
Vector-based cryptographic kernels (e.g., hash-based signatures): By partitioning data across CBs, the architecture permits scalable parallelization, irrespective of operand size or key width.
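For readers unfamiliar with the NTT, the following minimal sketch shows the $O(n\log n)$ butterfly structure that such an accelerator would parallelize. It is a plain recursive Cooley-Tukey transform over toy parameters ($q = 17$, $n = 8$, $\omega = 2$, chosen here so the arithmetic can be checked by hand), and it computes a cyclic convolution. Production lattice schemes use larger moduli and negacyclic variants, so this should be read as an illustration of the technique, not as any scheme's reference code.

```python
def ntt(a, omega, q):
    """Recursive radix-2 NTT; len(a) is a power of two, omega a primitive len(a)-th root mod q."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q                 # butterfly: twiddle times odd part
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out


def intt(a, omega, q):
    """Inverse NTT: forward transform with omega^{-1}, then scale by n^{-1}."""
    inv_n = pow(len(a), -1, q)
    return [x * inv_n % q for x in ntt(a, pow(omega, -1, q), q)]


def cyclic_convolution(a, b, omega, q):
    """Multiply polynomials modulo (x^n - 1, q) via pointwise products in the NTT domain."""
    fa, fb = ntt(a, omega, q), ntt(b, omega, q)
    return intt([x * y % q for x, y in zip(fa, fb)], omega, q)


Q, OMEGA = 17, 2  # toy parameters: 2 has multiplicative order 8 modulo 17
a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
print(cyclic_convolution(a, b, OMEGA, Q))
```

Each recursion level performs $n/2$ independent butterflies, which is the parallelism a bitline array can exploit.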

The native support for variable data width and large operand sizes accommodates the expanded key and signature sizes in post-quantum cryptography.

5. Security: Side-Channel Vulnerabilities and Countermeasures

The proximity of compute to cache, together with increased cache utilization, amplifies the potential for cache-based side-channel leakage. Practical and theoretical results across multiple analyses demonstrate:

CNC’s architectural optimizations can magnify any secret-dependent memory behavior, increasing the risk and observability of side-channel leakage if not properly mitigated (Irazoqui et al., 2017).
Attacks using forced eviction/timing and mutual information (MI) analysis are directly applicable in CNC contexts: repeated cache traces can be correlated with secret material, with even small variations in access patterns leading to key recovery.
Security analysis tools, such as dynamic taint analysis with MI, abstract interpretation with Secret-Augmented Symbolic domains, and refinement type systems (Irazoqui et al., 2017, Wang et al., 2019, Jiang et al., 2022), are crucial for proactively detecting and quantifying leakage specific to cryptographic code running within CNC frameworks.
Constant-time coding, table alignment, scatter/gather access, and formal verification that cache state is independent of secrets are therefore necessary in both classic and CNC-enabled architectures.
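The mutual information analysis mentioned above can be sketched with a simple plug-in estimator over (secret, observation) pairs. The example assumes a hypothetical table lookup in which the attacker observes only which cache line was touched, with 16 table entries per line; both the layout and the noise-free trace are assumptions made for illustration, and real analyses work from measured, noisy traces.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information, in bits, between secrets and observations."""
    n = len(pairs)
    joint = Counter(pairs)
    secrets = Counter(s for s, _ in pairs)
    observations = Counter(o for _, o in pairs)
    mi = 0.0
    for (s, o), c in joint.items():
        # p(s,o) * log2( p(s,o) / (p(s) * p(o)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (secrets[s] * observations[o]))
    return mi


ENTRIES_PER_LINE = 16  # assumed: 16 table entries share one cache line

# Secret-indexed lookup: the touched line reveals the upper bits of the index.
leaky_trace = [(s, s // ENTRIES_PER_LINE) for s in range(256)]
# Secret-independent access pattern: the observation never changes.
flat_trace = [(s, 0) for s in range(256)]

print(mutual_information(leaky_trace), mutual_information(flat_trace))
```

Under these assumptions the leaky lookup exposes the four upper bits of an 8-bit index, while the secret-independent pattern exposes nothing.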

Measures such as integrating automated leakage detection in the CNC development pipeline, using S-box/vectorized or masked implementations, and enforcing algorithm-independent cache line alignment are explicitly recommended (Irazoqui et al., 2017).
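One of the countermeasures listed above, removing secret-dependent addresses from table lookups, can be sketched as a full-table scan with mask-based selection. The code is a minimal illustration of the access pattern only: Python offers no timing guarantees, the 32-bit mask trick assumes indices and entries below $2^{32}$, and the function names and sample table are hypothetical.

```python
def lookup_direct(table, secret_index):
    """Secret-dependent address: which cache line is touched depends on the secret."""
    return table[secret_index]


def lookup_full_scan(table, secret_index):
    """Touch every entry in a fixed order and keep the wanted one with a mask."""
    result = 0
    for i, entry in enumerate(table):
        diff = i ^ secret_index
        # diff == 0 gives -1 >> 32 == -1, so mask is all ones; otherwise mask is 0.
        mask = ((diff - 1) >> 32) & 0xFFFFFFFF
        result |= entry & mask
    return result


table = [(i * 37 + 11) & 0xFF for i in range(256)]  # arbitrary sample table
print(lookup_full_scan(table, 42) == lookup_direct(table, 42))
```

The scan trades extra reads for an address sequence that is identical for every secret, which is the property leakage analyses check for.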

6. Challenges and Innovations in Integration

Adding compute capability within or near cache slices raises several technical challenges (Zhang et al., 27 Sep 2025):

Data alignment and placement: The unpredictability caused by virtual addressing and cache hashing can impair reliable bitline operation. The CNC solution employs physical address translation post-TLB and custom control logic to guarantee correct alignment and mapping.
External bandwidth limitation: Traditional near-cache compute designs are bottlenecked by narrow external bandwidth to the cache; CNC sidesteps this by localizing computation and minimizing off-chip transfers.
Coherence and regular cache operations: Compute and cache read/write must be properly synchronized so as not to disrupt standard memory hierarchy semantics.

By equipping each slice with autonomous compute/controller modules and locally stored control/command sequences, CNC maintains independent, parallelizable compute streams with minimal crossbar or on-chip network contention.
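The alignment and placement constraint can be illustrated with a small check that two operands are line-aligned and land in the same slice. Everything concrete here is an assumption: the 64-byte line, the eight slices, and in particular `toy_slice_hash`, which is a placeholder XOR-fold and not the destination hash used by CNC or by any real processor.

```python
LINE_BYTES = 64  # assumed cache line size
NUM_SLICES = 8   # assumed slice count (must be a power of two for this toy hash)


def is_line_aligned(paddr: int) -> bool:
    """True when the physical address starts a cache line."""
    return paddr % LINE_BYTES == 0


def toy_slice_hash(paddr: int) -> int:
    """Placeholder hash: XOR-fold the line number into log2(NUM_SLICES) bits."""
    line = paddr // LINE_BYTES
    bits = NUM_SLICES.bit_length() - 1
    h = 0
    while line:
        h ^= line & (NUM_SLICES - 1)
        line >>= bits
    return h


def can_compute_in_place(paddr_a: int, paddr_b: int) -> bool:
    """Both operands must be line-aligned and map to the same slice."""
    return (
        is_line_aligned(paddr_a)
        and is_line_aligned(paddr_b)
        and toy_slice_hash(paddr_a) == toy_slice_hash(paddr_b)
    )


print(can_compute_in_place(64, 512), can_compute_in_place(64, 128))
```

Because software sees only virtual addresses, a check of this kind has to run after translation, which is why the text places the logic behind the TLB.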

7. Future Directions and Practical Implications

CNC reifies a trend toward hardware–software codesign for cryptographic computation, suggesting the following forward-looking themes:

Coupling of hardware specialization for PQC with robust, tool-supported leakage detection and prevention methods. Integrating tools that implement dynamic taint analysis, abstract interpretation with symbolic domains, and refinement type systems into the CNC software flow appears essential for robust security.
Systematic evaluation of CNC not just for performance/energy, but for resilience against microarchitectural side channels via empirical (e.g., entropy, KL-divergence, occupancy/eviction complexity (Genkin et al., 2022)) and formal metrics.
Hardware-in-the-loop verification: As CNC frameworks evolve, compiler and ISA support for abstract view analysis and type inference may enable automated proof-carrying code or static partitioning of "safe" and "unsafe" execution domains.
Addressing challenges associated with blinding, randomized masking, and effective layout in the context of compute-enabled cache, given that naive blinding strategies can introduce new side-channel vectors observable by CNC-augmented measurement techniques (Jiang et al., 2022).
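The empirical metrics named in this list, entropy and KL-divergence, can be computed from bucketed timing samples as sketched below. The sample values are invented for illustration and do not come from any measurement; the helper names are likewise hypothetical.

```python
import math
from collections import Counter


def distribution(samples):
    """Normalize a list of bucketed observations into a probability table."""
    n = len(samples)
    return {k: v / n for k, v in Counter(samples).items()}


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)


def kl_divergence(p, q):
    """D_KL(p || q) in bits; infinite if q assigns zero mass where p does not."""
    total = 0.0
    for k, pk in p.items():
        qk = q.get(k, 0.0)
        if qk == 0.0:
            return math.inf
        total += pk * math.log2(pk / qk)
    return total


# Invented timing buckets (in cycles) observed for two values of a secret bit.
timings_bit0 = [100] * 8 + [140] * 2
timings_bit1 = [100] * 5 + [140] * 5
p0, p1 = distribution(timings_bit0), distribution(timings_bit1)

print(entropy(p0), entropy(p1), kl_divergence(p0, p1))
```

A divergence near zero between the two conditional distributions indicates that the observation carries little information about the secret bit, whereas a large value flags a distinguishable timing profile.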

A plausible implication is that future CNC architectures will be increasingly evaluated not only on raw throughput but on comprehensive metrics synthesizing performance, energy, and provable leakage upper bounds—driven by integrated static/dynamic analysis tools.

In summary, Crypto-Near-Cache frameworks integrate programmable, bit-parallel, compute-enabled SRAM arrays directly with cache slices, optimizing for high-throughput and low-energy cryptographic computations, particularly suited to post-quantum algorithms. CNC exposes its capabilities via ISA extensions and mandates careful architectural, implementation, and verification strategies to counter side-channel vulnerabilities that are exacerbated by this coupling. The integration of sophisticated static and dynamic leakage detection tools, combined with constant-time and masked software implementations and flexible, cache-localized hardware, is essential for the secure adoption of CNC in modern cryptographic systems (Zhang et al., 27 Sep 2025, Irazoqui et al., 2017, Jiang et al., 2022, Wang et al., 2019, Genkin et al., 2022).