Partial Number Theoretic Transform Masking in Post Quantum Cryptography Hardware: A Security Margin Analysis

Published 4 Apr 2026 in cs.CR | (2604.03813v1)

Abstract: Adams Bridge, a hardware accelerator for ML-DSA and ML-KEM designed for the Caliptra root of trust, masks 1 of its Inverse Number Theoretic Transform (INTT) layers and relies on shuffling for the remainder, claiming per-butterfly Correlation Power Analysis (CPA) complexities of 2⁴⁶ (ML-DSA) and 2⁹⁶ (ML-KEM). We evaluate these claims against published side-channel literature across seven analysis tracks with confidence-rated evidence. Register-Transfer Level (RTL) analysis confirms that the design's Random Start Index (RSI) shuffling provides 6 bits of entropy per layer (64 orderings) rather than the 296 bits of a full random permutation assumed in its scaling argument, with effective margins below the designers' estimates. A soft-analytical attack pipeline demonstrates a 37-bit enumeration reduction, independent of Belief Propagation (BP) gains, quantifying the attack-model gap without achieving key recovery. Full-scale BP on the complete INTT factor graph achieves 100% coefficient recovery over the single-layer baseline, resolving whether BP gains scale to production-size Number Theoretic Transform (NTT) structures. A genie-aided information-theoretic bound shows observations contain sufficient mutual information for full recovery at SNRxN as low as 15. Layer-ablation analysis identifies four necessary conditions governing BP convergence. Observation topology, not count, determines recovery: 4 evenly spread layers achieve 100% while 4 consecutive layers achieve 0%, yielding a practical countermeasure design tool. Strategic masking of 3 consecutive mid-layers (43% overhead vs. full masking) creates an unrecoverable gap that defeats soft-analytical attacks. We contribute a reusable security margin audit methodology combining RTL verification, epistemic confidence tagging, sensitivity-scenario analysis, and experimental validation applicable to any partially masked NTT accelerator.


Authors (2)

Summary

The paper demonstrates that partial masking with limited RSI shuffling leads to security margins far below the claimed levels under BP-enabled SASCA attacks.
It uses a combined approach of empirical TVLA measurements and algebraic factor graph analysis to expose vulnerabilities in INTT masking schemes.
The study recommends enhancing masking strategies—such as extending DOM masking or using full random permutations—to achieve robust security in PQC hardware.

Security Margin Analysis of Partial NTT Masking in Post-Quantum Cryptography Hardware

Introduction

This paper delivers a rigorous evaluation of partial masking strategies in post-quantum cryptography (PQC) hardware, focusing on the Adams Bridge accelerator for ML-DSA and ML-KEM designed for Caliptra roots of trust. The security claims of Adams Bridge, particularly its reliance on shuffling-based countermeasures and selective INTT masking to defend against side-channel attacks (SCA), are scrutinized using both literature synthesis and empirical validation pipelines. The analysis directly questions whether per-butterfly CPA complexity and claimed shuffling entropy yield security margins robust to algebraic and analytical side-channel cryptanalysis techniques, especially Soft Analytical Side-Channel Attacks (SASCA) and belief propagation (BP).

Background: Masking, Shuffling, and Algebraic Attacks in NTT Hardware

The Number Theoretic Transform (NTT), central to lattice-based PQC algorithms such as ML-KEM (Kyber) and ML-DSA (Dilithium), is a well-known vector for power and electromagnetic side-channel leakage. While full domain-oriented masking (DOM) yields robust $d$th-order security, it incurs significant silicon overhead. Partial masking, as implemented in Adams Bridge, limits the scope of Boolean masking to select INTT layers, employing shuffling for the majority of butterfly layers to manage area and complexity. The Random Start Index (RSI) shuffling in question provides only 6 bits of entropy per layer (64 orderings), in contrast to the roughly 296 bits of a full random permutation, a critical difference for security amplification.
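The entropy figures above can be checked with a few lines of arithmetic. The sketch below assumes that the "full random permutation" is taken over 64 shuffled positions, which is an inference from the 64 orderings and the 296-bit figure rather than something the summary states outright; the function names are illustrative.

```python
import math

def rsi_entropy_bits(orderings: int) -> float:
    """Entropy of a uniformly chosen start index among `orderings` options."""
    return math.log2(orderings)

def full_permutation_entropy_bits(positions: int) -> float:
    """Entropy of a uniformly random permutation: log2(positions!)."""
    # lgamma(n + 1) = ln(n!), converted from nats to bits.
    return math.lgamma(positions + 1) / math.log(2)

# Assumption: 64 shuffled positions per layer.
rsi_bits = rsi_entropy_bits(64)
perm_bits = full_permutation_entropy_bits(64)

print(f"RSI shuffling:         {rsi_bits:.1f} bits per layer")
print(f"Full permutation (64): {perm_bits:.1f} bits per layer")
```

Under this assumption the two quantities differ by about 290 bits per layer, which is the gap the scaling argument glosses over.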

Prevailing SCA countermeasures for NTT-based PQC hardware have shown that shuffling and masking can provide multiplicative security amplification under strict assumptions; yet, past research on shuffling for AES and NTT implementations demonstrates that RSI variants are highly susceptible to enumeration once position-dependent biases and limited entropy are considered. Literature on algebraic attacks has established that classical CPA bounds are inapplicable under SASCA and BP, which exploit entire butterfly factor graphs and bypass per-coefficient brute-force search, instead using structure-induced reductions in the actual hypothesis space.

Summary of Adams Bridge Architecture and Security Claims

Adams Bridge masks only the first INTT layer in hardware with DOM-style first-order Boolean masking, reverting to unmasked intermediates for the remaining layers, and applies a dual-level RSI with only 64 orderings per layer. Security claims include CPA complexity of $2^{46}$ (ML-DSA) and $2^{96}$ (ML-KEM) per butterfly, supposedly compounded via shuffling across layers. This would ostensibly result in security margins of ~$2^{88}$ (ML-DSA) and ~$2^{132}$ (ML-KEM). Aggregated TVLA results up to $10^6$ traces are presented as further empirical defense, with asserted area savings due to partial masking.
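One way to reproduce the claimed margins is to add 6 bits of RSI entropy per shuffled (unmasked) layer to the per-butterfly CPA exponent. This reading is an assumption that happens to match the quoted figures. The 8-layer count for ML-DSA is also an assumption (the usual 256-point transform), while the 7-layer count for ML-KEM is the one cited for the BP experiments. A minimal sketch:

```python
RSI_BITS_PER_LAYER = 6  # log2(64 orderings)

def claimed_margin_bits(cpa_bits: int, total_layers: int, masked_layers: int = 1) -> int:
    """Claimed exponent if RSI entropy compounded across every shuffled layer."""
    shuffled_layers = total_layers - masked_layers
    return cpa_bits + RSI_BITS_PER_LAYER * shuffled_layers

# Assumed layer counts: 8 for ML-DSA, 7 for ML-KEM; one masked layer each.
ml_dsa_claim = claimed_margin_bits(cpa_bits=46, total_layers=8)
ml_kem_claim = claimed_margin_bits(cpa_bits=96, total_layers=7)

print(f"ML-DSA claimed margin: 2^{ml_dsa_claim}")
print(f"ML-KEM claimed margin: 2^{ml_kem_claim}")
```

The calculation only shows where the claimed exponents can come from; it does not say whether the compounding assumption holds.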

Security Margin Evaluation: Literature Grounding and Empirical Pipeline

Structural Weakness in RSI Shuffling

The RSI shuffling implementation is confirmed at the RTL and provides 6 bits of entropy per layer, a drastic reduction from full random permutation's factorial entropy. Published cautionary analyses show that RSI is often trivial to enumerate, especially under profiling or template attacks; entropy does not compound multiplicatively across layers in practice, refuting the claim that effective complexity scales as $S^L$, where $S$ is the number of orderings per layer and $L$ the number of shuffled layers. Additionally, BP can efficiently marginalize the shuffling order across layers, reducing effective attacker work to $S \times L$ BP runs.
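The difference between the two scaling laws is easiest to see in bits. The sketch below compares $S^L$ with $S \times L$ for $S = 64$. The layer counts of 6 and 7 are illustrative assumptions, corresponding to a 7- or 8-layer transform with one masked layer.

```python
import math

def compounded_bits(orderings: int, layers: int) -> float:
    """Attacker work in bits if shuffling entropy compounds as S^L."""
    return layers * math.log2(orderings)

def marginalized_bits(orderings: int, layers: int) -> float:
    """Attacker work in bits if each layer is enumerated separately: S * L runs."""
    return math.log2(orderings * layers)

for shuffled_layers in (6, 7):  # assumed numbers of shuffled layers
    claimed = compounded_bits(64, shuffled_layers)
    effective = marginalized_bits(64, shuffled_layers)
    print(f"L={shuffled_layers}: S^L = 2^{claimed:.0f}, S*L = 2^{effective:.1f}")
```

With these numbers the shuffling contribution drops from 36 to 42 bits down to under 9 bits, which is the order of magnitude at stake in the enumeration argument.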

Algebraic Factor Graph Reductions

The algebraic structure of the Gentleman-Sande butterfly, specifically the deterministic relation between its outputs, imposes constraints that reduce the effective enumeration space below CPA bounds. For ML-DSA with $q = 8,380,417$, key material recovered via SASCA and modular-reduction attacks often suffices for full key recovery given only partial leakage, because lattice reduction (e.g., BKZ-60) requires significantly fewer coefficients than the total processed; the exact number depends on SNR and error distribution.
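To make the butterfly constraint concrete, the following sketch implements a textbook Gentleman-Sande butterfly modulo the ML-DSA prime and inverts it. It is not taken from the Adams Bridge RTL: the twiddle value is illustrative (any nonzero value works for this demonstration), and real implementations may fold in scaling or use a different reduction strategy.

```python
Q_ML_DSA = 8380417  # ML-DSA modulus quoted in the text

def gs_butterfly(a: int, b: int, w: int, q: int) -> tuple[int, int]:
    """Textbook Gentleman-Sande butterfly: (a, b) -> (a + b, (a - b) * w) mod q."""
    return (a + b) % q, ((a - b) * w) % q

def gs_butterfly_inverse(u: int, v: int, w: int, q: int) -> tuple[int, int]:
    """Recover (a, b) from the butterfly outputs (u, v)."""
    diff = (v * pow(w, -1, q)) % q   # a - b
    inv2 = pow(2, -1, q)             # q is odd, so 2 is invertible
    a = ((u + diff) * inv2) % q
    b = ((u - diff) * inv2) % q
    return a, b

w = 1753  # illustrative nonzero twiddle factor (assumption)
a, b = 1234567, 7654321
u, v = gs_butterfly(a, b, w, Q_ML_DSA)
recovered = gs_butterfly_inverse(u, v, w, Q_ML_DSA)
print(recovered == (a, b))
```

Because the map is a bijection on pairs, fixing any two of the four values $a$, $b$, $u$, $v$ determines the other two. A factor-graph attack can therefore combine leakage on inputs and outputs instead of enumerating each coefficient independently.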

SASCA, BP, and the Effectiveness of Partial Masking

Composite margin synthesis, integrating algebraic, shuffling, and lattice reductions, places realistic margin bounds for partial masking far below the designers' claims. Pro-defender assumptions yield $2^{59}$ to $2^{63}$ (ML-DSA) and $2^{61}$ to $2^{65}$ (ML-KEM), while pro-attacker scenarios reduce these to $2^{15}$ to $2^{27}$ and $2^{16}$ to $2^{30}$, respectively: gaps of several orders of magnitude from the putative ~$2^{88}$ and ~$2^{132}$. The most compelling empirical evidence is the attack-model gap: under profiled, BP-enabled SASCA, the complexity falls by 37 bits (from $2^{46}$ to $2^{9}$) strictly via RSI enumeration, regardless of BP gains. Full BP on the ML-KEM factor graph (7 layers, $q = 3,329$) achieves $100\%$ recovery at SNR $\times$ N = $3,000$, with MI amplification factors up to $2.24\times$ over single-layer observation, confirming the attack's efficacy at least for low $q$.

Temporal Exposure and Countermeasure Hierarchy

Only the first INTT layer is masked; 85–87.5% of INTT clock cycles yield unmasked, SCA-exploitable leakage. Layer-ablation analysis shows that observation topology rather than raw observation count governs BP attack success. Crucially, masking the input layer (L1) alone produces a topological information barrier; masking three consecutive mid-layers creates a practical recovery gap at only 43% of the area cost of full masking. TVLA passes at $10^6$ traces only establish the absence of first-order detectable leakage given current noise, not the absence of exploitable leakage under profiling, template attacks, or BP-based SASCA.
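The exposure percentages and the topology argument can be illustrated with a small sketch. It assumes every layer takes the same number of clock cycles and that the transform has 7 or 8 layers. The "longest unobserved run" function is a hypothetical proxy for the gap idea, not the paper's convergence criterion, and the layer sets are illustrative.

```python
def unmasked_fraction(total_layers: int, masked_layers: int = 1) -> float:
    """Share of layers (and of cycles, if layers are equally long) left unmasked."""
    return (total_layers - masked_layers) / total_layers

def longest_unobserved_run(observed: set[int], total_layers: int) -> int:
    """Length of the longest run of consecutive layers the attacker cannot observe."""
    longest = current = 0
    for layer in range(1, total_layers + 1):
        current = 0 if layer in observed else current + 1
        longest = max(longest, current)
    return longest

print(f"7 layers, 1 masked: {unmasked_fraction(7):.1%} unmasked")
print(f"8 layers, 1 masked: {unmasked_fraction(8):.1%} unmasked")

# Same observation count (4 of 7 layers), different topology (illustrative sets).
spread = longest_unobserved_run({1, 3, 5, 7}, total_layers=7)
consecutive = longest_unobserved_run({1, 2, 3, 4}, total_layers=7)
print(f"evenly spread: longest gap {spread}; consecutive: longest gap {consecutive}")
```

Both layer sets expose the same number of layers, yet the consecutive set leaves a three-layer gap that belief propagation would have to bridge without any observations.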

Recommendations for PQC Hardware Security Margins

RSI shuffling is insufficient as a sole countermeasure; full random permutation or extension of DOM-style masking to additional layers is essential for robust security.
Security margin analysis must be performed against attack models matching the current cryptanalytic state of the art (i.e., SASCA/BP), not legacy CPA analysis.
Empirical validation should focus on measured SCA attack feasibility in hardware. Hardware (FPGA/ASIC) experiments remain the gold standard for determining actual leakage exploitation margins.
Temporal TVLA should restrict analysis to unmasked computation phases and utilize increased trace counts for higher sensitivity, in line with contemporary SCA leakage assessment standards.
Strategic gap masking—targeting three consecutive mid-layers—offers an area-efficient, BP-validated defense that is practically sufficient at moderate trace acquisition budgets, but full L1 masking or complete masking is necessary for high-assurance deployments (e.g., certification evaluations for FIPS 140-3).
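As a rough illustration of the TVLA recommendation, the sketch below runs a fixed-vs-random Welch t-test on simulated single-sample leakage. The Gaussian noise model, the leakage offset of 0.5, and the trace count are assumptions chosen for the demonstration. The threshold of 4.5 is the value conventionally used in TVLA.

```python
import math
import random
import statistics

TVLA_THRESHOLD = 4.5  # conventional |t| pass/fail threshold

def welch_t(x: list[float], y: list[float]) -> float:
    """Welch's t-statistic for two samples with unequal variances."""
    mean_diff = statistics.fmean(x) - statistics.fmean(y)
    pooled = statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    return mean_diff / math.sqrt(pooled)

def simulate(n: int, leak: float, rng: random.Random) -> tuple[list[float], list[float]]:
    """Fixed-input traces carry a mean offset `leak`; random-input traces do not."""
    fixed = [rng.gauss(leak, 1.0) for _ in range(n)]
    rand = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return fixed, rand

rng = random.Random(0)
# Hypothetical sample point inside a masked phase: no first-order offset.
t_masked = welch_t(*simulate(2000, 0.0, rng))
# Hypothetical sample point inside an unmasked phase: small offset.
t_unmasked = welch_t(*simulate(2000, 0.5, rng))

print(f"masked phase:   |t| = {abs(t_masked):.2f}")
print(f"unmasked phase: |t| = {abs(t_unmasked):.2f}")
```

Restricting the test to the unmasked window keeps leakage there from being diluted by samples taken during masked computation. A passing t-test still says nothing about profiled or BP-based attacks.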

Implications and Future Directions

The practical implication is that for PQC hardware intended as roots of trust, partial masking + shuffling designs must justify their margins against SASCA attack models exploiting BP and algebraic interactions. The findings incentivize further development of automated, machine-verifiable algebraic analysis for masking verification, but also highlight the urgent need for published BP/SASCA results in real post-quantum hardware to calibrate the literature-derived margins.

Future work should:

Extend BP-enabled SASCA validation to ML-DSA ($q = 8,380,417$) using approximate or neural BP, as exact inference is computationally infeasible.
Characterize the effect of practical noise and imperfect leakage models (Hamming weight, misalignment, inter-coefficient coupling) in hardware measurements.
Systematize strategic masking coverage, e.g., by developing automated RTL-to-security-margin pipelines capable of identifying topological vulnerabilities.

Conclusion

The margin gap between design claims and empirically/analytically justified margins for partial NTT masking in PQC hardware is quantitatively significant and structurally robust—persisting across conservative and attacker-favorable assumptions. The proposed evidence-rated audit methodology, empirical SASCA pipeline, and machine-verified algebraic backbone deliver actionable guidance for certification and hardware security evaluation. While no practical key recovery attack is demonstrated on Adams Bridge itself, the presented analysis establishes that its current security claims require substantial qualification in light of modern SASCA/BP attack paradigms. Strategic gap masking and extension of DOM-style masking, combined with thorough empirical SASCA evaluation, are essential to secure hardware NTT implementations deployed in root-of-trust environments.

Reference:

"Partial Number Theoretic Transform Masking in Post Quantum Cryptography Hardware: A Security Margin Analysis" (2604.03813)
