Exploring Side-Channel Protections in Hardware Implementations of PQC ML-KEM Verification

Published 30 Jun 2026 in cs.CR and cs.AR | (2606.31681v1)

Abstract: As ML-KEM is adopted as a post-quantum cryptographic standard, resilience against physical side-channel attacks has become essential. Among the constituent steps, the decapsulation Fujisaki-Okamoto (FO) verification is particularly vulnerable to side-channel power and electromagnetic (EM) analysis. In this work, we focus on common FPGA-based implementations and examine their side-channel vulnerabilities, and compare them with those of microcontroller implementations. Three verification implementations, unprotected, hash-based (first-order), and higher-order masked, are evaluated for side-channel security on both a microcontroller and an FPGA. While FPGAs offer higher speed and parallelism, they often exhibit stronger side-channel leakage, especially in high bandwidth configurations. The higher-order masked designs still leak information about the underlying data due to hardware-level effects and data-dependent processing. Our experiments show that their parallelized processing on FPGAs introduces sufficient first-order leakage for full secret-key recovery. These results underscore the persistent challenge of securing PQC algorithms in performance-constrained and parallelized hardware environments.


Authors (4)

Summary

The paper demonstrates that hardware parallelism in ML-KEM verification amplifies side-channel leakage, with metrics showing up to 100% classification accuracy at high data widths.
It compares unprotected, hash-based, and higher-order masked implementations on microcontrollers and FPGAs using high-resolution power and EM measurements.
The results challenge microcontroller-derived protections for hardware, urging the development of new leakage models and hardware-specific countermeasures.

Detailed Analysis of "Exploring Side-Channel Protections in Hardware Implementations of PQC ML-KEM Verification" (2606.31681)

Introduction and Motivation

The standardization of the Module-Lattice-Based Key-Encapsulation Mechanism (ML-KEM), derived from CRYSTALS-Kyber, as a cornerstone of post-quantum cryptography has put hardware security in the spotlight, especially as these protocols are deployed across a diverse device landscape. While ML-KEM’s algorithmic security rests on the hardness of lattice problems, practical implementations are threatened by side-channel attacks (SCAs) that extract secrets through observable physical effects, most notably in the Fujisaki-Okamoto (FO) verification step during decapsulation.

The FO step constitutes a binary comparison between a freshly re-encrypted ciphertext and the original input, serving as a privileged oracle for adversaries both in theoretical [hermelink_fault-enabled_2021] and practical contexts. Existing research has predominantly exposed and analyzed these vulnerabilities in software (microcontroller) settings. This work distinctly addresses the largely unexplored domain of FPGA-based ML-KEM verification, critically evaluating the interplay between protection countermeasures, performance, and leakage for various hardware and algorithmic choices.
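To make the comparison step concrete, here is a minimal sketch of a serial, byte-wise FO check in Python. It is an illustration only: the names `fo_verify` and `select_key` are hypothetical, the re-encryption itself is not modeled, and this is not the code evaluated in the paper.

```python
def fo_verify(ct_received: bytes, ct_reencrypted: bytes) -> bool:
    """Byte-wise comparison without early exit (hypothetical helper).

    The differences are OR-ed into one accumulator so the loop always
    runs over the whole ciphertext, regardless of where a mismatch occurs.
    """
    if len(ct_received) != len(ct_reencrypted):
        raise ValueError("ciphertexts must have equal length")
    diff = 0
    for a, b in zip(ct_received, ct_reencrypted):
        diff |= a ^ b
    return diff == 0


def select_key(ok: bool, shared_key: bytes, rejection_key: bytes) -> bytes:
    """Return the real key on success and a rejection value otherwise.

    Hypothetical stand-in for the implicit-rejection step; a real
    implementation would avoid a data-dependent branch here.
    """
    return shared_key if ok else rejection_key
```

Even without an early exit, the XOR values handled inside the loop are all zero on success and mostly non-zero on failure. That single-bit outcome is what the power and EM measurements in this study try to distinguish.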

Experimental Methodology

This study benchmarks three variants of FO verification (unprotected, hash-based first-order, and higher-order masked) on both an ARM Cortex-M4 microcontroller and a Xilinx Spartan-6 SAKURA-G FPGA. The platforms are chosen to mirror both NIST-recommended reference devices and typical cryptographic hardware acceleration targets. The authors use high-resolution power and electromagnetic (EM) measurements together with trace alignment, statistical analysis (t-tests, SNR), and machine learning classifiers to distinguish decapsulation success from failure and to assess key-recovery capability.
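The two statistics named above can be sketched for a single trace sample point as follows. This assumes a common two-class SNR definition (variance of the class means over the mean of the class variances) and Welch's t statistic. The summary does not state the exact formulas the paper uses, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, pvariance, variance


def snr(class_a, class_b):
    """Assumed SNR: variance of class means / mean of class variances."""
    signal = pvariance([mean(class_a), mean(class_b)])
    noise = mean([pvariance(class_a), pvariance(class_b)])
    return signal / noise


def welch_t(class_a, class_b):
    """Welch's t statistic for two sets of measurements at one time point."""
    var_a, var_b = variance(class_a), variance(class_b)
    return (mean(class_a) - mean(class_b)) / sqrt(
        var_a / len(class_a) + var_b / len(class_b)
    )


# Toy samples standing in for power values of "success" and "failure" traces.
success = [1.0, 2.0, 3.0]
failure = [5.0, 6.0, 7.0]
snr_value = snr(success, failure)
t_value = welch_t(success, failure)
```

In leakage assessment, a |t| value above roughly 4.5 is a commonly used indication of detectable first-order leakage. Real evaluations apply these statistics to every sample point across thousands of aligned traces.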

Unprotected Verification: Parallelism–Security Trade-off

On both the ARM and FPGA, the unprotected implementation serves as a baseline for leakage and performance. For the FPGA, the degree of parallelism—i.e., the width of the data path over which ciphertext comparisons are made—was a controllable variable. The principal findings:

On microcontrollers (serial byte-wise comparison), power side-channel leakage produced an SNR of 2.40 and a classification accuracy of 94.8% (insufficient for practical key extraction with noisy traces using the belief-propagation-based solver).
For FPGAs:
- At low parallelism (32 bits), SNR and classification accuracy are suppressed (~0.95 SNR, 62.9% accuracy).
- As the comparison width increases (128-bit: 3.83 SNR, 99.1% accuracy; 512-bit: 8.33 SNR, 100% accuracy), leakage becomes readily exploitable, enabling full key extraction with at most 8,000 traces.

Figure 1: Processed traces for the unprotected FPGA implementation with a 128-bit comparison width.

Figure 2: Power distributions of the two classes and SNR analysis of unprotected verification implementations on microcontroller and FPGA.

These data validate that hardware parallelism, sought for throughput, is antithetical to side-channel resistance. The principal mechanism: parallel comparison logic amplifies Hamming weight-based power leakage for failure cases (mismatched ciphertexts), whereas success cases (all-zero) remain low power. Additionally, increased parallelism reduces noise via averaging, further inflating SNR.
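The Hamming-weight mechanism described above can be sketched with a toy model. It assumes that a failed comparison produces a uniformly random difference word of the full comparison width, that the leak equals the Hamming weight of that word plus Gaussian noise of standard deviation `noise_sigma`, and that SNR is the variance of the class means over the mean of the class variances. All of these are modeling assumptions, not the paper's measured setup, and the averaging effect is not modeled.

```python
def hamming_weight(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


def comparison_leak(word_a: int, word_b: int) -> int:
    """Noise-free leak of one parallel comparison: HW of the XOR difference."""
    return hamming_weight(word_a ^ word_b)


def model_snr(width_bits: int, noise_sigma: float) -> float:
    """Closed-form SNR of the toy model for a given comparison width.

    Success class: leak = noise only (mean 0, variance sigma^2).
    Failure class: leak = Binomial(width, 1/2) + noise
                   (mean width/2, variance width/4 + sigma^2).
    """
    signal = (width_bits / 4.0) ** 2               # variance of {0, width/2}
    noise = noise_sigma ** 2 + width_bits / 8.0    # mean of class variances
    return signal / noise


# With the noise level held fixed, the modeled SNR grows with the width.
widths = [32, 128, 512]
modeled = [model_snr(w, noise_sigma=20.0) for w in widths]
```

The absolute values depend entirely on the assumed noise level and are not meant to reproduce the reported SNRs. The sketch only shows the trend: the distance between the class means grows linearly with the width, so the signal term grows quadratically.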

Hash-Based Comparison: Limitations of First-Order Protection

The hash-based first-order protection compares cryptographic hashes (e.g., SHAKE-128) rather than raw ciphertexts, with the intent of obscuring input-level distinctions. However, empirical evaluation reveals persistent first-order leakage:

Side-channel distinguishability remains high (94.7% classification accuracy) by exploiting differentials in the early rounds of Keccak, due to the deterministic propagation of state differences.
Lightweight shuffling (randomizing row processing order) on the FPGA hash core was easily reverse-engineered by trace fingerprinting, providing no meaningful entropy against side-channel analysis.
Figure 3: Distribution of absolute power differences between power traces from FPGA hashed implementations.

Figure 4: Distribution of absolute power differences at the selected leakage time point in the FPGA hash-based attack.

These observations underscore the ineffectiveness of simple hash-based masking and execution randomization for hardware targets—collision-style attacks and segment profiling trivially neutralize such defenses.
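A minimal sketch of the hash-then-compare idea, using Python's standard `hashlib.shake_128`, is shown below. The function name, the 32-byte digest length, and the absence of any salt or randomization are assumptions made for illustration; the evaluated hardware design may differ.

```python
import hashlib
import hmac


def hashed_equal(ct_a: bytes, ct_b: bytes, digest_len: int = 32) -> bool:
    """Compare SHAKE-128 digests of two ciphertexts instead of the raw bytes."""
    digest_a = hashlib.shake_128(ct_a).digest(digest_len)
    digest_b = hashlib.shake_128(ct_b).digest(digest_len)
    # compare_digest avoids an early exit on the first differing byte.
    return hmac.compare_digest(digest_a, digest_b)


# Toy ciphertexts: identical input, and a copy with one flipped bit.
ct = bytes(range(64))
ct_flipped = bytes([ct[0] ^ 0x01]) + ct[1:]

same = hashed_equal(ct, ct)
different = hashed_equal(ct, ct_flipped)
```

Because the hash is deterministic, two given inputs always drive the Keccak state through the same pair of trajectories. The final digests look unrelated, but the state difference in the early rounds is fixed by the inputs, which is the property the attack described above exploits.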

Figure 5: Zoomed-in view of power traces of FPGA hash implementation for two ciphertexts (A versus B) with randomized order of row processing.

Figure 6: Difference between the power trace of the last cycle for ciphertext A and those of various cycles for ciphertext B in the FPGA hash implementation.

Higher-Order Masked Comparison: Parallelism Erodes Assumptions

Higher-order masking (with multiple shares and GF arithmetic), shown to suppress first- and (in some cases) higher-order leakage under the ISW t-probing model [coron_high-order_2022] [bos_masking_2021], has been promoted for robust side-channel security. The FPGA experiments paint a different picture:

Parallelization across shares introduces correlated, data-dependent activity between combinational GF logic branches. Masking order is effectively reduced, permitting “first-order” exploitation in practice.
4-share and 6-share FPGA masked implementations yielded classification accuracies of 97.9% and 98.5%, respectively—surpassing even prior microcontroller results for supposedly t-probing secure code.
Figure 7: Filtered Power Traces for Higher Order Protected FPGA Implementation.

The critical implication is that masking constructions proven secure for serial software execution do not carry over to hardware with spatial or temporal parallelism. The t-probing model fails to capture these real-world leakage channels: massively parallel hardware is not simply “t-probing secure at scale.”
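One way to see how the effective masking order can drop is an exhaustive toy calculation with two Boolean shares of a 4-bit value. In the sketch below, `ideal_leak` models shares that leak independently, while `coupled_leak` adds a hypothetical interaction term equal to the Hamming weight of the XOR of the two shares. That term is an assumption standing in for effects such as glitches or coupling between parallel branches; it is not the mechanism measured in the paper, which used 4 and 6 shares.

```python
WIDTH = 4  # toy word size so that all masks can be enumerated exactly


def hw(x: int) -> int:
    """Hamming weight of x."""
    return bin(x).count("1")


def ideal_leak(share0: int, share1: int) -> int:
    """Each share leaks its own Hamming weight, with no interaction."""
    return hw(share0) + hw(share1)


def coupled_leak(share0: int, share1: int) -> int:
    """Hypothetical hardware effect: an extra term that combines both shares."""
    return hw(share0) + hw(share1) + hw(share0 ^ share1)


def mean_leak(secret: int, leak) -> float:
    """Average leak over all masks r for the sharing (r, r XOR secret)."""
    masks = range(1 << WIDTH)
    return sum(leak(r, r ^ secret) for r in masks) / (1 << WIDTH)


ideal_means = [mean_leak(s, ideal_leak) for s in range(1 << WIDTH)]
coupled_means = [mean_leak(s, coupled_leak) for s in range(1 << WIDTH)]
```

In the ideal model the mean leak is the same for every secret, so a first-order (mean-based) analysis learns nothing. With the coupling term the mean shifts by the Hamming weight of the secret, which is exactly the kind of first-order signal that masking is meant to remove.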

Discussion: Practical and Theoretical Implications

The core result is a consistent and somewhat counterintuitive demonstration: parallelization in hardware systematically degrades SCA resistance, often nullifying both first-order (hash-based) and higher-order (masked) protections that are robust in serial microcontroller contexts. The marked increase in SNR and classification accuracy with data path width and share count shows that area/performance and security are in direct conflict, especially for temporal or spatial masking strategies not tailored to hardware leakage models.

None of the countermeasures evaluated, even at higher masking order, offered effective resistance on high-performance FPGA implementations. The lightweight randomization techniques and masking schemes, though attractive for their minimal overhead, prove insufficient for real-world cryptographic hardware with sizable parallel footprints.

Implications for ML-KEM/PQC Deployment and Research Directions

For deployment: ML-KEM verification architectures for production hardware (and other PQC algorithms with similar FO transform constructs) must not naively adopt microcontroller-derived countermeasures. Designers need to explicitly measure and verify SCA resistance under realistic hardware configurations, rather than rely solely on cryptographic reductions or formal proofs in idealized models.
For theory: The results challenge the sufficiency of the t-probing security model and stress the need for robust leakage models that integrate parallelism effects—motivating further work on implementation-adequate masking and SCA-resilience proofs.
For future research: Efforts must prioritize new, hardware-specific masking strategies, integrate system-level mitigations (e.g., ephemeral key usage, verified key lifecycle management), or combine multiple protection paradigms with formal, empirical validation. Additionally, architectural investigation into on-chip shuffling, more aggressive circuit-level hiding, and possibly analog “blinding” holds promise.
For the standardization community: These findings inform the ongoing evolution of PQC hardware evaluation standards—current functional compliance and microcontroller-targeted SCA resistance claims are insufficient.

Conclusion

This work makes a clear, empirically validated claim: the current best-practice countermeasures for ML-KEM (and likely broader PQC) FO verification do not preclude key-recovery side-channel attacks when instantiated on high-throughput, parallelized FPGAs. Performance-oriented parallelism inherently amplifies exploitable leakage, reducing and sometimes fully negating the effectiveness of masking and hash-based protections that appear robust in software. This not only has immediate ramifications for the deployment and certification of cryptographic systems, but also calls for a reevaluation of hardware-rooted assumptions in the post-quantum security ecosystem. Practical PQC system design now must fully grapple with the hardware realities of leakage, and new countermeasure paradigms need to be developed to bridge the growing gap between cryptographic theory and engineering practice.

References

(2606.31681)
Additional cited references in paper: [hermelink_fault-enabled_2021], [hermelink_belief_2023], [danvers_higher-order_2022], [bos_masking_2021], [coron_high-order_2022], [hermelink_insecurity_nodate]