Hardware Masking: Techniques & Security

Updated 4 July 2026

Hardware masking is a countermeasure that splits sensitive data into randomized shares to prevent side-channel leakage.
It combines Boolean and arithmetic masking with specialized gadgets and glitch-aware implementations for robust hardware security.
Verification and synthesis tools ensure that masked circuits meet strict leakage models while balancing performance and security.

Hardware masking is a circuit-level countermeasure against side-channel leakage in which a sensitive value is replaced by randomized shares and the hardware manipulates those shares rather than the unshared secret. In the Boolean setting, a shared value is written as $a = a_1 \oplus a_2 \oplus \cdots \oplus a_n$; in arithmetic masking for post-quantum cryptographic datapaths, a secret is shared as $s = (s_0 + s_1) \bmod q$ or, equivalently, the first share is derived from a secret $X$ and a uniform mask $S_1$ as $S_0 = (X - S_1) \bmod q$ (Covic et al., 2021, Iskander et al., 16 Apr 2026). The topic spans attack models, gadget constructions, glitch-aware implementation rules, compositional security notions, high-level synthesis constraints, and formal verification. In contemporary work, it also extends beyond classical cryptography to masked processors and masked neural-network inference engines (Srivastava et al., 23 Feb 2026, Dubey et al., 2020).

1. Foundations and leakage models

The classical purpose of masking is to ensure that individual observations reveal no information about the underlying secret. In the survey taxonomy of circuit masking, the field is organized around attack models, gadgets, masking schemes, implementations, verification, and standardization (Covic et al., 2021). The classical $t$-probing model assumes that an adversary can place up to $t$ perfect, noiseless probes on wires; security requires that any $t$ probed values are independent of the secret. This model underlies a large fraction of masking theory, but hardware-oriented work repeatedly stresses that it ignores glitches, transitions, coupling, and routing effects (Covic et al., 2021).

Boolean sharing is the standard starting point. For a shared value,

$a = a_1 \oplus a_2 \oplus \cdots \oplus a_n.$

Arithmetic sharing is natural for lattice-PQC datapaths over $\mathbb{Z}_q$, where coefficients are shared as

$s = (s_0 + s_1) \bmod q.$
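As a minimal sketch of the two sharing rules above, the following Python draws $n$ Boolean shares and two arithmetic shares and recombines them. The function names, the 8-bit width, and the example modulus are illustrative assumptions and do not come from the cited works.

```python
import secrets
from functools import reduce

def boolean_share(a, n, bits=8):
    """Split a (0 <= a < 2**bits) into n shares whose XOR equals a."""
    shares = [secrets.randbits(bits) for _ in range(n - 1)]
    # The last share is chosen so that the XOR of all shares is a.
    last = reduce(lambda x, y: x ^ y, shares, a)
    return shares + [last]

def boolean_unshare(shares):
    """Recombine Boolean shares by XOR."""
    return reduce(lambda x, y: x ^ y, shares, 0)

def arithmetic_share(s, q):
    """Split s into (s0, s1) with s = (s0 + s1) mod q."""
    s1 = secrets.randbelow(q)   # uniform mask
    s0 = (s - s1) % q           # masked share
    return s0, s1

def arithmetic_unshare(s0, s1, q):
    """Recombine arithmetic shares modulo q."""
    return (s0 + s1) % q
```

In a masked circuit the recombination functions would only ever be applied at the very end of the computation; they appear here solely to check correctness.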

Recent verification work on masked NTT hardware makes the first-order objective explicit: a circuit is first-order probing secure if an adversary observing any single wire learns nothing about the secret $s$ (Iskander et al., 16 Apr 2026). In that setting, a wire is modeled as

$w = f(s_0, s_1, r, p)$

where $s_0, s_1$ are the shares, $r$ denotes fresh randomness, and $p$ public inputs (Iskander et al., 16 Apr 2026).
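The first-order objective can be illustrated by brute force on a toy modulus. The sketch below is an assumption-laden simplification: a wire is a function of the two arithmetic shares only (fresh randomness and public inputs are omitted), the modulus is a small prime chosen for exhaustive enumeration, and all names are hypothetical.

```python
from collections import Counter

def wire_distribution(wire, secret, q):
    """Histogram of a wire's value over all masks s1, with s0 = (secret - s1) mod q."""
    return Counter(wire((secret - s1) % q, s1) for s1 in range(q))

def first_order_secure(wire, q):
    """True if the wire's distribution over the mask is identical for every secret."""
    reference = wire_distribution(wire, 0, q)
    return all(wire_distribution(wire, s, q) == reference for s in range(1, q))

Q = 17  # small illustrative prime, not a production modulus

def share_only(s0, s1):
    """A wire carrying a single share."""
    return s0

def recombined(s0, s1):
    """A wire that accidentally recombines both shares."""
    return (s0 + s1) % Q
```

Here `first_order_secure(share_only, Q)` holds because a single share is uniform for every secret, whereas `recombined` always equals the secret and therefore fails the check.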

The modern hardware literature distinguishes several stronger notions than plain probing security. The survey identifies NI, SNI, and PINI as compositional notions, and also discusses robust probing, glitch-extended probing, and $s = (s_0 + s_1) \bmod q$ 3-glitch immunity for hardware settings where transient values matter (Covic et al., 2021). This distinction is central: a gadget may satisfy a local probing-style definition while failing after composition in real hardware because of glitches or recombination effects. A plausible implication is that hardware masking is defined as much by its leakage model as by its sharing algebra.

2. Gadget constructions and glitch-aware implementation

The core engineering difficulty in masked circuits is nonlinearity. In Boolean masking, XOR is linear and can be applied sharewise, whereas AND or multiplication must be replaced by secure masked gadgets. The classical ISW-style nonlinear construction uses random values $r_{i,j}$ for $i < j$ and computes output shares

$c_i = a_i b_i \oplus \bigoplus_{j \neq i} r_{i,j}$

with

$r_{j,i} = (r_{i,j} \oplus a_i b_j) \oplus a_j b_i \quad (i < j)$

for shared inputs $(a_1, \ldots, a_n)$ and $(b_1, \ldots, b_n)$ (Covic et al., 2021). The survey also highlights Threshold Implementations, which rely on correctness, uniformity, and non-completeness, as well as CMS, DOM, UMA, and GLM as major scheme families (Covic et al., 2021).
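A minimal software sketch of an ISW-style masked AND on single-bit shares may make the construction concrete. It follows the textbook ISW multiplication as commonly stated, which is an assumption about the exact form used in the survey; it models only the algebra and says nothing about glitches or register placement. Function names are hypothetical.

```python
import secrets
from functools import reduce

def isw_and(a, b):
    """ISW-style masked AND: a and b are equal-length lists of single-bit shares."""
    n = len(a)
    r = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = secrets.randbits(1)  # fresh random bit per share pair
            # Evaluation order matters: the random bit is combined with
            # a_i b_j before a_j b_i is added.
            r[j][i] = (r[i][j] ^ (a[i] & b[j])) ^ (a[j] & b[i])
    c = []
    for i in range(n):
        ci = a[i] & b[i]
        for j in range(n):
            if j != i:
                ci ^= r[i][j]
        c.append(ci)
    return c

def xor_all(shares):
    """XOR of all shares, used here only to check functional correctness."""
    return reduce(lambda x, y: x ^ y, shares, 0)
```

For any sharing, `xor_all(isw_and(a, b))` equals `xor_all(a) & xor_all(b)`, because each pair of cross terms $a_i b_j \oplus a_j b_i$ is reintroduced exactly once while the random bits cancel.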

For glitch-robust hardware, the placement of registers is security-critical. “MaskedHLS” treats masked hardware generation as a security-preserving translation problem rather than an ordinary optimization problem (Sarma et al., 2024). The paper shows why conventional HLS is misaligned with masking invariants: compiler reassociation, expression balancing, scheduling, and resource sharing preserve Boolean functionality but can destroy the intended masked structure. A representative DOMAND expression is

$c_0 = (a_0 b_0) \oplus \bigl((a_0 b_1) \oplus r\bigr)$

and front-end rewriting can move XORs across the wrong intermediate terms, yielding RTL that is logically correct but no longer masked (Sarma et al., 2024).
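The hazard can be reproduced in a few lines. The sketch below assumes the commonly cited first-order DOM AND, in which each cross-domain product is blinded by a fresh bit `r` before it meets the inner-domain product; the exact expression in the cited paper may differ. It then compares the intended intermediate with one that a reassociating compiler could create.

```python
from itertools import product

def dom_and_shares(a0, a1, b0, b1, r):
    """First-order DOM-style AND: cross terms are blinded by r before integration."""
    c0 = (a0 & b0) ^ ((a0 & b1) ^ r)
    c1 = (a1 & b1) ^ ((a1 & b0) ^ r)
    return c0, c1

def intermediate_intended(a0, a1, b0, b1, r):
    """Blinded cross term, as the gadget intends."""
    return (a0 & b1) ^ r

def intermediate_reassociated(a0, a1, b0, b1, r):
    """Intermediate after reassociation: equals a0 & (b0 ^ b1), i.e. a0 & b."""
    return (a0 & b0) ^ (a0 & b1)

def distribution_given_b(wire, b):
    """(number of ones, total) for a wire given secret b, over all sharings and r."""
    ones = 0
    total = 0
    for a0, a1, b0, r in product((0, 1), repeat=4):
        b1 = b ^ b0
        ones += wire(a0, a1, b0, b1, r)
        total += 1
    return ones, total
```

The final outputs are identical in both orderings, which is why functional equivalence checking does not catch the problem: only the distribution of the intermediate wire changes, and the reassociated one is constant zero whenever the unshared `b` is zero.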

The same work formalizes four state-of-the-art masked gadget families for hardware generation: DOM, HPC1, HPC2, and COMAR (Sarma et al., 2024). Their hardware versions differ from software not mainly in algebra, but in the requirement that registers appear at specific points to stop glitch propagation and that parallel paths be balanced. For instance, HPC2 is described with

$S_0 = (X - S_1)\bmod q$ 0

with registers at all input shares and four intermediate locations (Sarma et al., 2024). This suggests that “masked hardware” is not only a matter of share algebra; it is also a retiming and path-balancing discipline.

3. Arithmetic masking and composability in PQC pipelines

Arithmetic masking has become central in PQC accelerators because NTT, INTT, butterfly units, and modular reductions are naturally expressed over $\mathbb{Z}_q$. In recent formal work on masked NTT pipelines, a Cooley–Tukey butterfly with inputs $S_0 = (X - S_1)\bmod q$ 2 computes

$S_0 = (X - S_1)\bmod q$ 3

and with a fresh output mask $S_0 = (X - S_1)\bmod q$ 4 the observed wires become

$S_0 = (X - S_1)\bmod q$ 5

The key theorem is that each output wire has exactly one mask value producing each output value; equivalently, each wire is uniform over the fresh mask and independent of the secrets under the adopted first-order probing semantics (Iskander et al., 22 Apr 2026).

That result matters because pointwise value-independence is false for butterfly outputs. If $S_0 = (X - S_1)\bmod q$ 6 is held fixed, $S_0 = (X - S_1)\bmod q$ 7 changes with $S_0 = (X - S_1)\bmod q$ 8 or $S_0 = (X - S_1)\bmod q$ 9; the correct local invariant is therefore not pointwise constancy but per-context uniformity over the fresh mask (Iskander et al., 22 Apr 2026). The same paper proves that a $t$ 0-stage NTT pipeline with fresh per-stage masking satisfies per-context uniformity at every stage under the ISW first-order probing model (Iskander et al., 22 Apr 2026). A plausible implication is that fresh-mask renewal is the arithmetic analogue of a compositional boundary condition.
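Per-context uniformity can be checked exhaustively on a toy modulus. The sketch below assumes a butterfly of the form $(a + \omega b,\ a - \omega b) \bmod q$ whose outputs are re-masked by subtracting a fresh mask $m$; the symbol names, the reuse of one mask for both outputs, and the small prime are illustrative assumptions, not the cited paper's notation.

```python
from collections import Counter

def masked_butterfly_wires(a, b, w, m, q):
    """Masked Cooley-Tukey butterfly outputs with one fresh mask m (assumed form)."""
    u = (a + w * b - m) % q
    v = (a - w * b - m) % q
    return u, v

def per_context_uniform(q, w):
    """For every fixed (a, b), each output wire takes every value exactly once over m."""
    for a in range(q):
        for b in range(q):
            us = Counter()
            vs = Counter()
            for m in range(q):
                u, v = masked_butterfly_wires(a, b, w, m, q)
                us[u] += 1
                vs[v] += 1
            # q samples landing on q distinct values means multiplicity one.
            if len(us) != q or len(vs) != q:
                return False
    return True
```

With the mask held fixed, the same wires do change with the inputs, for example `masked_butterfly_wires(1, 2, 3, 5, 17)` differs from `masked_butterfly_wires(2, 2, 3, 5, 17)`, which is the sense in which pointwise value-independence fails while uniformity over the mask holds.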

A complementary line of work formalizes this idea through PF-PINI for prime fields. A PF-PINI gadget $t$ 1 satisfies

$t$ 2

so $t$ 3 is a maximum multiplicity parameter for single-wire probing over $t$ 4 (Iskander et al., 28 Apr 2026). The central renewal theorem states that for any $t$ 5,

$t$ 6

which means that subtracting a fresh uniform mask makes the inter-stage wire perfectly uniform regardless of Stage 1’s PF-PINI parameter (Iskander et al., 28 Apr 2026). Consequently, if $t$ 7 is PF-PINI( $t$ 8) and $t$ 9 is PF-PINI( $t$ 0), then the composed two-stage pipeline with fresh masking satisfies PF-PINI( $t$ 1) (Iskander et al., 28 Apr 2026).

This quantitative arithmetic theory is tightly connected to Barrett reduction. The same paper proves a hardware-faithful equivalence between an algebraic Barrett internal map and its natural-number implementation, and then shows that the hardware form is PF-PINI(2) (Iskander et al., 28 Apr 2026). In parallel, a universal Lean proof for arithmetic masking verification establishes that for every modulus $q$, every wire function, and every pair of secrets, value-independence implies identical marginal distributions over $\mathbb{Z}_q$ (Iskander et al., 20 Apr 2026). The hardware significance is that arithmetic masking verification for ML-KEM and ML-DSA no longer depends on modulus-specific finite enumeration; the ring $\mathbb{Z}_q$ becomes the abstraction layer.
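For readers unfamiliar with Barrett reduction, a generic natural-number version is sketched below for the ML-KEM modulus $q = 3329$. This is the textbook algorithm with an assumed shift width of 24 bits, not the internal map of the cited hardware module, and it does not by itself demonstrate any PF-PINI property; it only shows the kind of integer implementation whose equivalence to $x \bmod q$ has to be established.

```python
Q = 3329            # ML-KEM modulus
K = 24              # shift width (assumption), chosen so that Q * Q < 2**K
M = (1 << K) // Q   # precomputed approximation of 2**K / Q

def barrett_reduce(x):
    """Reduce 0 <= x < 2**K modulo Q with a multiply, a shift, and one conditional subtract."""
    t = (x * M) >> K   # estimate of x // Q, too small by at most 1 in this range
    r = x - t * Q      # therefore 0 <= r < 2 * Q
    return r - Q if r >= Q else r
```

The single conditional subtraction suffices because the quotient estimate undershoots by less than one for inputs below $2^{24}$, which covers every product of two reduced coefficients.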

4. Synthesis flows, controller datapaths, and processor-level masking

The synthesis of masked hardware is itself a security problem. “MaskedHLS” starts from masked C/C++ with annotations of the form $t$ 5, constructs a retiming problem, and inserts both required security registers and balancing registers with minimum latency (Sarma et al., 2024). Its retiming model uses a directed graph $G = (V, E)$ with edge register counts $w(e)$, delays $d(v)$, and retiming labels $r(v)$; after retiming, each edge $e = (u, v)$ has register count

$w_r(e) = w(e) + r(v) - r(u).$

On PRESENT and AES S-box benchmarks masked with DOM, HPC1, HPC2, and COMAR, the generated RTL shows on average $t$ 1 fewer registers and $t$ 2 less latency than manual balancing, while TVLA results are substantially stronger than Vivado HLS-generated variants (Sarma et al., 2024).
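The register bookkeeping behind retiming can be sketched as follows. The code assumes the classical Leiserson-Saxe rule, in which an edge from `u` to `v` with `w` registers ends up with `w + r[v] - r[u]` registers; whether MaskedHLS uses exactly this formulation is inferred from its description, and the three-node graph is a made-up example.

```python
def retime(edges, r):
    """Apply retiming labels r[v] to edges {(u, v): w}; returns the new register counts."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}

def is_legal(retimed):
    """A retiming is legal only if no edge ends up with a negative register count."""
    return all(w >= 0 for w in retimed.values())

def cycle_registers(edges, cycle):
    """Total registers along a cycle given as a list of nodes."""
    pairs = zip(cycle, cycle[1:] + cycle[:1])
    return sum(edges[(u, v)] for u, v in pairs)

# Hypothetical three-node loop: a -> b -> c -> a
EXAMPLE_EDGES = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 1}
MOVE_FORWARD = {"a": 0, "b": -1, "c": 0}   # moves the register across node b
ILLEGAL = {"a": 0, "b": 1, "c": 0}         # would need a negative count on b -> c
```

Retiming never changes the number of registers on a cycle, so it can rebalance parallel paths but cannot by itself create the registers a gadget requires; those have to be imposed as constraints on the edge counts.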

Verification of HLS-generated masked RTL requires special treatment because HLS often produces controller-datapath architectures with resource-shared datapaths. “MaskedHLSVerif” addresses this by state-wise formal verification of controller-datapath RTL obtained via HLS, thereby avoiding false positives caused by resource-shared datapaths (Sarma et al., 19 Mar 2026). The toolflow correctly verifies standard cryptographic benchmarks and the PRESENT S-box masked with gadgets, where REBECCA reports false positives, and it can also detect masking flaws induced by HLS optimizations (Sarma et al., 19 Mar 2026). This suggests that hardware masking after HLS cannot be reduced to a purely combinational check on the flattened netlist.

A distinct architectural direction appears in processor-level masking. “CryptRISC” extends the CVA6 core with 64-bit scalar cryptography instructions and inserts a Field Detection Layer and a Masking Control Unit inside the pipeline (Srivastava et al., 23 Feb 2026). Its masking engine uses the unified affine form

$t$ 3

specialized to Boolean masking over $t$ 4, arithmetic masking over $t$ 5, and affine or multiplicative masking over $t$ 6 depending on the instruction’s dominant algebraic field (Srivastava et al., 23 Feb 2026). The design reports speedups up to $t$ 7 over baseline software implementations with only a $t$ 8 hardware overhead relative to baseline CVA6, and its instruction-level TVLA stays below the standard $t$ 9 threshold (Srivastava et al., 23 Feb 2026). The paper is explicit that the strongest claim is empirical first-order leakage suppression rather than a formal glitch-resistant proof (Srivastava et al., 23 Feb 2026).
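Since the claim rests on TVLA, a minimal fixed-versus-random Welch t-test is sketched below. It assumes the conventional pass criterion $|t| < 4.5$ at every sample point and represents traces as equal-length lists of floats; the function names are hypothetical and the cited evaluation setup is not reproduced.

```python
from math import sqrt
from statistics import mean, variance

TVLA_THRESHOLD = 4.5  # conventional |t| bound, assumed here

def welch_t(group_a, group_b):
    """Welch's t-statistic for two sample sets taken at one trace point."""
    va, vb = variance(group_a), variance(group_b)
    return (mean(group_a) - mean(group_b)) / sqrt(va / len(group_a) + vb / len(group_b))

def tvla_passes(fixed_traces, random_traces):
    """True if |t| stays below the threshold at every sample index."""
    columns = zip(zip(*fixed_traces), zip(*random_traces))
    for fixed_col, random_col in columns:
        if abs(welch_t(fixed_col, random_col)) >= TVLA_THRESHOLD:
            return False
    return True
```

A passing test indicates that no first-order difference was detected with the traces at hand; it is empirical evidence and not a proof of probing security.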

5. Verification methodologies and formal assessment at scale

Hardware masking verification has moved from small gadgets to production arithmetic modules. The survey catalogs formal tools such as MaskVerif, VerMI, TightPROVE, REBECCA, and SILVER, reflecting the historical evolution from probing-based proofs to implementation-aware checks with glitches and composability conditions (Covic et al., 2021). Recent work on PQC hardware introduces a four-stage hierarchy for large arithmetic datapaths: $a = a_1 \oplus a_2 \oplus \cdots \oplus a_n.$ 0 The first stage uses dependency queries

$a = a_1 \oplus a_2 \oplus \cdots \oplus a_n.$ 1

$a = a_1 \oplus a_2 \oplus \cdots \oplus a_n.$ 2

and treats a wire as structurally safe if it depends on at most one share group (Iskander et al., 16 Apr 2026).
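The structural rule can be sketched as a fan-in analysis over a netlist. The data layout below (a dictionary of wires in topological order, with inputs tagged by share group) and all names are assumptions made for illustration; the cited tool's query language is not reproduced.

```python
def share_groups(netlist, inputs):
    """Map each wire to the set of share groups in its transitive fan-in.

    netlist: {wire: [driver wires]} listed in topological order.
    inputs:  {input wire: share group id, or None for randomness/public inputs}.
    """
    support = {w: ({g} if g is not None else set()) for w, g in inputs.items()}
    for wire, drivers in netlist.items():
        support[wire] = set().union(*(support[d] for d in drivers))
    return support

def structurally_flagged(netlist, inputs):
    """Wires that depend on more than one share group and need deeper analysis."""
    support = share_groups(netlist, inputs)
    return sorted(w for w in netlist if len(support[w]) > 1)

# Hypothetical two-share example: t0 and t1 stay in one domain, u mixes both.
EXAMPLE_INPUTS = {"s0": 0, "s1": 1, "r": None}
EXAMPLE_NETLIST = {"t0": ["s0", "r"], "t1": ["s1", "r"], "u": ["t0", "t1"]}
```

A flagged wire is not necessarily insecure, since fresh randomness may still blind it; the structural pass only narrows the set of wires that later, more expensive stages must examine.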

On the 1.17-million-cell Adams Bridge ML-DSA/ML-KEM accelerator, structural analysis completes in seconds across all 30 masked submodules, and a multi-cycle extension reclassifies 12 modules from structurally clean to structurally flagged (Iskander et al., 16 Apr 2026). For the 5,543-cell ML-KEM Barrett reduction module, the full pipeline verifies 198 of 363 structurally flagged wires as first-order secure, reports 165 as candidate insecure, and leaves 0 indeterminate; Z3 and CVC5 agree on all $a = a_1 \oplus a_2 \oplus \cdots \oplus a_n.$ 3 arithmetic SADC queries with 0 disagreements (Iskander et al., 16 Apr 2026). The paper presents this as a sound upper bound on the insecure set rather than an exact classification.

Another recent direction is mixed-domain simulation from HDL. “aLEAKator” combines concrete simulation, symbolic expressions, LeakSets, and stability information to verify masked hardware accelerators and masked software running on CPUs under value, transition, glitch, and robust-but-relaxed 1-probing models (Amiot et al., 8 Dec 2025). The framework models circuits as Mealy machines,

$a = a_1 \oplus a_2 \oplus \cdots \oplus a_n.$ 4

and derives verification obligations such as

$a = a_1 \oplus a_2 \oplus \cdots \oplus a_n.$ 5

for transition leakage and

$a = a_1 \oplus a_2 \oplus \cdots \oplus a_n.$ 6

for glitch leakage (Amiot et al., 8 Dec 2025). It verifies, among other case studies, a full first-order masked AES on various CPUs from HDL descriptions and correlates formal findings with ChipWhisperer measurements (Amiot et al., 8 Dec 2025). A plausible implication is that hardware masking verification is becoming architecture-aware rather than gadget-only.
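Why transition leakage needs its own obligation can be seen with a toy Hamming-distance model. The sketch assumes a register that holds the two Boolean shares of a secret in consecutive cycles; the leakage model and names are illustrative and are not taken from the cited framework.

```python
def hamming_weight(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

def value_leak(reg_values):
    """Value-model leakage: Hamming weight of each stored value."""
    return [hamming_weight(v) for v in reg_values]

def transition_leak(reg_values):
    """Transition-model leakage: Hamming distance between consecutive values."""
    return [hamming_weight(a ^ b) for a, b in zip(reg_values, reg_values[1:])]

def transition_leak_over_masks(secret, bits=4):
    """All transition leakages seen when share 0 is overwritten by share 1."""
    leaks = set()
    for mask in range(1 << bits):
        s0, s1 = mask, secret ^ mask
        leaks.update(transition_leak([s0, s1]))
    return leaks
```

For every mask the transition leakage equals the Hamming weight of the unmasked secret, so a design that is secure in the value model can still leak at first order when two shares share a register or bus.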

6. Practical limits, controversies, and broader extensions

The strongest caveat in the literature is that masking proofs remain conditional on physical observability assumptions. “Real-World Snapshots vs. Theory” demonstrates Laser Logic State Imaging, a backside optical technique that can extract the logical state of all registers at an arbitrary clock cycle with a single measurement, effectively providing an “unlimited number of contactless probes” (Krachenfels et al., 2020). The paper’s point is not that the algebra of masking is wrong, but that the bounded-observation assumption of the $t$-probing model can fail in practice (Krachenfels et al., 2020). This sharply separates mathematical validity from physical adequacy.

A second practical controversy concerns FPGAs and aggressive parallelism. In a study of ML-KEM Fujisaki–Okamoto verification, unprotected, hash-based, and higher-order masked implementations are evaluated on both a microcontroller and an FPGA, and the higher-order masked FPGA designs still leak information about the underlying data because of hardware-level effects and data-dependent processing (Ranney et al., 30 Jun 2026). The paper reports that parallelized processing on FPGAs introduces sufficient first-order leakage for full secret-key recovery, even though the compared higher-order masked comparison is proven $a = a_1 \oplus a_2 \oplus \cdots \oplus a_n.$ 8-probing secure and the microcontroller implementation does not leak up to order $a = a_1 \oplus a_2 \oplus \cdots \oplus a_n.$ 9 under the intended assumptions (Ranney et al., 30 Jun 2026). This suggests that formal masking order is not a reliable proxy for physical leakage in highly parallel programmable logic.

A third controversy concerns partial masking in PQC accelerators. In Adams Bridge, only the first INTT layer is protected with first-order Boolean masking using a DOM-style two-share implementation, while the remaining layers are defended mainly by Random Start Index shuffling (Iskander et al., 4 Apr 2026). RTL analysis shows that RSI provides 6 bits of entropy per layer, $\log_2 64 = 6$, rather than the $\log_2(64!) \approx 296$ bits of a full random permutation over 64 items (Iskander et al., 4 Apr 2026). The same paper argues that observation topology, not count, determines recovery in full-graph BP attacks on the INTT factor graph, and recommends strategically masking 3 consecutive mid-layers as a practical compromise (Iskander et al., 4 Apr 2026). This suggests that partial masking must be audited against factor-graph inference rather than only against per-butterfly CPA arguments.
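The entropy gap between a random start index and a full permutation is easy to compute. The sketch assumes that the start index is drawn uniformly from the 64 positions and that a full shuffle is uniform over all orderings; the function names are hypothetical.

```python
from math import factorial, log2

def start_index_entropy_bits(n):
    """Entropy of a uniformly chosen start index among n positions."""
    return log2(n)

def permutation_entropy_bits(n):
    """Entropy of a uniformly chosen permutation of n items."""
    return log2(factorial(n))
```

For 64 items the first function gives exactly 6 bits and the second roughly 296 bits, so a rotation-only shuffle leaves the relative order of operations fully known to an attacker.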

Although the field is centered on cryptographic hardware, masking has also been transferred to machine-learning accelerators. “BoMaNet” presents a fully masked neural-network inference engine using first-order Boolean masking, secure hardware primitives for all linear and non-linear operations, a pipelined Trichina-style AND for improved glitch resistance, and a throughput of one masked addition per cycle; its implementation on a Xilinx Spartan-6 reports $\mathbb{Z}_q$ 2 latency overhead, $\mathbb{Z}_q$ 3 area overhead, and security validation with 2M traces (Dubey et al., 2020). “MaskedNet” and “Guarding Machine Learning Hardware Against Physical Side-Channel Attacks” adapt masking to BNN inference engines, including masked adders, masked activation, and masked output-layer comparison, with reported overheads of $\mathbb{Z}_q$ 4 latency and $\mathbb{Z}_q$ 5 area in one design and area-delay overheads ranging from $\mathbb{Z}_q$ 6 to $\mathbb{Z}_q$ 7 in another, together with first-order security over millions of power traces and shuffling to impede straightforward second-order attacks (Dubey et al., 2019, Dubey et al., 2021).

Taken together, these results portray hardware masking as a mature but conditional discipline. Its formal core now includes arithmetic composition theorems over $\mathbb{Z}_q$, scalable RTL verification pipelines, and architecture-aware mixed-domain analyses. At the same time, real observability, FPGA parallelism, synthesis transformations, and incomplete mask renewal remain persistent fault lines. The cumulative evidence suggests that effective hardware masking is not a single technique but a coupled methodology: share algebra, gadget discipline, register placement, fresh randomness, stage-wise composability, and verification under leakage models that remain physically meaningful.