Microarchitectural State Divergence

Updated 18 November 2025

Microarchitectural state divergence is the phenomenon in which a CPU's internal hardware state evolves differently across architecturally equivalent executions, affecting timing and security.
It arises from speculative execution, resource sharing, and system-level non-determinism, and is typically measured with side-channel techniques.
This divergence underpins timing channels and side-channel vulnerabilities, and has spurred research into mitigation strategies and resilience.

Microarchitectural state divergence refers to the phenomenon in which a processor’s internal hardware state—observable exclusively within the microarchitecture, not the architecture specified by the ISA—evolves differently across otherwise architecturally equivalent program executions. This divergence encapsulates any scenario where the internal state of caches, buffers, predictors, or other microarchitectural elements depends on execution history, external perturbations, or system-level non-determinism, leading to observable discrepancies in timing, functionality, or side-effects. Such divergences are directly responsible for a broad spectrum of timing channels, side-channel vulnerabilities, security failures, and opportunities for architectural resilience. Microarchitectural state divergence is a central concept in contemporary microarchitecture, performance analysis, and security research, as evidenced by systematic studies across replicated execution (Okech et al., 2015), state management optimizations (Stecklina et al., 2018), automated side-channel discovery (Weber et al., 2021), and covert-channel mitigation (Ge et al., 2016, Wistoff et al., 2020, Li et al., 15 Feb 2025).

1. Formal Definitions and Taxonomy

The fundamental definition traces to the separation between architecturally visible state $A(t)$ (registers and memory states as prescribed by the ISA) and microarchitectural state $M(t)$ (internal buffers, predictors, caches, etc.). Microarchitectural state divergence occurs at time $t$ when there exists a predicate $\varphi$ on $M(t)$ such that $\varphi$ cannot be inferred from $A(t)$ but can be influenced by program execution (Minkin et al., 2019). Divergence can be deterministic (path variability, input-dependent state) or probabilistic (due to randomization, exceptions, or hardware noise).

Formally, for two executions starting at the same initial state $\sigma_0 \in \Sigma$ (where $\Sigma$ is the microarchitectural state space), running program $P$ on inputs $D$ and $D'$:

$\sigma_1 = \mathrm{ExecCPU}(P, D, \sigma_0)$

$\sigma_2 = \mathrm{ExecCPU}(P, D', \sigma_0)$

Microarchitectural state divergence is present if $\sigma_1 \neq \sigma_2$ (Oleksenko et al., 2021). In side-channel contexts, divergence is characterized via statistical information metrics, such as mutual information $I(X;Y)$ and channel capacity $C = \max_{P(X)} I(X;Y)$, where $X$ is the attacker's input and $Y$ is the observer's timing or side-channel measurement (Ge et al., 2016).
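The definition can be made concrete with a toy model. The sketch below is an illustration only, not any cited framework: `exec_cpu`, `table_lookup`, and the 8-set direct-mapped cache are hypothetical names and parameters chosen for this example. It runs one program on two inputs from the same initial state and compares the resulting architectural results and microarchitectural states.

```python
from typing import Callable, Tuple

NUM_SETS = 8                    # assumed size of a toy direct-mapped cache
State = Tuple[int, ...]         # one tag per set; -1 marks an empty set


def exec_cpu(program: Callable, data: int, sigma0: State):
    """Toy stand-in for ExecCPU(P, D, sigma_0).

    Returns (architectural_result, final_microarchitectural_state).
    """
    cache = list(sigma0)

    def load(addr: int) -> int:
        # A load fills the set selected by the address; memory is all zeros here.
        cache[addr % NUM_SETS] = addr // NUM_SETS
        return 0

    result = program(data, load)
    return result, tuple(cache)


def table_lookup(secret: int, load: Callable) -> int:
    load(secret)   # data-dependent address: changes only the cache state
    return 1       # architectural result does not depend on the input


sigma0: State = (-1,) * NUM_SETS
a1, sigma1 = exec_cpu(table_lookup, 3, sigma0)   # input D
a2, sigma2 = exec_cpu(table_lookup, 5, sigma0)   # input D'

architecturally_equal = (a1 == a2)
diverged = (sigma1 != sigma2)
```

In this model the two runs agree architecturally while their final cache states differ, which is the condition $\sigma_1 \neq \sigma_2$ above. Real processors have far larger state spaces, and their state cannot be read out directly as it is here.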

2. Sources and Mechanisms of Divergence

Microarchitectural divergence arises from (1) transient execution and speculation (out-of-order and speculative effects not visible architecturally), (2) context-switch and scheduling non-determinism, (3) hidden buffers and predictors, (4) incomplete hardware flushes, and (5) resource-sharing among cores.

Hidden Hardware State: Caches (L1, L2, L3), TLBs, branch predictors (BTB, BHB, PHT), prefetchers, performance monitoring units (PMUs), FPU/SIMD register files, store/load buffers, mesh credit and buffer occupancies (Ge et al., 2016, Minkin et al., 2019, Wan et al., 2021).
Speculative Paths: Transient microarchitectural state divergence caused by unresolved faults or mispredictions (e.g., pre-fault store buffer forwarding, speculative cache line fills) (Stecklina et al., 2018, Minkin et al., 2019, Oleksenko et al., 2021).
External Non-determinism: OS scheduling, interrupt arrival, hardware events, and inherent unpredictability in kernel routines (Okech et al., 2015).

Divergent states can be deliberately manipulated by malicious software (Trojan–spy protocols, gadgets, and Prime+Probe or Flush+Reload), observed through instrumentation or fault injection (OS instrumentation or the PMU), or induced unintentionally by benign concurrency and system software.
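A Trojan–spy exchange over shared cache state can be sketched in a few lines. The model below is a simplified assumption, not a real attack implementation: `ToyCache`, `prime`, `trojan_send`, and `probe` are hypothetical names, and a hit/miss flag stands in for the latency difference that a real spy would measure with a timer.

```python
NUM_SETS = 8  # assumed number of sets in a toy direct-mapped cache


class ToyCache:
    """Tracks only which party's line currently occupies each set."""

    def __init__(self) -> None:
        self.owner = [None] * NUM_SETS

    def access(self, who: str, set_index: int) -> bool:
        """Access one set; return True on a hit, False on a miss."""
        hit = self.owner[set_index] == who
        self.owner[set_index] = who
        return hit


def prime(cache: ToyCache) -> None:
    # Spy fills every set with its own lines.
    for s in range(NUM_SETS):
        cache.access("spy", s)


def trojan_send(cache: ToyCache, symbol: int) -> None:
    # Trojan encodes the symbol by evicting the spy's line from one set.
    cache.access("trojan", symbol % NUM_SETS)


def probe(cache: ToyCache) -> list:
    # Spy re-accesses every set; a miss (slow access) reveals the evicted set.
    return [s for s in range(NUM_SETS) if not cache.access("spy", s)]


def transmit(symbol: int) -> list:
    cache = ToyCache()
    prime(cache)
    trojan_send(cache, symbol)
    return probe(cache)


received = transmit(5)
```

In this noiseless model each round conveys one of eight symbols. Real channels must contend with replacement policies, prefetchers, and timing noise, which is why measured capacities are reported statistically.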

3. Measurement, Quantification, and Automated Discovery

Divergence is measured and quantified by direct side-channel probing, differential timing experiments, statistical estimation of information leakage, and formal testing frameworks.

Experimental Protocols: Replicated execution, kernel-level tracing, and task-level path-length analysis reveal statistical and distributional divergence (e.g. per-syscall path-lengths, KL-divergence of distributions) (Okech et al., 2015).
Covert/Side-Channel Probing: Prime+Probe and Flush+Reload methods quantify divergence in cache state and measure covert channel capacity (e.g., multiple bits per context switch, variable between architectural generations) (Ge et al., 2016, Wistoff et al., 2020).
PMU-based Event Analysis: Operand-sensitive event counts, as in PMU-Data, reveal divergence not only in caches but in micro-op pipelines and arithmetic units—providing direct measurement of operand-dependent microarchitectural state (Li et al., 15 Feb 2025).
Automated Frameworks: Osiris fuzzes the entire instruction space to empirically detect timing-based divergence in hardware state, clustering divergences by resource and evaluating them as side channels in terms of single-shot accuracy, SNR, and throughput (Weber et al., 2021).
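The statistical estimation of leakage mentioned above can be illustrated with two small routines. This is a minimal sketch under stated assumptions, not the procedure of any cited study: `mutual_information` is a plug-in estimator over discrete samples (biased for small sample counts), and `channel_capacity` uses the standard Blahut–Arimoto iteration on an assumed channel matrix `W[x][y] = P(Y=y | X=x)`.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def channel_capacity(W, iters=200):
    """Blahut-Arimoto estimate of C = max_{P(X)} I(X;Y) in bits."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx  # start from a uniform input distribution

    def output_dist(p_in):
        return [sum(p_in[x] * W[x][y] for x in range(nx)) for y in range(ny)]

    def kl_bits(x, q):
        # KL divergence between row W[x] and output distribution q, in bits.
        return sum(
            W[x][y] * math.log2(W[x][y] / q[y])
            for y in range(ny) if W[x][y] > 0
        )

    for _ in range(iters):
        q = output_dist(p)
        weights = [p[x] * 2.0 ** kl_bits(x, q) for x in range(nx)]
        total = sum(weights)
        p = [w / total for w in weights]

    q = output_dist(p)
    return sum(p[x] * kl_bits(x, q) for x in range(nx))


# Hypothetical example: a binary channel that flips the symbol 10% of the time.
noisy_binary = [[0.9, 0.1], [0.1, 0.9]]
capacity_bits = channel_capacity(noisy_binary)

# Hypothetical samples in which the observation always equals the input.
perfect_samples = [(0, 0), (1, 1)] * 50
leak_bits = mutual_information(perfect_samples)
```

Timing measurements are continuous, so in practice they would first need to be binned or modeled with a density estimate before either routine applies.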

4. Security Implications: Timing Channels, Side-Channels, and Transient Execution Attacks

Microarchitectural state divergence is the root cause of almost all timing and side-channel attacks, including but not limited to:

Covert Channels: Any incompletely partitioned or flushed resource can be used to encode and communicate secret bits between adversarial processes or tenants (bandwidths over 1 kbit/s widely observed) (Ge et al., 2016, Wan et al., 2021, Wistoff et al., 2020).
Transient/Speculative Execution Vulnerabilities: Attacks such as Spectre, Meltdown, Fallout, LazyFP, and their variants exploit speculation-induced state divergence to exfiltrate secrets through transient side-effects (Stecklina et al., 2018, Minkin et al., 2019, Oleksenko et al., 2021).
Operand-Dependent Leakage: PMU-Data demonstrates that secret-dependent microarchitectural event counts (e.g., in DIVIDER_ACTIVE, L2_RQST, L1D_PEND_MISS) can encode kernel data or TEE secrets, even if all programmably visible control flow is constant (Li et al., 15 Feb 2025).
Mesh Interconnect Timing Channels: Volcano shows that congestion in stateful mesh interconnects exposes divergence observable by remote timing, defeating both spatial (CAT/AWAYs) and temporal (time-slice) isolation (Wan et al., 2021).

Residual state divergence persists despite ISA-mandated flushes, partitioning, and system-level mitigations. Modern CPUs lack effective mechanisms to comprehensively eliminate all sources.

5. Quantitative Observations and Case Studies

Divergence is quantified via channel capacity, timing deltas, SNR, and throughput; security experiments and measurements illustrate both the ubiquity and the severity of divergence.

Kernel Path Divergence: In Okech et al., dual-replica tasks exhibited a mean divergence $\bar{D} \approx 130$, with only 5 of 1,000 iterations showing zero divergence, highlighting the unpredictability even of tightly controlled code (Okech et al., 2015).
Unmitigated Channels: The Intel Skylake L1-D cache exhibits a capacity of $C_{\mathrm{none}} = 4.0$ bits per symbol (vs. $C_{\mathrm{full}} = 0.038$ bits after partial mitigation), with significant residuals in the BTB, BHB, TLB, and instruction prefetcher (Ge et al., 2016).
PMU-based Leakage: DIVIDER_ACTIVE and DEMAND_DATA_RD_MISS channels support up to 1,707 B/s with $0\%$ error; SNR values of 10–60 are routine under realistic noise (Li et al., 15 Feb 2025).
Automated Discovery SNRs: Osiris revealed timing gaps of 90–228 cycles, with single-shot accuracy $>99\%$ for MMX–x87/AVX2–x87 and $>75\%$ for XSAVE/RDRAND resources (Weber et al., 2021).
Mesh Interconnect: Volcano observed mesh link delay increases of $+200$ to $+470$ cycles (up to 49.8 dB SNR), supporting cross-tile RSA key leakage at sub-1% bit error (Wan et al., 2021).
Covert Channel Rates: RISC-V Ariane channels showed 1,667–3,770 millibits per context switch (roughly 1.7–3.8 bits), robust against noise when no flush is applied; a new flush instruction ($\mathrm{fence.t}$) drops channel capacity below the noise floor (Wistoff et al., 2020).

Channel Type	Throughput (bits/s)	Error Rate	Hardware
L1D Prime+Probe	up to 833	<1%	RISC-V Ariane
PMU-Data DIV/MOV	1,707–12,440	0%	Intel Skylake/Kaby Lake
Osiris RDRAND	95–1,000	<1%	Intel/AMD
Volcano Mesh	200,000 (equivalent)	<1%	Intel Xeon Scalable
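Several figures in this section use different units, and a few conversions make them easier to compare. The helper names below are hypothetical, and the decibel conversion assumes the reported SNR is a power ratio ($10 \log_{10}$), which the text here does not state.

```python
def millibits_to_bits(millibits: float) -> float:
    """Convert millibits to bits."""
    return millibits / 1000.0


def db_to_power_ratio(db: float) -> float:
    """Convert decibels to a linear ratio, assuming a power-ratio definition."""
    return 10.0 ** (db / 10.0)


def reduction_factor(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


# Ariane capacity per context switch, from 1,667-3,770 millibits.
ariane_bits = (millibits_to_bits(1667), millibits_to_bits(3770))

# Skylake L1-D capacity: 4.0 bits unmitigated vs. 0.038 bits mitigated.
skylake_reduction = reduction_factor(4.0, 0.038)

# Volcano mesh SNR of 49.8 dB as a linear ratio (power-ratio assumption).
volcano_ratio = db_to_power_ratio(49.8)
```

Under these assumptions the Skylake mitigation corresponds to a reduction of roughly two orders of magnitude in capacity. If the SNR were instead defined on amplitudes, the exponent would use a divisor of 20 rather than 10.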

6. Defenses and Mitigation Strategies

Complete elimination of microarchitectural state divergence requires fine-grained and architecturally complete hardware flush or partition mechanisms. Key strategies include:

Full-state Flush Instructions: Custom instructions (e.g., RISC-V $\mathrm{fence.t}$ ) reset all observable microarchitectural elements atomically, proven necessary and sufficient for closing five major covert channels on Ariane (Wistoff et al., 2020).
Partitioning and Tagging: Hardware partitioning (e.g., CAT/AWAYs for caches), per-domain tag coloring, and fine-grained resource allocation reduce but rarely eliminate divergence (ineffective for mesh, predictor, and instruction prefetch channels) (Wan et al., 2021, Ge et al., 2016).
Eager Context Switching: Immediate context save/restore (e.g., FPU/SIMD eager context switching) eliminates the window for LazyFP-style leakage, at modest performance overhead (Stecklina et al., 2018).
Buffer Management: Enforced drains (mfence/dmb) on store/load buffers at security boundaries prevent unintentional exposure of privileged data (Minkin et al., 2019).
Restricting PMU Access: Limiting or rolling back user-mode PMU access in speculative contexts blocks PMU-Data leakage (Li et al., 15 Feb 2025).
Higher-layer Mitigation: Where architectural changes are infeasible, use constant-time libraries, schedule randomization, noise injection, or physical core isolation for cross-domain protection (Ge et al., 2016).

Residual vulnerabilities remain on current mainstream CPUs, especially in instruction prefetchers, branch predictors, TLBs, and mesh fabrics.
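The effect of a full-state flush at a security boundary can be illustrated with the same kind of toy cache model. This is a sketch under simplifying assumptions, not a model of $\mathrm{fence.t}$ or of any real CPU: `run_time_slices` and `distinguishable_symbols` are hypothetical names, the cache has 8 direct-mapped sets, and the time taken by the flush itself is not modeled.

```python
NUM_SETS = 8  # assumed number of sets in a toy direct-mapped cache


def run_time_slices(symbol: int, flush_on_switch: bool) -> tuple:
    """Spy primes, Trojan runs, spy probes; return the spy's hit/miss pattern."""
    owner = [None] * NUM_SETS

    def access(who: str, s: int) -> bool:
        hit = owner[s] == who
        owner[s] = who
        return hit

    def context_switch() -> None:
        if flush_on_switch:
            # Reset every set to one canonical state at the domain boundary.
            for s in range(NUM_SETS):
                owner[s] = None

    for s in range(NUM_SETS):           # spy primes all sets
        access("spy", s)
    context_switch()
    access("trojan", symbol % NUM_SETS)  # Trojan encodes a symbol
    context_switch()
    return tuple(access("spy", s) for s in range(NUM_SETS))  # spy probes


def distinguishable_symbols(flush_on_switch: bool) -> int:
    """Count distinct observations the spy can see across all symbols."""
    observations = {
        run_time_slices(sym, flush_on_switch) for sym in range(NUM_SETS)
    }
    return len(observations)


without_flush = distinguishable_symbols(flush_on_switch=False)
with_flush = distinguishable_symbols(flush_on_switch=True)
```

With the flush enabled, the spy's observation no longer depends on the Trojan's symbol in this model. A resource that the flush does not cover would reappear as a residual channel, which matches the residual vulnerabilities noted above.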

7. Applications: Beyond Vulnerability—Resilience and Verification

While most literature emphasizes the adversarial implications of microarchitectural state divergence, inherent execution-path non-determinism may offer resilience benefits:

Fault Detection and Diversity: Okech et al. demonstrate that naturally arising divergence in replicated execution can amplify detection of rare OS faults—by ensuring that systematic bugs on infrequently executed code paths are unlikely to be triggered simultaneously across replicas (Okech et al., 2015).
Third-party Contract Verification: Frameworks such as Revizor leverage divergence analysis for testing contract compliance in black-box CPUs, surfacing both deviations and vulnerabilities at scale (Oleksenko et al., 2021).
Side-channel Discovery: Automated frameworks (e.g., Osiris) exploit divergence measurement to enumerate previously unknown side-channels, facilitating architectural evaluation and verification (Weber et al., 2021).

Such approaches may inform the design of resilient, diagnosable, and contract-enforced architectures, provided sufficient observability and controllability over microarchitectural state transitions.
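Divergence-based contract testing can be sketched as a relational check: inputs that a contract treats as indistinguishable should also produce indistinguishable hardware observations. The code below is a simplified illustration of that idea, not Revizor's implementation. `find_violations`, `contract_trace`, and the two toy hardware models are hypothetical, and a set of touched cache sets stands in for a measured hardware trace.

```python
from collections import defaultdict

NUM_SETS = 8  # assumed number of cache sets in the toy hardware models


def contract_trace(inp):
    """The contract exposes only the public address."""
    public, _secret = inp
    return public % NUM_SETS


def leaky_hardware_trace(inp):
    """Toy CPU whose cache state also depends on the secret."""
    public, secret = inp
    return frozenset({public % NUM_SETS, secret % NUM_SETS})


def compliant_hardware_trace(inp):
    """Toy CPU whose cache state depends only on the public address."""
    public, _secret = inp
    return frozenset({public % NUM_SETS})


def find_violations(inputs, contract, hardware):
    """Return input pairs with equal contract traces but different hardware traces."""
    classes = defaultdict(list)
    for inp in inputs:
        classes[contract(inp)].append(inp)

    violations = []
    for members in classes.values():
        reference = members[0]
        for other in members[1:]:
            if hardware(other) != hardware(reference):
                violations.append((reference, other))
    return violations


test_inputs = [(p, s) for p in range(2) for s in range(3)]
leaky = find_violations(test_inputs, contract_trace, leaky_hardware_trace)
compliant = find_violations(test_inputs, contract_trace, compliant_hardware_trace)
```

Each reported pair is a counterexample: two executions the contract considers equivalent whose microarchitectural states diverge. On real hardware the trace would come from repeated, noisy measurements, so the equality check would need to be statistical.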

In sum, microarchitectural state divergence is an intrinsic property of modern CPUs, underpinning core principles of execution unpredictability, hardware sharing, security vulnerability, and (in some contexts) system resilience. The state of the art shows that divergence is ubiquitous, quantifiable, and only partially remediable under current architectural paradigms; full elimination remains an open and actively pursued challenge in computer architecture and security (Okech et al., 2015, Stecklina et al., 2018, Minkin et al., 2019, Ge et al., 2016, Wistoff et al., 2020, Wan et al., 2021, Li et al., 15 Feb 2025, Weber et al., 2021, Oleksenko et al., 2021).