Dual Modular Redundancy (DMR)

Updated 5 February 2026

Dual Modular Redundancy (DMR) is a fault-tolerance technique that duplicates computing modules to detect discrepancies and mitigate errors.
It uses spatial redundancy with real-time comparators to quickly flag mismatches and trigger recovery protocols such as rollback or output freezing.
Practical applications span spaceborne systems, cryptographic hardware, and quantum processors, balancing enhanced fault detection with hardware overheads.

Dual Modular Redundancy (DMR) is a spatial redundancy technique for fault tolerance in digital systems. It operates by duplicating hardware or software modules and running them in parallel with input equivalence, using real-time comparator logic to detect discrepancies. DMR is applied extensively in domains requiring high reliability and safety, such as spaceborne computation, cryptographic hardware, high-performance computing, and quantum control platforms. By design, DMR enables immediate detection of transient and permanent errors, and supports forward or rollback recovery depending on the context and integration with auxiliary mechanisms.

1. Fundamental Principles and Reliability Model

In DMR, two identical modules—processors, pipelines, datapaths, or computational threads—are provided with identical inputs and expected to produce identical outputs. A bitwise comparator or checker evaluates their outputs after each operation or computational phase. If the outputs match, the result is deemed correct; a mismatch signals an error, triggering an appropriate recovery mechanism.
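
The compare-and-flag behaviour described above can be sketched in software as follows. This is a minimal model, not a hardware description: the function names `dmr_execute`, `square`, and `faulty_square` are illustrative assumptions, and the fault is a hand-injected single-bit flip.

```python
from typing import Callable, Tuple


def dmr_execute(module_a: Callable[[int], int],
                module_b: Callable[[int], int],
                x: int) -> Tuple[int, bool]:
    """Run two replicas on the same input and compare their outputs bitwise.

    Returns (result, error_detected). On a mismatch the returned result
    comes from replica A and must not be trusted by the caller.
    """
    out_a = module_a(x)
    out_b = module_b(x)
    mismatch = (out_a ^ out_b) != 0  # any differing bit signals an error
    return out_a, mismatch


def square(x: int) -> int:
    """Fault-free replica."""
    return x * x


def faulty_square(x: int) -> int:
    """Hypothetical replica with an upset in bit 3 of its output."""
    return (x * x) ^ 0b1000


print(dmr_execute(square, square, 7))         # replicas agree
print(dmr_execute(square, faulty_square, 7))  # mismatch is flagged
```

The comparator only reports that the replicas disagree; it cannot tell which one is wrong, which is why a separate recovery mechanism is needed.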

The canonical reliability improvement is derived by assuming that each module fails independently with probability $p$ over the verification window (e.g., a computation tile, instruction, or pipeline stage). An error can go undetected only if both modules fail in the same window, which occurs with probability $p^2$, yielding the DMR reliability $R_{\mathrm{DMR}} = 1 - p^2$. Relative to a single module ( $R_1 = 1-p$ ), the undetected-fault rate therefore falls from $p$ to $p^2$, a reduction by a factor of $1/p$ that grows as $p$ shrinks. This model is used in both hardware and quantum system analyses (Tedeschi et al., 4 Feb 2026, Papadopoulos et al., 2024, Bhandari, 16 Sep 2025).
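
As a worked illustration of this model, the three possible outcomes for two independent modules, each failing with probability $p$ in a verification window, can be written out explicitly. The numerical value of $p$ below is an assumed example, not a figure from the cited works.

$$
\begin{aligned}
P(\text{both correct}) &= (1-p)^2,\\
P(\text{exactly one fails, mismatch detected}) &= 2p(1-p),\\
P(\text{both fail, possibly undetected}) &= p^2,
\end{aligned}
\qquad (1-p)^2 + 2p(1-p) + p^2 = 1 .
$$

For $p = 10^{-3}$: $R_1 = 1 - p = 0.999$, $R_{\mathrm{DMR}} = 1 - p^2 = 0.999999$, and a mismatch is flagged with probability $2p(1-p) \approx 1.998 \times 10^{-3}$. Note that $p^2$ is best read as an upper bound on the undetected-error probability, since two simultaneous failures escape the comparator only if they also produce identical wrong outputs.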

2. Microarchitectural and Hardware Implementations

DMR is realized through duplication of functional logic and output comparators. At the processor core or computational engine level, DMR implies full replication:

Processor/Microarchitecture: Each processing unit or entire core is duplicated and fed from a common fetch/instruction queue. At operation retirement, a comparator checks result equivalence, so any mismatch is flagged without waiting for a software check. ARMv8 core experiments report detection latencies on the order of 96 cycles with no performance loss, at the cost of roughly 100% additional area and 43% higher power (Papadopoulos et al., 2024).
Accelerators: Safe-NEureka, a DNN accelerator for on-board satellite AI, splits a PE array into two identical sub-arrays executing the same computation tile. An XNOR-reduction checker compares results; mismatches trigger hardware-assisted rollback-and-retry from a local checkpoint (μ-loop) (Tedeschi et al., 4 Feb 2026).
Cryptographic Hardware: The HFS-box implements DMR in pipeline stages of an AES S-box. At each stage, parallel combinational logic outputs are compared, and a discrepancy freezes the pipeline outputs using voter logic until both replicas realign. This method corrects transient faults without rollback (Taheri et al., 2020).
Quantum Systems: Redundant routers (double-star topology) are instantiated in superconducting modular quantum processors, with real-time switchover between parallel routers via fast signaling upon fault detection. This not only yields survivability under failures but enables parallel quantum gate operations (Bhandari, 16 Sep 2025).
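
The XNOR-reduction check mentioned for accelerator sub-arrays can be modelled in a few lines. This is a software sketch of the general technique only; the word width, the function names, and the per-tile list layout are assumptions and do not describe the actual Safe-NEureka checker.

```python
from typing import List, Sequence


def xnor_reduce_match(word_a: int, word_b: int, width: int = 32) -> bool:
    """Return True when two result words agree in every bit position."""
    mask = (1 << width) - 1
    xnor = ~(word_a ^ word_b) & mask  # bit is 1 where the replicas agree
    return xnor == mask               # AND-reduction: all bits must be 1


def check_tile(tile_a: Sequence[int], tile_b: Sequence[int],
               width: int = 32) -> List[int]:
    """Return the indices of words where the two sub-array outputs differ."""
    return [i for i, (a, b) in enumerate(zip(tile_a, tile_b))
            if not xnor_reduce_match(a, b, width)]


print(check_tile([1, 2, 3], [1, 2, 3]))  # empty list: tile accepted
print(check_tile([1, 2, 3], [1, 6, 3]))  # word 1 differs: trigger recovery
```

A non-empty result from `check_tile` corresponds to the mismatch signal that would start a rollback-and-retry in hardware.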

DMR can be synthesized and managed dynamically, as in hybrid clusters that provide on-demand split-locking to activate or deactivate DMR among multicore groups (Rogenmoser et al., 2023).

3. Fault Detection, Recovery, and Correction Mechanisms

The fault response in DMR depends on the system context:

Detection: Output comparators or checkers provide cycle-level detection of mismatches. In ASICs and CPUs, fault-injection studies demonstrate near-complete suppression of incorrect results (e.g., Safe-NEureka achieves a 96% reduction in faulty executions in redundancy mode) (Tedeschi et al., 4 Feb 2026).
Correction: Unlike TMR, DMR alone cannot identify which replica is faulty, so it cannot correct errors unambiguously; however, recovery can be layered on top:
- Forward recovery: In DMR-threaded software solvers such as TwinCG, if only one of two parallel computations is verified as healthy (via application-level invariants), its state overwrites the other, providing forward recovery (Dichev et al., 2016).
- Rollback-and-retry: Hardware accelerators like Safe-NEureka maintain microarchitectural checkpoints and re-issue the failed operation upon detection, ensuring deterministic and bounded recovery latency (Tedeschi et al., 4 Feb 2026).
- Output freezing: Cryptographic pipelines (e.g., HFS-box) freeze output registers, effectively masking a fault and waiting for self-realignment (Taheri et al., 2020).
- Rapid hardware state restore: In hybrid RISC-V clusters, ECC-protected backup registers enable state restoration within 24 cycles after error detection (Rogenmoser et al., 2023).
Coverage: DMR guarantees detection of single-module transient and permanent errors but can remain vulnerable to correlated (common-mode) faults unless mitigated (e.g., temporal offsets in Safe-NEureka prevent simultaneous glitches) (Tedeschi et al., 4 Feb 2026).
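
The rollback-and-retry scheme listed above can be sketched as a checkpoint taken before each operation, followed by re-execution on mismatch. The sketch assumes a simple accumulator state and a retry limit of three; `dmr_rollback_retry` and the `TransientFault` injector are hypothetical names introduced for illustration.

```python
from typing import Callable, List


def dmr_rollback_retry(op: Callable[[int, int, int], int],
                       acc: int,
                       inputs: List[int],
                       max_retries: int = 3) -> int:
    """Apply op(acc, x, replica_id) to each input on two replicas.

    The accumulator is checkpointed before every operation. On a mismatch
    the operation is re-issued from the checkpoint; a persistent mismatch
    is reported as a probable permanent fault.
    """
    for x in inputs:
        checkpoint = acc  # local checkpoint before the operation
        for _attempt in range(max_retries + 1):
            a = op(checkpoint, x, 0)
            b = op(checkpoint, x, 1)
            if a == b:
                acc = a  # replicas agree: commit the result
                break
        else:
            raise RuntimeError("persistent mismatch: possible permanent fault")
    return acc


class TransientFault:
    """Hypothetical injector: flips one bit in replica 1 on a chosen call."""

    def __init__(self, fire_on_call: int) -> None:
        self.calls = 0
        self.fire_on_call = fire_on_call

    def __call__(self, acc: int, x: int, replica_id: int) -> int:
        result = acc + x
        if replica_id == 1:
            self.calls += 1
            if self.calls == self.fire_on_call:
                result ^= 1  # single-bit upset, occurs only once
        return result


op = TransientFault(fire_on_call=2)
print(dmr_rollback_retry(op, 0, [1, 2, 3]))  # the upset is retried away
```

Because the retry count is capped, recovery latency stays bounded: a transient upset costs one extra execution of the affected operation, while a fault that keeps recurring is escalated instead of looping forever.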

4. Quantitative Trade-offs: Area, Power, Performance

DMR is associated with significant hardware and energy overheads, which are highly system-dependent:

Context	Area Overhead	Performance Impact	Energy/Power Overhead
ARMv8 CPU core (Papadopoulos et al., 2024)	≈100% (2× baseline)	No slowdown (IPC unaltered)	+43% (measured)
Safe-NEureka DNN (Tedeschi et al., 4 Feb 2026)	+15% (IP-level)	–48% throughput; –53% efficiency	<+8% power (cluster)
HFS-box AES S-box (Taheri et al., 2020)	+137%	–11.3% throughput	Not quantified
RISC-V cluster (Rogenmoser et al., 2023)	+0.3%/8.4% (SW/HW)	53% of baseline MOPS (–47%)	Not stated
Quantum router (Bhandari, 16 Sep 2025)	+25% (router)	None (parallel mode); <10% switchover latency	Not stated

In processor-centric applications, DMR is considered mandatory where absolute reliability is required and area/power budgets permit. In large-scale parallel systems and accelerators, the DMR cost can be amortized by enabling its use only during mission-critical or safety-relevant sections, with rapid mode switching supported by hardware control (Tedeschi et al., 4 Feb 2026, Rogenmoser et al., 2023).
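
The amortization argument can be made concrete with a small calculation. The sketch below assumes a simple time-weighted model in which a fraction of the work runs in redundancy mode at reduced throughput and the rest runs at full speed; mode-switching latency is ignored, and the function name and the 10% critical fraction are illustrative assumptions.

```python
def effective_throughput(fraction_redundant: float,
                         redundant_relative_throughput: float) -> float:
    """Relative throughput when part of the work runs under DMR.

    fraction_redundant: share of the work executed in redundancy mode (0..1).
    redundant_relative_throughput: throughput in redundancy mode relative
        to performance mode (e.g. 0.52 for a 48% throughput reduction).
    """
    if not 0.0 <= fraction_redundant <= 1.0:
        raise ValueError("fraction_redundant must lie in [0, 1]")
    if redundant_relative_throughput <= 0.0:
        raise ValueError("redundant_relative_throughput must be positive")
    # Time to finish one unit of work, normalised to performance mode.
    time_per_unit_work = (fraction_redundant / redundant_relative_throughput
                          + (1.0 - fraction_redundant))
    return 1.0 / time_per_unit_work


# Assumed example: 10% of the work is safety-critical, and redundancy mode
# runs at 52% of baseline throughput (the -48% figure in the table above).
print(round(effective_throughput(0.10, 0.52), 3))
```

Under these assumptions, protecting a tenth of the workload costs under a tenth of overall throughput, compared with nearly half if redundancy mode were left on throughout.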

5. Specializations and Applications

Spaceborne and Radiation-Hardened Systems: DMR is a critical feature in AI accelerators and CPU clusters for LEO satellites and planetary missions, enabling the detection and correction of radiation-induced soft errors and providing the option to revert to non-redundant, high-throughput operation for less-critical payload processing (Tedeschi et al., 4 Feb 2026, Rogenmoser et al., 2023).
High-Performance Iterative Solvers: In scientific computing, DMR enables robust forward recovery for iterative methods (e.g., conjugate gradient solvers) at minimal runtime cost. TwinCG incurs a 5–6% time overhead in the fault-free case and outperforms ABFT schemes under fault injection (Dichev et al., 2016).
Cryptographic Cores: Real-time detection and masking of transient and malicious faults are achieved through pipelined DMR, with area/throughput trade-offs compared to TMR and time-triple redundancy (Taheri et al., 2020).
Quantum Computing Architectures: DMR is adapted to control-plane elements (quantum routers), providing survivability and enabling direct parallel execution of multi-qubit gates, with measured fidelity parity between single- and dual-router operation (Bhandari, 16 Sep 2025).
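
The forward-recovery idea used in DMR-style iterative solvers can be sketched with two twin computations and an application-level invariant. This is not the TwinCG algorithm: for brevity it uses a Richardson iteration on an assumed 2×2 system, with the residual updated recursively so that the invariant $r = b - Ax$ can expose a corrupted state. All names and values are illustrative.

```python
from typing import List, Tuple

Vec = List[float]
A = [[4.0, 1.0], [1.0, 3.0]]  # small SPD matrix, illustrative only
B = [1.0, 2.0]


def matvec(m: List[Vec], v: Vec) -> Vec:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def step(x: Vec, r: Vec, omega: float = 0.2) -> Tuple[Vec, Vec]:
    """One Richardson iteration with a recursively updated residual."""
    ar = matvec(A, r)
    x_new = [xi + omega * ri for xi, ri in zip(x, r)]
    r_new = [ri - omega * ari for ri, ari in zip(r, ar)]
    return x_new, r_new


def healthy(x: Vec, r: Vec, tol: float = 1e-9) -> bool:
    """Application-level invariant: r must equal b - A x."""
    ax = matvec(A, x)
    return all(abs(ri - (bi - axi)) <= tol for ri, bi, axi in zip(r, B, ax))


def twin_solve(iters: int, fault_at: int = -1) -> Vec:
    twins = [([0.0, 0.0], list(B)), ([0.0, 0.0], list(B))]
    for k in range(iters):
        twins = [step(x, r) for x, r in twins]
        if k == fault_at:
            twins[1][0][0] += 1.0e3  # injected corruption in twin 1's x
        if twins[0] != twins[1]:     # DMR comparison of the twin states
            ok = [healthy(x, r) for x, r in twins]
            if ok[0] and not ok[1]:
                twins[1] = (list(twins[0][0]), list(twins[0][1]))
            elif ok[1] and not ok[0]:
                twins[0] = (list(twins[1][0]), list(twins[1][1]))
            else:
                raise RuntimeError("cannot identify a healthy twin")
    return twins[0][0]


print(twin_solve(60, fault_at=5))  # matches the fault-free solution
```

The healthy twin's state overwrites the corrupted one and both continue forward, so no iteration is repeated. If neither twin, or both, pass the invariant, the ambiguity remains and the sketch gives up rather than guess.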

6. Comparison with TMR and Other Fault-Tolerance Methods

DMR is distinguished from Triple Modular Redundancy (TMR) and other fault-tolerance methods by its trade-off of reduced resource requirements versus correction capability:

TMR: Triple replication adds a voting element that corrects a single faulty replica automatically and without ambiguity, but raises area and power to ≈3× the single-module baseline. In DMR, a detected mismatch is ambiguous, so recovery requires either state rollback or a protocol that identifies the healthy copy first (Dichev et al., 2016).
Redundant SMT (R-SMT)/ParDet: Microarchitectural methods incur lower area (e.g., 6–24%) and lower or moderate performance loss but have higher detection latencies and, in some cases, nonzero silent data corruption (SDC) rates (Papadopoulos et al., 2024).
ABFT: Algorithm-based fault tolerance in linear algebra provides partial fault detection within specific kernels (e.g., SpMxV) but lacks DMR’s system-wide coverage (Dichev et al., 2016).
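
The difference in correction capability between DMR and TMR can be seen in a short sketch: a bitwise majority voter masks one faulty replica, while a two-way comparator can only report disagreement. The function names are illustrative.

```python
from typing import Optional


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote: each output bit follows at least two replicas."""
    return (a & b) | (a & c) | (b & c)


def dmr_check(a: int, b: int) -> Optional[int]:
    """Return the agreed value, or None when the replicas disagree."""
    return a if a == b else None


good, bad = 0b1010, 0b0010  # bad has bit 3 flipped

print(bin(tmr_vote(good, good, bad)))  # the faulty replica is outvoted
print(dmr_check(good, bad))            # mismatch detected, no value returned
```

With only two copies there is no majority, so the DMR path has to hand the decision to rollback, an invariant check, or another mechanism that identifies the healthy replica.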

DMR is preferred under requirements of low-latency detection, absolute SDC coverage, and performance preservation, with the steeper hardware cost deemed acceptable in mission-critical and high-reliability deployments (Papadopoulos et al., 2024).

7. Practical Implications and Reconfigurability

Modern hybrid and adaptive architectures incorporate hardware and software mechanisms for on-demand DMR activation, enabling trade-offs between reliability, area/power, throughput, and recovery latency tailored to application phase or system mode. In Safe-NEureka, switching between performance and redundancy modes is accomplished by programming a memory-mapped register, incurring <400-cycle latency and requiring no API change; recovery from detected errors is deterministic and incurs bounded cost (Tedeschi et al., 4 Feb 2026). In multicore clusters, DMR can be enabled or disabled dynamically across core groups with minimal state transfer delay (Rogenmoser et al., 2023).
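
Register-controlled mode switching with an unchanged call interface can be illustrated with a toy software model. Everything here is hypothetical: the `ToyAccelerator` class, the `MODE_REG` offset, and the mode encoding are invented for illustration and are not taken from Safe-NEureka or any datasheet.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class Mode(IntEnum):
    PERFORMANCE = 0  # each tile is computed once
    REDUNDANCY = 1   # each tile is computed twice and compared


class ToyAccelerator:
    """Software stand-in for an accelerator with a mode register."""

    MODE_REG = 0x00  # hypothetical register offset

    def __init__(self) -> None:
        self.regs: Dict[int, int] = {self.MODE_REG: int(Mode.PERFORMANCE)}

    def write_reg(self, offset: int, value: int) -> None:
        self.regs[offset] = int(value)

    def run(self, tiles: List[int], kernel: Callable[[int], int]) -> List[int]:
        """Same call in both modes; only the register content differs."""
        mode = Mode(self.regs[self.MODE_REG])
        if mode is Mode.PERFORMANCE:
            return [kernel(t) for t in tiles]
        results = []
        for t in tiles:
            a, b = kernel(t), kernel(t)  # duplicated execution
            if a != b:
                raise RuntimeError("replica mismatch")
            results.append(a)
        return results


acc = ToyAccelerator()
fast = acc.run([1, 2, 3], lambda t: t * 2)
acc.write_reg(ToyAccelerator.MODE_REG, Mode.REDUNDANCY)
safe = acc.run([1, 2, 3], lambda t: t * 2)
print(fast == safe)
```

The caller's code path is identical in both modes, which is the property that lets redundancy be switched on for critical phases without touching application code.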

These adaptive strategies ensure that fault-tolerance overheads are localized to critical code regions or workloads, achieving near-optimal utilization, especially in bandwidth- or energy-constrained environments such as spaceborne AI accelerators. In quantum architectures, DMR-based structures directly enable otherwise unavailable computational throughput and resilience to catastrophic faults with only modest hardware complexity increases (Bhandari, 16 Sep 2025).