Hybrid Modular Redundant DNN Accelerator
- A Hybrid Modular Redundant DNN Accelerator is a hardware design that uses dynamically reconfigurable modular redundancy to balance throughput, energy efficiency, and fault tolerance.
- It leverages dual and triple modular redundancy alongside ECC and adaptive mode switching to protect critical computations in safety-sensitive environments.
- Comparative analyses of Safe-NEureka, HyCA, and FORTALESA highlight significant fault tolerance improvements with modest area and performance overhead for mixed-criticality applications.
A Hybrid Modular Redundant (HMR) Deep Neural Network (DNN) Accelerator is a category of hardware architecture for neural network inference that employs run-time reconfigurable modular redundancy, selectively activated in response to safety, reliability, and performance demands. This paradigm allows a single accelerator engine to adaptively trade throughput and energy efficiency for fine-grain fault tolerance. Hybrid approaches unify algorithm-level, datapath-level, and memory/controller-level protection and are especially suited for mission- or safety-critical domains such as on-board AI processing for satellites, automotive, and industrial control, where both high reliability and high throughput are required but not simultaneously for every computation phase. Three major instantiations—Safe-NEureka (Tedeschi et al., 4 Feb 2026), HyCA (Liu et al., 2021), and FORTALESA (Cherezova et al., 6 Mar 2025)—define the contemporary HMR architecture space.
1. Microarchitectural Principles and Mode Partitioning
Hybrid modular redundancy in DNN accelerators is realized by structurally partitioning the compute fabric such that its processing elements (PEs) can be dynamically grouped into redundant or parallel submodules. The Safe-NEureka engine partitions a baseline 4×4 PE array into two 4×2 subarrays; each subarray is served by an independent tiler ("µloop") and streamer, enabling two principal modes:
- Redundancy Mode (DMR): Both subarrays receive identical inputs and weights. The "shadow" subarray is temporally delayed by one cycle, and outputs are compared bit-wise each cycle. Any mismatch triggers a hardware-initiated rollback, transparently correcting the corrupted tile with sub-100-cycle latency.
- Performance Mode (Parallel): The subarrays operate independently, each computing distinct output tiles, maximizing parallel throughput via separate address generation and shared high-bandwidth data streaming.
This architectural motif is generalized in FORTALESA through an output-stationary N×N systolic array, reconfigurable at run time into 2-PE groups for DMR or 3- to 4-PE clusters for TMR via switchable buses and comparator/voter logic, driven by a 2-bit mode control. HyCA, in contrast, overlays a baseline 2-D DLA array with a flexible Dot-Production Processing Unit (DPPU) that transparently recomputes outputs of any faulty PE in-place, removing local spatial constraints on redundancy (Tedeschi et al., 4 Feb 2026, Cherezova et al., 6 Mar 2025, Liu et al., 2021).
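The DMR compare-and-rollback flow described above can be sketched in plain Python (an illustrative behavioral model, not the Safe-NEureka RTL; the workload in `compute_tile`, the `OneShotFault` injector, and the retry limit are all hypothetical): the shadow subarray recomputes each tile, outputs are compared bitwise, and a mismatch replays only the corrupted tile.

```python
from typing import Callable, List

def compute_tile(tile: List[int]) -> List[int]:
    """Stand-in for one subarray's MAC work on a tile (hypothetical workload)."""
    return [x * 3 + 1 for x in tile]

def dmr_execute(tiles, primary: Callable, shadow: Callable, max_retries: int = 2):
    """Run each tile on both subarrays; on bitwise mismatch, roll back and
    replay only the corrupted tile (the tile-level rollback described above)."""
    results = []
    for tile in tiles:
        for _ in range(max_retries + 1):
            out_p = primary(tile)   # primary subarray
            out_s = shadow(tile)    # shadow subarray, conceptually one cycle behind
            if out_p == out_s:      # per-cycle bitwise comparison
                results.append(out_p)
                break               # tile accepted
        else:
            raise RuntimeError("persistent mismatch: permanent fault suspected")
    return results

class OneShotFault:
    """Shadow-side transient fault: corrupts exactly one tile's output once."""
    def __init__(self):
        self.armed = True
    def __call__(self, tile):
        out = compute_tile(tile)
        if self.armed and tile and tile[0] == 5:
            self.armed = False
            out[0] ^= 0x4          # single-bit flip in one output word
        return out
```

Because the fault is transient, the replay of the affected tile matches on the second attempt, so the final results equal the fault-free computation.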
2. Fault Tolerance Mechanisms: DMR, TMR, and ECC
HMR DNN accelerators combine several layers of fault tolerance:
- Dual Modular Redundancy (DMR): Each computation is performed in two redundant datapaths, with bit-wise output comparison. Errors initiate a rollback and replay of only the corrupted tile or output.
- Triple Modular Redundancy (TMR): Applied selectively to controller FSM and µloop logic, TMR triples control-state registers and uses majority voters, confining area overhead to segments that dominate reliability but not engine footprint.
- Error Correction Codes (ECC): All on-chip memory interfaces, notably streamers and TCDM interconnects, are protected via SEC-DED (e.g., Hsiao) ECC with error logging through memory-mapped software registers.
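The TMR voting applied to control-state registers reduces, per bit, to the standard 2-of-3 majority function maj(a, b, c) = (a∧b)∨(a∧c)∨(b∧c). A minimal sketch (generic bitwise voter, not taken from any of the papers' RTL):

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit equals the majority of the
    corresponding bits of the three replicated registers, masking any
    single-copy upset."""
    return (a & b) | (a & c) | (b & c)
```

A single-event upset in any one of the three register copies is masked, since the two uncorrupted copies outvote it bit by bit.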
HyCA uses a software-programmable Fault-PE Table (FPT) and a DPPU to recompute faulty PE outputs transparently, with overhead only linear in the number of faults. FORTALESA implements both DMR and TMR via PE-grouping and comparative/voting logic, optionally alternating between performance, DMR, or TMR on a per-layer or per-inference basis, guided by analytical assessment of layer vulnerability and fault propagation (Tedeschi et al., 4 Feb 2026, Liu et al., 2021, Cherezova et al., 6 Mar 2025).
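HyCA's recompute-in-place idea can be illustrated schematically (a behavioral Python model, not HyCA's hardware; the data layout and function name are assumptions): outputs of PEs listed in the Fault-PE Table are discarded and recomputed by a spare dot-product unit, so the recovery cost grows linearly with the number of faulty PEs rather than with the array size.

```python
def run_array_with_fpt(inputs, weights, pe_outputs, fault_pe_table):
    """pe_outputs[i] is PE i's (possibly corrupted) dot product; entries whose
    index appears in fault_pe_table are recomputed by the spare DPPU, leaving
    healthy PEs' results untouched."""
    corrected = list(pe_outputs)
    for pe in fault_pe_table:          # work is linear in the number of faults
        corrected[pe] = sum(x * w for x, w in zip(inputs[pe], weights[pe]))
    return corrected
```

Note that no spatial constraint ties a faulty PE to a dedicated spare row or column: any subset of PE indices can appear in the table.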
3. Formal Metrics: Area, Throughput, Fault Coverage, and Efficiency
A set of formal metrics quantifies trade-offs in HMR architectures.
Area Overhead:
Safe-NEureka, synthesized in GlobalFoundries 12 nm, incurs approximately 15% area overhead relative to the unprotected baseline engine.
Fault Tolerance Improvement:
Safe-NEureka reports a substantial reduction in faulty executions relative to the unprotected baseline under fault injection (Tedeschi et al., 4 Feb 2026).
Throughput, Latency, Efficiency:
For a 3×3 dense convolution, performance mode incurs an efficiency loss of roughly 11%, while redundancy mode, which duplicates every tile computation, incurs an efficiency loss of roughly 53%.
Area and Power Comparison Table
| Architecture | Area Overhead | Power Overhead | Efficiency Loss |
|---|---|---|---|
| Safe-NEureka (perf) | 15% | 7.5% | 11% |
| Safe-NEureka (red) | 15% | 10.7% | 53% |
| HyCA (DPPU size 32) | 6% | — | — |
| FORTALESA (best) | 12–15% (static redundancy); 6× area-power reduction vs. static TMR | — | — |
All HMR approaches achieve significant reduction in faulty executions compared to baseline or homogeneous redundancy schemes (Tedeschi et al., 4 Feb 2026, Cherezova et al., 6 Mar 2025, Liu et al., 2021).
4. Layer- and Mode-Selective Redundancy: Mixed-Criticality Support
HMR DNN accelerators support mixed-criticality operations by allowing run-time switching between redundancy and performance modes:
Use Cases:
- Non-critical, high-throughput: Run in performance mode for bulk data (e.g., cloud-detection in satellite imagery).
- Critical AI workloads: Switch to redundancy mode for GNC, collision avoidance, or other safety-critical phases.
Layer-level mapping is guided by the Architectural Vulnerability Factor (AVF), quantifying the fraction of faults that alter the DNN’s output. Pareto-front or constrained optimization is employed to assign DMR/TMR to only those layers with high AVF, maximizing reliability for minimal overhead (Cherezova et al., 6 Mar 2025). HMR can be applied to arbitrary DNN layers, including convolution (1×1, 3×3, depthwise), matrix-vector GEMMs, and even Transformer attention blocks by analogous engine partitioning.
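The AVF-guided mapping can be sketched as a constrained greedy assignment (illustrative only; the AVF values, per-layer overhead costs, 0.5 TMR threshold, and greedy policy are assumptions, not FORTALESA's exact optimizer): protect the most vulnerable layers first until the overhead budget is exhausted.

```python
def assign_redundancy(layer_avf, tmr_cost, dmr_cost, budget):
    """layer_avf: {layer_name: AVF in [0, 1]}. Greedily grant TMR to the most
    vulnerable layers, fall back to DMR when the budget is tight, and leave
    the remainder unprotected."""
    plan, spent = {}, 0.0
    # Visit layers in descending vulnerability order.
    for layer, avf in sorted(layer_avf.items(), key=lambda kv: -kv[1]):
        if spent + tmr_cost <= budget and avf > 0.5:
            plan[layer], spent = "TMR", spent + tmr_cost
        elif spent + dmr_cost <= budget:
            plan[layer], spent = "DMR", spent + dmr_cost
        else:
            plan[layer] = "NONE"
    return plan
```

A Pareto-front search over (reliability, overhead) would replace the greedy loop in a fuller treatment, but the structure of the decision—spend protection budget where AVF is highest—is the same.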
5. Comparative Analysis: HyCA, FORTALESA, and Safe-NEureka
HyCA employs a DPPU to decouple redundancy from array topology, allowing full or partial recovery from both random and spatially clustered faults. It achieves ≳0.99 full-array reliability at a 3% per-PE error rate, whereas row/column/diagonal redundancy schemes degrade rapidly, especially under spatial clustering. Area overhead with a 32-unit DPPU is ∼6%, with a clear speedup over traditional row/column redundancy at moderate error rates. Runtime fault scanning is supported without interrupting the dataflow (Liu et al., 2021).
FORTALESA supports three run-time execution modes, four hardware configurations, and an analytical fault-propagation vulnerability assessment to direct mode selection. Compared to static full-TMR, it delivers a substantial speedup and a roughly 6× area-power reduction; versus prior selective-ECC approaches it saves resources while protecting both registers and MAC datapaths (Cherezova et al., 6 Mar 2025).
Safe-NEureka demonstrates a practical 4×4 (→ two 4×2 subarrays) HMR design for RISC-V clusters, integrating DMR, TMR, and ECC for comprehensive protection, supporting transparent rollback, and incurring ~15% area overhead and ~11% efficiency loss in performance mode (Tedeschi et al., 4 Feb 2026).
6. Extensibility, Optimizations, and Broader Impact
Recent HMR architectures suggest several optimization paths:
- Adaptive Mode Switching: Dynamically triggers DMR/TMR on detected fault-rate surges, using ECC logs or runtime status registers.
- Selective Redundancy: Applies redundancy only to the most critical layers or FSM controller segments.
- Algorithm-Based Fault Tolerance (ABFT): Complements hardware redundancy with matrix checksums for weights/data.
- Precision Scaling: Reduces data width (e.g., 4b, 2b) in high-throughput mode, maintaining TMR for critical phases only.
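Adaptive mode switching driven by ECC logs could take the following shape (a schematic policy sketch; the surge threshold, observation window, and cooldown are assumptions, not values from the papers): stay in performance mode until the memory-mapped error log shows a surge, then hold redundancy mode until the fault rate has been quiet long enough.

```python
class AdaptiveModeController:
    """Switches the engine to redundancy mode when the ECC error log shows a
    fault-rate surge, and back to performance mode after a quiet period."""
    def __init__(self, surge_threshold: int = 3, cooldown: int = 10):
        self.surge_threshold = surge_threshold  # errors/window that trigger DMR
        self.cooldown = cooldown                # quiet windows before reverting
        self.mode = "PERF"
        self.quiet = 0

    def step(self, ecc_errors_this_window: int) -> str:
        """Called once per observation window with the ECC error count."""
        if ecc_errors_this_window >= self.surge_threshold:
            self.mode, self.quiet = "DMR", 0
        elif self.mode == "DMR":
            self.quiet += 1
            if self.quiet >= self.cooldown:
                self.mode = "PERF"
        return self.mode
```

The hysteresis (enter DMR immediately, leave only after a cooldown) prevents mode thrashing when the fault rate hovers near the threshold.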
The impact of HMR DNN accelerators is most pronounced in LEO satellite, aerospace, and safety-critical terrestrial applications, as they provide both fault coverage and throughput flexibility not achievable with either static redundancy or pure software-based protection. The explicit isolation of redundancy-related overheads ensures that energy, silicon area, and execution latency remain optimized for the reliability and performance policy in force at any given point in the application (Tedeschi et al., 4 Feb 2026, Liu et al., 2021, Cherezova et al., 6 Mar 2025).
References:
- Safe-NEureka: Hybrid Modular Redundant DNN Accelerator for On-board Satellite AI Processing (Tedeschi et al., 4 Feb 2026)
- HyCA: A Hybrid Computing Architecture for Fault Tolerant Deep Learning (Liu et al., 2021)
- FORTALESA: Fault-Tolerant Reconfigurable Systolic Array for DNN Inference (Cherezova et al., 6 Mar 2025)