MOAT Architecture: Multi-Domain Designs
- MOAT Architecture is a multi-domain concept integrating efficient vision models, secure DRAM, safe BPF isolation, astrophysical moat flows, and advanced NoC topologies.
- In vision systems, MOAT blocks merge mobile convolution with self-attention, achieving high accuracy and efficiency across classification and segmentation tasks.
- Across hardware, systems, and astrophysics, MOAT designs variously enforce security (mitigating risks like Rowhammer), optimize resource use, or describe physical flow structures, while retaining empirical performance benefits.
MOAT architecture refers to several unrelated but influential technical designs across distinct research domains. The term appears in advanced neural network architectures, secure hardware memory subsystems, in-kernel program isolation, computational astrophysics, and network-on-chip interconnects. This article surveys the MOAT architecture in each domain, summarizing core principles, mathematical structure, and empirical results, citing the primary literature for each instance.
1. MOAT in Vision Models: Mobile Convolution and Attention
The MOAT architecture in vision models refers to a block that merges Mobile Convolution (MBConv, typically inverted residual structures from MobileNetV2) and Transformer Self-Attention to yield parameter- and compute-efficient models with state-of-the-art accuracy on classification, detection, and segmentation tasks (Yang et al., 2022).
The core block has the following structure:
- BatchNorm → MBConv (no SE) → Self-Attention → Residual Add.
- The MBConv block: $1\times1$ conv (expansion), BN/GeLU, $3\times3$ depthwise conv (stride $1$ or $2$), BN/GeLU, $1\times1$ conv (projection). No Squeeze-and-Excitation.
- The MBConv output (post-BN/GeLU) is fed to multi-head self-attention; each sub-block is residual-connected to its input.
The core block computes $y = x + \mathrm{MBConv}(x)$ followed by $z = y + \mathrm{MHSA}(y)$, where $x$ is the block input, $\mathrm{MBConv}$ and $\mathrm{MHSA}$ are the mobile-convolution and multi-head self-attention operators described above, and $z$ is the block output.
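The residual MBConv-then-attention dataflow can be sketched in NumPy. The toy implementation below uses random placeholder weights, a single attention head with no learned projections, and omits BatchNorm, so it illustrates only the block structure, not the trained model.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mbconv(x, expand=4):
    """Toy MBConv: 1x1 expand -> 3x3 depthwise -> 1x1 project (no SE).
    x: (H, W, C). Weights are random placeholders, stride fixed at 1."""
    H, W, C = x.shape
    rng = np.random.default_rng(0)
    w_exp = rng.standard_normal((C, C * expand)) / np.sqrt(C)
    w_dw = rng.standard_normal((3, 3, C * expand)) / 3.0
    w_proj = rng.standard_normal((C * expand, C)) / np.sqrt(C * expand)
    h = gelu(x @ w_exp)                       # 1x1 expansion conv
    pad = np.pad(h, ((1, 1), (1, 1), (0, 0)))  # zero-pad for 3x3 depthwise
    out = np.zeros_like(h)
    for i in range(3):
        for j in range(3):
            out += pad[i:i + H, j:j + W] * w_dw[i, j]
    return gelu(out) @ w_proj                 # 1x1 projection conv

def attention(x):
    """Toy single-head global self-attention over flattened tokens
    (Q = K = V = tokens, no learned projections)."""
    H, W, C = x.shape
    t = x.reshape(H * W, C)
    scores = t @ t.T / np.sqrt(C)
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)
    return (a @ t).reshape(H, W, C)

def moat_block(x):
    # MBConv first, then self-attention, each with a residual add
    y = x + mbconv(x)
    return y + attention(y)

x = np.random.default_rng(1).standard_normal((8, 8, 16))
z = moat_block(x)
print(z.shape)  # (8, 8, 16)
```

With stride 1, both sub-blocks preserve the feature-map shape, which is what makes the two residual adds well-defined.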
Design Variants and Results
MOAT is instantiated as a sequence of five hierarchical stages, using stacked MBConv and MOAT blocks. Model sizes are scaled by the channel widths and depth of each stage—with "tiny-MOAT" scaling down to 3M parameters.
| Variant | Params | ImageNet-1K Top-1 | COCO AP | ADE20k mIoU |
|---|---|---|---|---|
| MOAT-1 | 41.6M | 84.2% | - | - |
| MOAT-3 | 190M | 85.3% | 59.2 | 57.2 |
| tiny-MOAT | 3–24M | 83.3% (MOAT-0) | 55.2 (3) | 47.5 (3) |
For dense tasks (COCO/ADE20k), global attention is replaced with non-overlapping windowed self-attention within stages 2–4, and no window-shifting is required: cross-window interactions are handled by the depthwise convolution. This eliminates the computational cost of global self-attention for large input resolutions.
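The windowing step above amounts to a reversible reshape of the feature map; a minimal NumPy sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def window_partition(x, ws):
    """Split a (H, W, C) feature map into non-overlapping (ws, ws) windows.
    Returns (num_windows, ws*ws, C); attention then runs per window, so
    cost scales with H*W*ws^2 instead of (H*W)^2 for global attention."""
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    x = x.transpose(0, 2, 1, 3, 4)            # group window rows/cols
    return x.reshape(-1, ws * ws, C)

def window_reverse(windows, ws, H, W):
    """Inverse of window_partition."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

x = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
w = window_partition(x, 2)
print(w.shape)   # (4, 4, 2): four windows of 2*2 tokens each
```

Because the depthwise convolution in the preceding MBConv already mixes information across window borders, no shifted-window scheme is needed between attention layers.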
Ablations demonstrate that placing MBConv before Attn and inside the residual gives up to 0.4% higher accuracy compared to stacking Attn then MBConv or using simple MLPs. Downsampling is most accuracy/parameter-efficient when performed within the MBConv block rather than with separable pooling or convolution layers (Yang et al., 2022).
2. MOAT for Rowhammer Mitigation: Secure DRAM Architecture
MOAT ("Securely Mitigating Rowhammer with Per-Row Activation Counters") is a provably secure in-DRAM mitigation design for Rowhammer attacks, leveraging per-row activation counters (PRAC) and the DDR5 ALERT-Back-Off (ABO) protocol (Qureshi et al., 2024). It extends the JEDEC DDR5 PRAC+ABO framework with dual internal thresholds at minimal SRAM cost (7 bytes per bank).
Hardware Design
- Per-Row Activation Counters (PRAC): Every DRAM row includes a counter, incremented on every ACT→PRE, stored physically with the data bits. tPRE is lengthened to accommodate inlined counter updates, while tRC is preserved.
- ALERT-Back-Off (ABO): DRAM asserts ALERT to the controller, which in response can finish in-flight commands (180ns), then must pause commands during an RFM (350ns). JEDEC Mitigation Level determines the number of RFMs and forced minimum inter-ALERT activations.
| Component | Description |
|---|---|
| CTA (3 bytes) | Tracks row_addr+counter of current at-risk row |
| CMA (2 bytes) | Current row being proactively mitigated in next refresh window |
| SRAM overhead | 7 bytes/bank (CTA, CMA, refresh-replica counters) |
Thresholds and Operation
- Eligibility Threshold (ETH): rows whose counter is below ETH are never scheduled for proactive victim refresh, reducing unnecessary mitigation overhead.
- ALERT Threshold (ATH): when a row’s counter reaches ATH, an ALERT is triggered, forcing the controller to run RFM and stop subsequent activations to that row.
The default ETH and ATH settings are specified in (Qureshi et al., 2024).
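The dual-threshold mechanism can be modeled with a short simulation. The sketch below tracks per-row counters and a single CTA entry per bank; the ETH/ATH values are illustrative placeholders, not the paper's defaults, and the CTA update rule is a simplification of the actual design.

```python
# Toy model of MOAT's dual-threshold tracking: per-row activation
# counters (PRAC), an Eligibility Threshold (ETH) gating proactive
# mitigation, and an ALERT Threshold (ATH) forcing an RFM.
ETH, ATH = 16, 64   # illustrative values only

class Bank:
    def __init__(self):
        self.counters = {}   # per-row activation counters
        self.cta = None      # row_addr of current at-risk row (CTA)
        self.alerts = 0

    def activate(self, row):
        c = self.counters.get(row, 0) + 1
        self.counters[row] = c
        # track the hottest row whose counter has crossed ETH
        if c >= ETH and (self.cta is None or
                         c > self.counters.get(self.cta, 0)):
            self.cta = row
        if c >= ATH:
            self.alert()

    def alert(self):
        # ALERT -> RFM: refresh the tracked row's victims, reset counter
        self.alerts += 1
        if self.cta is not None:
            self.counters[self.cta] = 0
            self.cta = None

bank = Bank()
for _ in range(200):
    bank.activate(0x2A)      # hammer a single aggressor row
print(bank.alerts)           # ALERT fires each time the counter hits ATH -> 3
```

No row's counter ever exceeds ATH between mitigations, which is the property the security bound below builds on.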
Security Bound
- With no delayed ALERT, a row can receive at most ATH activations before mitigation, so the tolerated Rowhammer threshold is bounded by ATH.
- Accounting for ABO delay, the safe Rowhammer threshold additionally depends on the JEDEC Mitigation Level (number of RFMs per ALERT) and on the pool of attackable rows within the window (see (Qureshi et al., 2024), Eq. A.4).
Performance
- Average slowdown at the default ATH is small, and it becomes negligible at higher thresholds.
- ALERTs occur in only a small fraction of refresh intervals at the default ATH.
- DRAM activation-energy and total DRAM energy overheads are modest.
- Lower ATH settings result in prohibitive ALERT frequency and slowdown.
MOAT’s design prevents attack patterns that exploit consecutive ALERT allowances by tracking only one row per bank and backing up counters across refresh-resets, preventing “straddling” attacks permitted by prior work (Panopticon).
3. MOAT for Safe BPF Kernel Extension
MOAT in the context of the Linux kernel is a hardware-enforced in-kernel isolation architecture for untrusted BPF programs (Lu et al., 2023). It utilizes Intel’s Memory Protection Keys for Supervisor (PKS) and process-context ID (PCID) for isolation at two layers:
- Layer I: Memory domains for kernel, BPF program, and shared objects via PKS keys.
- Layer II: Each BPF program receives a unique address space (CR3), mapped with a distinct PCID to achieve TLB isolation, with no-flush context switches.
Critical objects (e.g., map-ops function pointers) are protected via Critical Object Protection (COP), and helper calls are verified at runtime via Dynamic Parameter Auditing (DPA), where argument ranges checked statically by the verifier are enforced by the JIT compiler at each helper invocation.
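The DPA idea, verifier-derived argument ranges re-checked at every helper invocation, can be sketched schematically. All names, helpers, and ranges below are hypothetical illustrations, not the actual MOAT/Linux interfaces.

```python
# Schematic of Dynamic Parameter Auditing (DPA): ranges the static
# verifier derives for a helper's arguments are re-checked at runtime
# on every helper invocation. Names and ranges are hypothetical.

# Per-(program, helper) bounds recorded by the verifier:
# arg index -> (lo, hi) inclusive range for the runtime value.
VERIFIER_RANGES = {
    ("prog_a", "map_lookup"): {0: (0x1000, 0x1FFF)},  # key ptr in map area
    ("prog_a", "probe_read"): {1: (1, 256)},          # size in [1, 256]
}

def audited_call(prog, helper, args, impl):
    """Check each audited argument against its verifier range, then call."""
    for idx, (lo, hi) in VERIFIER_RANGES.get((prog, helper), {}).items():
        if not (lo <= args[idx] <= hi):
            raise PermissionError(
                f"{prog}/{helper}: arg{idx}={args[idx]:#x} "
                f"outside [{lo:#x}, {hi:#x}]")
    return impl(*args)

# Stand-in helper; a real BPF helper would touch kernel state.
def probe_read(dst, size):
    return size

print(audited_call("prog_a", "probe_read", (0, 64), probe_read))    # 64
try:
    audited_call("prog_a", "probe_read", (0, 4096), probe_read)
except PermissionError:
    print("blocked")  # out-of-range size rejected at runtime
```

In MOAT the equivalent checks are emitted by the JIT compiler, so the audit runs in native code on each helper call rather than in an interpreter.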
The prototype implementation adds 3 KLOC to Linux 6.1.38, with low overheads for socket filtering and XDP and modest average overhead for system tracing (UnixBench). Compared to SandBPF, MOAT achieves comparable or better performance, re-enabling safe, unprivileged BPF extension without requiring a perfect static verifier.
4. Moat Flow Architecture in Sunspot Physics
The moat flow architecture refers to the large-scale overturning convective cell that forms around sunspots, identified in high-resolution MHD simulations (Rempel, 2015). The photospheric moat flow is the radial outflow surrounding the spot, extending well beyond the penumbral edge.
Key features:
- Strong suppression of the downflow filling factor under the penumbra, setting up a net upflow that converts to horizontal photospheric outflow (the moat).
- Moat cell architecture: upflow under the penumbra, horizontal moat outflow, and return downflow near the outer edge of the moat cell.
- Moat outflow is more extended and robust around spots with a penumbra.
- Magnetically, inhibited downflows prevent submergence of horizontal field components near the spot, stabilizing sunspot flux and inhibiting decay.
- Naked spots (without penumbra) exhibit a weaker moat, faster flux decay, and less suppression of downflows.
The moat is an MHD consequence of spot-induced suppression of convective downflows; the architecture controls both the stability of the spot and the pattern of near-surface, observable flows.
5. Diametrical Mesh-of-Tree (D2D-MoT) Network-on-Chip
In network-on-chip (NoC) research, the MOAT architecture refers to the Mesh-of-Tree (MoT) and its Diametrical 2D Mesh-of-Tree (D2D-MoT) extension (Ghosal et al., 2012). D2D-MoT combines MoT's low-degree/locality with added diametrical links to minimize network diameter and maximize bisection bandwidth.
| Topology | Diameter | Bisection Width | Node Degree |
|---|---|---|---|
| 2D Mesh | $2(\sqrt{N}-1)$ | $\sqrt{N}$ | $3, 4, 5$ (corner, boundary, interior) |
| MoT | | | (leaf, stem, root) |
| D2D-MoT | | | $5$ (leaf), $3$ (stem/root) |
Routing proceeds deterministically up the tree, across a diametrical channel (if src/dst are in different row/col), then down the tree, guaranteeing shortest path and deadlock freedom. Empirical evaluation shows that D2D-MoT delivers latency reductions of roughly $20\%$ or more over MoT and $45\%$ or more over a standard mesh for random traffic, and its wire-count overhead approaches $0$ as network size increases.
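The up/across/down routing rule can be illustrated with a toy hop-count model. It assumes binary row/column trees over an $n \times n$ leaf grid and a single root-to-root diametrical hop; the actual D2D-MoT link placement in Ghosal et al. differs, so this only conveys the routing pattern.

```python
import math

def d2d_mot_route(src, dst, n):
    """Toy hop count for the deterministic up/across/down route,
    assuming binary trees over an n x n leaf grid (n a power of two)
    and one diametrical hop between tree roots. Illustrative only;
    not the exact D2D-MoT link placement.
    src, dst: (row, col) leaf coordinates."""
    depth = int(math.log2(n))        # levels in each binary tree
    sr, sc = src
    dr, dc = dst
    if (sr, sc) == (dr, dc):
        return 0
    hops = depth                     # climb source tree to its root
    if sr != dr and sc != dc:
        hops += 1                    # diametrical root-to-root channel
    return hops + depth              # descend destination tree

print(d2d_mot_route((0, 0), (3, 3), 4))  # up 2, across 1, down 2 -> 5
print(d2d_mot_route((0, 0), (0, 3), 4))  # same row: no diametrical hop -> 4
```

Because the up and down segments are fixed by the tree structure and the crossing uses at most one diametrical channel, the route is unique per src/dst pair, which is what yields deadlock freedom.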
6. Synthesis and Domain-Specific Significance
Across architectures, MOAT signifies "border," "isolation barrier," or "pathway structuring"—in computer vision, as a block enabling efficient local–global mixing; in DRAM, as a protocol to confine Rowhammer risk; in kernel memory, as logic/hardware separation; and in computational astrophysics, as the physical flow that confines and stabilizes localized magnetic flux.
The MOAT block in vision models and NoC architectures can be interpreted as topological or functional modularity imbuing systems with resource efficiency and safety. The DRAM and BPF MOAT architectures encode policy via hardware-enforced barriers managed at fine temporal and spatial resolutions. In all instances, empirical evaluations or proofs accompany formal design, with trade-offs explicit between security/robustness, performance, and implementation cost.
7. References
- "MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models" (Yang et al., 2022)
- "MOAT: Securely Mitigating Rowhammer with Per-Row Activation Counters" (Qureshi et al., 2024)
- "MOAT: Towards Safe BPF Kernel Extension" (Lu et al., 2023)
- "Numerical simulations of sunspot decay: On the penumbra -- Evershed flow -- moat flow connection" (Rempel, 2015)
- "Diametrical Mesh Of Tree (D2D-MoT) Architecture: A Novel Routing Solution For NoC" (Ghosal et al., 2012)