MOAT Architecture: Multi-Domain Designs
- MOAT Architecture is a multi-domain concept integrating efficient vision models, secure DRAM, safe BPF isolation, astrophysical moat flows, and advanced NoC topologies.
- In vision systems, MOAT blocks merge mobile convolution with self-attention, achieving high accuracy and efficiency across classification and segmentation tasks.
- Across hardware, systems, and astrophysics, MOAT designs variously enforce security (mitigating risks like Rowhammer), optimize resource use, or describe physical flow structures, while retaining empirical performance benefits.
MOAT architecture refers to several unrelated but influential technical designs across distinct research domains. The term appears in advanced neural network architectures, secure hardware memory subsystems, in-kernel program isolation, computational astrophysics, and network-on-chip interconnects. This article surveys the MOAT architecture in each domain, summarizing core principles, mathematical structure, and empirical results, citing the primary literature for each instance.
1. MOAT in Vision Models: Mobile Convolution and Attention
The MOAT architecture in vision models refers to a block that merges Mobile Convolution (MBConv, typically inverted residual structures from MobileNetV2) and Transformer Self-Attention to yield parameter- and compute-efficient models with state-of-the-art accuracy on classification, detection, and segmentation tasks (Yang et al., 2022).
The core block has the following structure:
- BatchNorm → MBConv (no SE) → Self-Attention → Residual Add.
- The MBConv block: $1\times1$ conv (expansion), BN/GeLU, $3\times3$ depthwise conv (stride $1$ or $2$), BN/GeLU, $1\times1$ conv (projection). No Squeeze-and-Excitation.
- The MBConv output (post-BN/GeLU) is fed to multi-head self-attention; each sub-block is residual-connected to its input.
The core block computes $y = x + \mathrm{MBConv}(x)$ followed by $z = y + \mathrm{MHSA}(y)$, where $x$ is the block input, $\mathrm{MBConv}$ and $\mathrm{MHSA}$ are the mobile-convolution and multi-head self-attention operators described above, and $z$ is the block output.
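The residual MBConv-then-attention dataflow can be sketched in NumPy. The toy implementation below uses random placeholder weights, a single attention head with no learned projections, and omits BatchNorm, so it illustrates only the block structure, not the trained model.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mbconv(x, expand=4):
    """Toy MBConv: 1x1 expand -> 3x3 depthwise -> 1x1 project (no SE).
    x: (H, W, C). Weights are random placeholders, stride fixed at 1."""
    H, W, C = x.shape
    rng = np.random.default_rng(0)
    w_exp = rng.standard_normal((C, C * expand)) / np.sqrt(C)
    w_dw = rng.standard_normal((3, 3, C * expand)) / 3.0
    w_proj = rng.standard_normal((C * expand, C)) / np.sqrt(C * expand)
    h = gelu(x @ w_exp)                       # 1x1 expansion conv
    pad = np.pad(h, ((1, 1), (1, 1), (0, 0)))  # zero-pad for 3x3 depthwise
    out = np.zeros_like(h)
    for i in range(3):
        for j in range(3):
            out += pad[i:i + H, j:j + W] * w_dw[i, j]
    return gelu(out) @ w_proj                 # 1x1 projection conv

def attention(x):
    """Toy single-head global self-attention over flattened tokens
    (Q = K = V = tokens, no learned projections)."""
    H, W, C = x.shape
    t = x.reshape(H * W, C)
    scores = t @ t.T / np.sqrt(C)
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)
    return (a @ t).reshape(H, W, C)

def moat_block(x):
    # MBConv first, then self-attention, each with a residual add
    y = x + mbconv(x)
    return y + attention(y)

x = np.random.default_rng(1).standard_normal((8, 8, 16))
z = moat_block(x)
print(z.shape)  # (8, 8, 16)
```

With stride 1, both sub-blocks preserve the feature-map shape, which is what makes the two residual adds well-defined.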
Design Variants and Results
MOAT is instantiated as a sequence of five hierarchical stages, using stacked MBConv and MOAT blocks. Model sizes are scaled by the channel widths and depth of each stage—with "tiny-MOAT" scaling down to 3M parameters.
| Variant | Params | ImageNet-1K Top-1 | COCO AP | ADE20k mIoU |
|---|---|---|---|---|
| MOAT-1 | 41.6M | 84.2% | - | - |
| MOAT-3 | 190M | 85.3% | 59.2 | 57.2 |
| tiny-MOAT | 3–24M | 83.3% (MOAT-0) | 55.2 (3) | 47.5 (3) |
For dense tasks (COCO/ADE20k), global attention is replaced with non-overlapping windowed self-attention within stages 2–4, and no window-shifting is required: cross-window interactions are handled by the depthwise convolution. This eliminates the computational cost of global self-attention for large input resolutions.
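The windowing step above amounts to a reversible reshape of the feature map; a minimal NumPy sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def window_partition(x, ws):
    """Split a (H, W, C) feature map into non-overlapping (ws, ws) windows.
    Returns (num_windows, ws*ws, C); attention then runs per window, so
    cost scales with H*W*ws^2 instead of (H*W)^2 for global attention."""
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    x = x.transpose(0, 2, 1, 3, 4)            # group window rows/cols
    return x.reshape(-1, ws * ws, C)

def window_reverse(windows, ws, H, W):
    """Inverse of window_partition."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

x = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
w = window_partition(x, 2)
print(w.shape)   # (4, 4, 2): four windows of 2*2 tokens each
```

Because the depthwise convolution in the preceding MBConv already mixes information across window borders, no shifted-window scheme is needed between attention layers.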
Ablations demonstrate that placing MBConv before Attn and inside the residual gives up to 0.4% higher accuracy compared to stacking Attn then MBConv or using simple MLPs. Downsampling is most accuracy/parameter-efficient when performed within the MBConv block rather than with separable pooling or convolution layers (Yang et al., 2022).
2. MOAT for Rowhammer Mitigation: Secure DRAM Architecture
MOAT ("Securely Mitigating Rowhammer with Per-Row Activation Counters") is a provably secure in-DRAM mitigation design for Rowhammer attacks, leveraging per-row activation counters (PRAC) and the DDR5 ALERT-Back-Off (ABO) protocol (Qureshi et al., 2024). It extends the JEDEC DDR5 PRAC+ABO framework with dual internal thresholds at minimal SRAM cost (7 bytes per bank).
Hardware Design
- Per-Row Activation Counters (PRAC): Every DRAM row includes a counter, incremented on every ACT→PRE, stored physically with the data bits. tPRE is lengthened to accommodate inlined counter updates, while tRC is preserved.
- ALERT-Back-Off (ABO): DRAM asserts ALERT to the controller, which in response can finish in-flight commands (180ns), then must pause commands during an RFM (350ns). JEDEC Mitigation Level determines the number of RFMs and forced minimum inter-ALERT activations.
| Component | Description |
|---|---|
| CTA (3 bytes) | Tracks row_addr+counter of current at-risk row |
| CMA (2 bytes) | Current row being proactively mitigated in next refresh window |
| SRAM overhead | 7 bytes/bank (CTA, CMA, refresh-replica counters) |
Thresholds and Operation
- Eligibility Threshold (ETH): rows whose counter is below ETH are never scheduled for proactive victim refresh, reducing unnecessary mitigation overhead.
- ALERT Threshold (ATH): when a row’s counter reaches ATH, an ALERT is triggered, forcing the controller to run RFM and stop subsequent activations to that row.
The default ETH and ATH settings are specified in (Qureshi et al., 2024).
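The dual-threshold mechanism can be modeled with a short simulation. The sketch below tracks per-row counters and a single CTA entry per bank; the ETH/ATH values are illustrative placeholders, not the paper's defaults, and the CTA update rule is a simplification of the actual design.

```python
# Toy model of MOAT's dual-threshold tracking: per-row activation
# counters (PRAC), an Eligibility Threshold (ETH) gating proactive
# mitigation, and an ALERT Threshold (ATH) forcing an RFM.
ETH, ATH = 16, 64   # illustrative values only

class Bank:
    def __init__(self):
        self.counters = {}   # per-row activation counters
        self.cta = None      # row_addr of current at-risk row (CTA)
        self.alerts = 0

    def activate(self, row):
        c = self.counters.get(row, 0) + 1
        self.counters[row] = c
        # track the hottest row whose counter has crossed ETH
        if c >= ETH and (self.cta is None or
                         c > self.counters.get(self.cta, 0)):
            self.cta = row
        if c >= ATH:
            self.alert()

    def alert(self):
        # ALERT -> RFM: refresh the tracked row's victims, reset counter
        self.alerts += 1
        if self.cta is not None:
            self.counters[self.cta] = 0
            self.cta = None

bank = Bank()
for _ in range(200):
    bank.activate(0x2A)      # hammer a single aggressor row
print(bank.alerts)           # ALERT fires each time the counter hits ATH -> 3
```

No row's counter ever exceeds ATH between mitigations, which is the property the security bound below builds on.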
Security Bound
- With no delayed ALERT, a row can receive at most ATH activations before mitigation, so the tolerated Rowhammer threshold is bounded by ATH.
- Accounting for ABO delay, the safe Rowhammer threshold additionally depends on the JEDEC Mitigation Level (number of RFMs per ALERT) and on the pool of attackable rows within the window (see (Qureshi et al., 2024), Eq. A.4).
Performance
- Average slowdown at the default ATH is small, and it becomes negligible at higher thresholds.
- ALERTs occur in only a small fraction of refresh intervals at the default ATH.
- DRAM activation-energy and total DRAM energy overheads are modest.
- Lower ATH settings result in prohibitive ALERT frequency and slowdown.
MOAT’s design prevents attack patterns that exploit consecutive ALERT allowances by tracking only one row per bank and backing up counters across refresh-resets, preventing “straddling” attacks permitted by prior work (Panopticon).
3. MOAT for Safe BPF Kernel Extension
MOAT in the context of the Linux kernel is a hardware-enforced in-kernel isolation architecture for untrusted BPF programs (Lu et al., 2023). It utilizes Intel’s Memory Protection Keys for Supervisor (PKS) and process-context ID (PCID) for isolation at two layers:
- Layer I: Memory domains for kernel, BPF program, and shared objects via PKS keys.
- Layer II: Each BPF program receives a unique address space (CR3), mapped with a distinct PCID to achieve TLB isolation, with no-flush context switches.
Critical objects (e.g., map-ops function pointers) are protected via Critical Object Protection (COP), and helper calls are verified at runtime via Dynamic Parameter Auditing (DPA), where argument ranges checked statically by the verifier are enforced by the JIT compiler at each helper invocation.
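The DPA idea, verifier-derived argument ranges re-checked at every helper invocation, can be sketched schematically. All names, helpers, and ranges below are hypothetical illustrations, not the actual MOAT/Linux interfaces.

```python
# Schematic of Dynamic Parameter Auditing (DPA): ranges the static
# verifier derives for a helper's arguments are re-checked at runtime
# on every helper invocation. Names and ranges are hypothetical.

# Per-(program, helper) bounds recorded by the verifier:
# arg index -> (lo, hi) inclusive range for the runtime value.
VERIFIER_RANGES = {
    ("prog_a", "map_lookup"): {0: (0x1000, 0x1FFF)},  # key ptr in map area
    ("prog_a", "probe_read"): {1: (1, 256)},          # size in [1, 256]
}

def audited_call(prog, helper, args, impl):
    """Check each audited argument against its verifier range, then call."""
    for idx, (lo, hi) in VERIFIER_RANGES.get((prog, helper), {}).items():
        if not (lo <= args[idx] <= hi):
            raise PermissionError(
                f"{prog}/{helper}: arg{idx}={args[idx]:#x} "
                f"outside [{lo:#x}, {hi:#x}]")
    return impl(*args)

# Stand-in helper; a real BPF helper would touch kernel state.
def probe_read(dst, size):
    return size

print(audited_call("prog_a", "probe_read", (0, 64), probe_read))    # 64
try:
    audited_call("prog_a", "probe_read", (0, 4096), probe_read)
except PermissionError:
    print("blocked")  # out-of-range size rejected at runtime
```

In MOAT the equivalent checks are emitted by the JIT compiler, so the audit runs in native code on each helper call rather than in an interpreter.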
The prototype implementation adds 3 KLOC to Linux 6.1.38, with low overheads for socket filtering and XDP and modest average overhead for system tracing (UnixBench). Compared to SandBPF, MOAT achieves comparable or better performance, re-enabling safe, unprivileged BPF extension without requiring a perfect static verifier.
4. Moat Flow Architecture in Sunspot Physics
The moat flow architecture refers to the large-scale overturning convective cell that forms around sunspots, identified in high-resolution MHD simulations (Rempel, 2015). The photospheric moat flow is the radial outflow surrounding the spot, extending well beyond the penumbral edge.
Key features:
- Strong suppression of the downflow filling factor under the penumbra, setting up a net upflow that converts to horizontal photospheric outflow (the moat).
- Moat cell architecture: upflow under the penumbra, horizontal moat outflow, and return downflow near the outer edge of the moat cell.
- Moat outflow is more extended and robust around spots with a penumbra.
- Magnetically, inhibited downflows prevent submergence of horizontal field components near the spot, stabilizing sunspot flux and inhibiting decay.
- Naked spots (without penumbra) exhibit a weaker moat, faster flux decay, and less suppression of downflows.
The moat is an MHD consequence of spot-induced suppression of convective downflows; the architecture controls both the stability of the spot and the pattern of near-surface, observable flows.
5. Diametrical Mesh-of-Tree (D2D-MoT) Network-on-Chip
In network-on-chip (NoC) research, the MOAT architecture refers to the Mesh-of-Tree (MoT) and its Diametrical 2D Mesh-of-Tree (D2D-MoT) extension (Ghosal et al., 2012). D2D-MoT combines MoT's low-degree/locality with added diametrical links to minimize network diameter and maximize bisection bandwidth.
| Topology | Diameter | Bisection Width | Node Degree |
|---|---|---|---|
| 2D Mesh | $2(\sqrt{N}-1)$ | $\sqrt{N}$ | $3, 4, 5$ (corner, boundary, interior) |
| MoT | | | (leaf, stem, root) |
| D2D-MoT | | | $5$ (leaf), $3$ (stem/root) |
Routing proceeds deterministically up the tree, across a diametrical channel (if src/dst are in different row/col), then down the tree, guaranteeing shortest path and deadlock freedom. Empirical evaluation shows that D2D-MoT delivers latency reductions of roughly $20\%$ or more over MoT and $45\%$ or more over a standard mesh for random traffic, and its wire-count overhead approaches $0$ as network size increases.
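The up/across/down routing rule can be illustrated with a toy hop-count model. It assumes binary row/column trees over an $n \times n$ leaf grid and a single root-to-root diametrical hop; the actual D2D-MoT link placement in Ghosal et al. differs, so this only conveys the routing pattern.

```python
import math

def d2d_mot_route(src, dst, n):
    """Toy hop count for the deterministic up/across/down route,
    assuming binary trees over an n x n leaf grid (n a power of two)
    and one diametrical hop between tree roots. Illustrative only;
    not the exact D2D-MoT link placement.
    src, dst: (row, col) leaf coordinates."""
    depth = int(math.log2(n))        # levels in each binary tree
    sr, sc = src
    dr, dc = dst
    if (sr, sc) == (dr, dc):
        return 0
    hops = depth                     # climb source tree to its root
    if sr != dr and sc != dc:
        hops += 1                    # diametrical root-to-root channel
    return hops + depth              # descend destination tree

print(d2d_mot_route((0, 0), (3, 3), 4))  # up 2, across 1, down 2 -> 5
print(d2d_mot_route((0, 0), (0, 3), 4))  # same row: no diametrical hop -> 4
```

Because the up and down segments are fixed by the tree structure and the crossing uses at most one diametrical channel, the route is unique per src/dst pair, which is what yields deadlock freedom.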
6. Synthesis and Domain-Specific Significance
Across architectures, MOAT signifies "border," "isolation barrier," or "pathway structuring"—in computer vision, as a block enabling efficient local–global mixing; in DRAM, as a protocol to confine Rowhammer risk; in kernel memory, as logic/hardware separation; and in computational astrophysics, as the physical flow that confines and stabilizes localized magnetic flux.
The MOAT block in vision models and NoC architectures can be interpreted as topological or functional modularity imbuing systems with resource efficiency and safety. The DRAM and BPF MOAT architectures encode policy via hardware-enforced barriers managed at fine temporal and spatial resolutions. In all instances, empirical evaluations or proofs accompany formal design, with trade-offs explicit between security/robustness, performance, and implementation cost.
7. References
- "MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models" (Yang et al., 2022)
- "MOAT: Securely Mitigating Rowhammer with Per-Row Activation Counters" (Qureshi et al., 2024)
- "MOAT: Towards Safe BPF Kernel Extension" (Lu et al., 2023)
- "Numerical simulations of sunspot decay: On the penumbra -- Evershed flow -- moat flow connection" (Rempel, 2015)
- "Diametrical Mesh Of Tree (D2D-MoT) Architecture: A Novel Routing Solution For NoC" (Ghosal et al., 2012)