
GPUHammer: Rowhammer on GDDR6 GPUs

Updated 14 July 2025
  • GPUHammer is the first demonstration of Rowhammer attacks on discrete NVIDIA GPUs using GDDR6, exposing vulnerabilities in ML and high-performance workloads.
  • It employs precise timing analysis and multi-warp hammering kernels to reverse engineer memory row mappings and bypass proprietary refresh mitigations.
  • The attack induces bit-flips that can drastically degrade neural network accuracy, underscoring the urgent need for advanced DRAM protection mechanisms.

GPUHammer is the designation for the first practical demonstration of Rowhammer-style bit-flip attacks on discrete GPUs, specifically NVIDIA devices equipped with GDDR6 DRAM. The GPUHammer attack reveals that the Rowhammer vulnerability, previously a major concern for CPUs utilizing DDR/LPDDR memories, also affects GPU memory subsystems crucial to high-performance and ML workloads. This work is significant because it establishes both the feasibility of exploiting DRAM disturbance errors on modern GPU architectures and the potential for directly compromising the integrity and reliability of GPU-resident ML models (Lin et al., 10 Jul 2025).

1. Rowhammer Phenomenon and Its Significance on GPUs

The Rowhammer effect is a read-disturbance vulnerability whereby intensive, repeated accesses—"hammering"—to specific DRAM rows lead to charge leakage and eventual bit flips in neighboring rows. While the phenomenon is established in CPU-attached DDR/LPDDR chips, its practical realization on discrete GPUs has been stymied by architectural and implementation factors:

  • GPUs utilize GDDR6 (and sometimes HBM), which differ in timing properties compared to DDR modules.
  • GPU system software and firmware conceal physical-to-virtual address mappings and employ proprietary refresh and protection mechanisms.
  • GPUs serve as the backbone of ML inference, scientific computing, and cloud deployments, creating a risk vector whereby Rowhammer-induced faults could catastrophically impact model accuracy and system reliability.

GPUHammer demonstrates that by overcoming these hurdles, an attacker can induce reproducible and non-trivial bit-flips in GPU DRAM, with direct consequences for mission-critical ML models and applications.

2. Obstacles to Rowhammer Attacks in GPU Architectures

Compared to CPU-based platforms, implementing Rowhammer on GPUs presents several unique challenges:

  • Proprietary Memory Mapping: GPUs obscure the mapping between virtual (user-level) and physical DRAM addresses. This prevents the naïve selection of adjacent rows for hammering, a key component of successful attacks.
  • High Access Latency and Timing Constraints: GDDR6 exhibits access latencies on the order of 300 ns, much higher than typical CPU-attached DRAM. GDDR6 also refreshes more frequently than DDR (a ~32 ms refresh window, versus 64 ms for typical DDR), reducing the number of activations an attacker can issue within a single Rowhammer window.
  • In-DRAM and Controller-Level Mitigations: GDDR6 devices integrate proprietary variants of Target Row Refresh (TRR), which monitor and proactively refresh neighboring rows upon the detection of excessive activations, presenting a moving target for attack patterns.
  • Lack of Low-Level Documentation: The absence of open hardware documentation, combined with non-standardized access to performance counters and physical address information, necessitates costly reverse-engineering and profiling efforts.

These factors combine to make traditional CPU-style Rowhammer attacks infeasible out-of-the-box for current GPUs.

3. GPUHammer Methodology: Reverse Engineering and Hammering Strategies

GPUHammer introduces novel technical approaches to overcome the aforementioned barriers:

Row Mapping Reverse Engineering:

The attack begins by constructing a virtual-to-physical row mapping table via timing analysis. By iterating over a large GPU memory allocation—e.g., 47GB of a 48GB A6000 board—the system measures access latencies to different 256-byte regions. Timing anomalies reveal row-buffer conflicts, allowing inference of which virtual segments correspond to DRAM rows and banks.
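The timing-based classification can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a list of paired-access latency measurements has already been collected, and the threshold and sample values are invented for the example.

```python
# Minimal sketch: classify paired memory accesses as row-buffer hits vs.
# conflicts by thresholding measured latencies. Address pairs whose accesses
# show conflict-level latency are inferred to share a bank but map to
# different rows. Threshold and sample latencies are illustrative only.

def classify_pairs(measurements, conflict_threshold_ns):
    """measurements: list of ((addr_a, addr_b), latency_ns) for paired accesses."""
    same_bank_diff_row = []   # row-buffer conflict -> same bank, different rows
    fast = []                 # buffer hit or different bank -> low latency
    for pair, latency in measurements:
        if latency >= conflict_threshold_ns:
            same_bank_diff_row.append(pair)
        else:
            fast.append(pair)
    return same_bank_diff_row, fast

# Synthetic example: conflicts show elevated latency (illustrative numbers).
samples = [((0x0000, 0x10000), 480.0),   # slow -> same bank, different row
           ((0x0000, 0x00100), 310.0),   # fast -> same row (buffer hit)
           ((0x0000, 0x20000), 305.0)]   # fast -> different bank
conflicts, fast = classify_pairs(samples, conflict_threshold_ns=400.0)
print(conflicts)  # pairs inferred to alias to the same bank, different rows
```

Repeating this over every 256-byte chunk pair of interest yields the same-bank/same-row grouping the attack needs for aggressor and victim selection.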

GPU-Specific Hammering Kernels:

To maximize row activation rates under high latency, GPUHammer deploys two main hammering strategies:

  • Multi-thread hammering: Several threads within a warp issue concurrent accesses to aggressor rows, utilizing GPU concurrency.
  • Multi-warp hammering: Aggressor addresses are distributed across multiple warps, engaging independent schedulers to approach hardware activation limits.

GPU-specific PTX instructions, such as the discard command, enable direct bypassing of on-chip caches (L2/SMEM), ensuring activations reach physical DRAM. In addition, the kernel design meticulously synchronizes memory accesses with DRAM refresh intervals by introducing delay instructions ("bubbles"), thereby circumventing TRR-like mitigations.
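The need for concurrency follows from the timings already quoted: a single serialized thread is bound by the ~300 ns access latency, while the DRAM could sustain one activation per 45 ns cycle. A back-of-the-envelope sketch (illustrative arithmetic, not a measurement):

```python
# Why a single thread cannot saturate the activation budget, motivating
# multi-warp hammering. Figures follow the article's quoted values
# (~300 ns GDDR6 access latency, 32 ms refresh window, 45 ns tRC).

REFRESH_WINDOW_S = 32e-3      # tREFW: refresh window
ACCESS_LATENCY_S = 300e-9     # observed per-access latency for one thread
ROW_CYCLE_S = 45e-9           # tRC: minimum time between bank activations

# A single serialized thread is latency-bound:
single_thread_acts = int(REFRESH_WINDOW_S / ACCESS_LATENCY_S)

# The DRAM itself could sustain far more activations per window:
hw_limit_acts = int(REFRESH_WINDOW_S / ROW_CYCLE_S)

# Rough count of independent access streams (warps) needed to close the gap:
warps_needed = -(-hw_limit_acts // single_thread_acts)  # ceiling division

print(single_thread_acts, hw_limit_acts, warps_needed)
```

One latency-bound thread manages on the order of 100K activations per window, so roughly seven independent streams are needed to approach the ~700K hardware limit, which is what distributing aggressors across warps provides.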

Many-sided Synchronized Patterns:

The attack employs multi-sided (multi-aggressor) patterns—hammering 17-24 rows adjacent to each target victim—to overwhelm the TRR protection. This is calibrated empirically: patterns with at least 17 aggressors consistently succeeded in flipping bits, suggesting an internal TRR sampler size of 16 rows per bank.
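A many-sided pattern of this kind can be sketched as follows. The helper below is hypothetical (the paper does not publish its pattern generator); it simply places n aggressor rows at a fixed stride within one bank so that potential victim rows sit between consecutive aggressors, which is the general shape of a many-sided TRR-evasion pattern.

```python
def many_sided_pattern(base_row, n_aggressors, stride=2):
    """Build an illustrative n-sided aggressor pattern (hypothetical helper).

    Aggressor rows are placed every `stride` rows (default: every other row),
    so potential victim rows sit sandwiched between consecutive aggressors.
    The article reports that patterns with >= 17 aggressors reliably
    overwhelmed a TRR sampler tracking ~16 rows per bank.
    """
    aggressors = [base_row + i * stride for i in range(n_aggressors)]
    victims = [r + 1 for r in aggressors[:-1]]  # rows between aggressor pairs
    return aggressors, victims

aggs, victims = many_sided_pattern(base_row=1000, n_aggressors=17)
print(len(aggs), len(victims))  # 17 aggressor rows, 16 sandwiched victim rows
```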

4. Experimental Results and Impact on Machine Learning Model Integrity

The GPUHammer attack achieved practical fault injection on an NVIDIA A6000 (GDDR6, 48GB):

  • Recorded eight unique bit-flips across four different DRAM banks, with each bank exhibiting at least one flip.
  • Flipped bits could be deterministically induced using many-sided, delay-synchronized hammering kernels.

The paper highlighted pronounced security and reliability consequences for ML workloads:

  • A single bit-flip in the most significant exponent bit of a FP16 neural network weight can reduce model top-1 accuracy from ~80% to <1%—a Relative Accuracy Drop (RAD) approaching 0.99 as measured by

\text{RAD} = \frac{\text{Acc}_{\text{pristine}} - \text{Acc}_{\text{corrupted}}}{\text{Acc}_{\text{pristine}}}

  • This demonstrates the viability of integrity attacks, including tampering, stealthy data corruptions, and potential denial-of-service for GPU-based ML inference pipelines.
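The effect of such a flip can be reproduced numerically. The sketch below is an illustration, not the paper's experiment: it flips the most significant exponent bit of an FP16 value via its raw bit pattern, then evaluates the RAD formula for the reported accuracy drop.

```python
import struct

def flip_fp16_exponent_msb(x):
    """Flip bit 14 (the MSB of the 5-bit FP16 exponent) in x's bit pattern."""
    (bits,) = struct.unpack('<H', struct.pack('<e', x))      # fp16 -> uint16
    (flipped,) = struct.unpack('<e', struct.pack('<H', bits ^ (1 << 14)))
    return flipped

# A modest weight becomes enormous once its exponent MSB flips:
w = 0.5
print(w, "->", flip_fp16_exponent_msb(w))  # 0.5 -> 32768.0

def relative_accuracy_drop(acc_pristine, acc_corrupted):
    return (acc_pristine - acc_corrupted) / acc_pristine

# The article's reported drop: top-1 accuracy from ~80% to <1%.
print(round(relative_accuracy_drop(0.80, 0.01), 4))  # RAD approaching 0.99
```

A single flipped exponent bit thus scales a weight by orders of magnitude, which is why one well-placed bit-flip suffices to destroy model accuracy.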

5. Technical Execution Details

Key technical details established by GPUHammer include:

  • Activation Rate Maximization: With a GDDR6 refresh window ($t_\text{REFW}$) of 32 ms and an activation cycle ($t_\text{RC}$) of 45 ns, the theoretical maximum is approximately

\frac{32 \times 10^{-3}\ \text{s}}{45 \times 10^{-9}\ \text{s}} \approx 700{,}000\ \text{activations}

The multi-warp GPUHammer kernel achieved ~620K activations within this window.
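These figures can be checked directly (a simple arithmetic sketch using the article's quoted timings):

```python
# Theoretical activation budget per refresh window, from the quoted timings:
# tREFW = 32 ms (refresh window), tRC = 45 ns (activation cycle).
T_REFW_NS = 32_000_000   # 32 ms in ns
T_RC_NS = 45             # 45 ns

max_activations = T_REFW_NS // T_RC_NS
achieved = 620_000       # reported multi-warp kernel throughput

print(max_activations)                       # ~700K, as stated
print(round(achieved / max_activations, 2))  # fraction of the theoretical budget
```

The reported ~620K activations thus reach roughly 87% of the theoretical per-window budget.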

  • DRAM Address Selection: The large memory allocation is partitioned into 256-byte chunks; timing analysis reveals which chunks alias to identical banks/rows, guiding aggressor/victim selection.
  • Kernel Synchronization: Delay-inserted arithmetic maintains proper alignment with internal DRAM refresh triggers, neutralizing certain in-DRAM mitigation schemes.
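The refresh-synchronization idea can be illustrated with a toy schedule. Both the helper and the refresh-timing values below are assumptions for illustration (the article does not publish GDDR6's internal refresh interval or refresh-cycle time):

```python
# Toy model of refresh-synchronized hammering: between consecutive internal
# refresh commands (spaced tREFI apart), only a limited burst of activations
# fits; the leftover time is filled with a deliberate delay "bubble" so the
# access pattern stays phase-aligned with the refresh schedule.
# tREFI and tRFC here are assumed illustrative values, not article figures.

T_REFI_NS = 1900   # assumed interval between internal refresh commands
T_RFC_NS = 350     # assumed time the bank is busy performing a refresh
T_RC_NS = 45       # activation cycle (from the article)

acts_per_interval = (T_REFI_NS - T_RFC_NS) // T_RC_NS
bubble_ns = (T_REFI_NS - T_RFC_NS) - acts_per_interval * T_RC_NS

print(acts_per_interval, bubble_ns)  # activations per interval, slack to pad
```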

6. Defenses, Limitations, and Future Directions

Defenses:

  • Error-Correcting Code (ECC): Enabling ECC on GPU memory can substantially mitigate Rowhammer attacks, but with a 6.5% usable memory capacity reduction and an ML inference performance overhead of 3–10%.
  • Address Space Randomization: Dynamically randomizing virtual-to-physical memory mappings on the GPU would force adversaries to redo mapping profiling per allocation, raising the attack cost.
  • Allocator Strategies: Enhancing the randomness or introducing quarantine in memory allocators (e.g., as in RAPIDS Memory Manager) makes precise mapping difficult.
  • Advanced DRAM-level Protections: Adoption of refined Refresh Management (RFM) or on-die ECC could asynchronously or proactively mitigate even targeted attacks.

Limitations and Open Questions:

  • The documented attack currently applies to NVIDIA GDDR6 platforms. Variants on HBM or other GDDR architectures may exhibit different Rowhammer thresholds or mitigation strengths.
  • The flipping rate and attack scalability depend on the persistence and specific aggressor patterns—a consequence of proprietary controller logic.

Future Research:

  • Extending row mapping and hammering techniques to additional GPU architectures (including HBM generations) and exploring more generalized or adaptive hammering patterns.
  • Investigating online, automated profiling to enable or block Rowhammer-style mapping in dynamic, multi-tenant cloud environments.
  • Deriving formal models of DRAM controller behavior under synchronization stress to characterize emergent vulnerabilities.

7. Summary Table: GPUHammer Attack Attributes

| Feature | Description | Observed result |
| --- | --- | --- |
| Row mapping method | Timing-based, 256-byte chunk granularity | Reconstructed mapping lookup table |
| Maximum activation rate | Multi-warp, synchronized accesses with PTX cache bypass | ~620K per 32 ms window (theoretical: ~700K) |
| Required aggressors per pattern | Empirical; inferred TRR sampler size of 16 rows per bank | 17–24-sided patterns effective |
| Bit-flip outcome | Many-sided, delay-synchronized hammering on A6000 | 8 unique flips across 4 DRAM banks, ≥1 per bank |
| Impact on FP16 ML model | Single bit-flip in weight exponent MSB | Top-1 accuracy from ~80% to <1% (RAD ≈ 0.99) |
| Key mitigations | ECC (6.5% memory loss, 3–10% slowdown), mapping randomization | Substantially raised attack difficulty |

8. Concluding Perspective

GPUHammer establishes that Rowhammer vulnerabilities now extend to modern GPU DRAM, including those underpinning cloud ML inference and high-performance computing. By bridging the technical gap through reverse engineering, cache bypass, and synchronized multi-warp hammering, GPUHammer not only delivers practical bit-flip attacks but also accentuates the urgent need for better DRAM protections and memory management in GPU-enabled platforms. These findings define a new security and reliability agenda for the broader GPU computing ecosystem (Lin et al., 10 Jul 2025).
