StickySampling: Streaming Frequency Estimation

Updated 16 November 2025
  • StickySampling is a streaming algorithm that approximates item frequency counts in high-speed data streams with provable one-sided additive error guarantees.
  • It employs a decreasing Bernoulli sampling probability combined with periodic counter decay to achieve logarithmic space complexity relative to the failure probability.
  • The algorithm is applied in security-critical contexts such as DRAM RowHammer mitigation, ensuring efficient detection of hammer rows without false positives.

StickySampling is a streaming algorithm designed to maintain approximate frequency counts for items in a high-speed data stream, providing one-sided additive error guarantees using space that is logarithmic in the failure probability. Originally formulated for data streams by Manku and Motwani, StickySampling achieves strong probabilistic security and performance trade-offs, making it particularly suitable for security-critical systems such as DRAM RowHammer mitigation, where it enables the detection of “hammer” rows with provable guarantees.

1. Problem Statement and Formal Definition

The StickySampling algorithm addresses the problem of tracking item (e.g., memory row) frequencies over a potentially unbounded data stream $S$ of length $N$. For each unique element $a$, let $\operatorname{real}(a)$ denote the true frequency and $\operatorname{est}(a)$ the estimate reported by the algorithm. The algorithm maintains a data structure $C$ with the following property:

$$\operatorname{real}(a) - \epsilon N \leq \operatorname{est}(a) \leq \operatorname{real}(a),\quad \text{with probability at least } 1-\delta$$

where $\epsilon \in (0,1)$ is the permissible additive error fraction and $\delta \in (0,1)$ is the failure probability on the lower bound (the upper bound $\operatorname{est}(a) \leq \operatorname{real}(a)$ always holds). In practice, parameters are tuned so that $\epsilon N$ is a small fraction of a relevant system threshold (such as the RowHammer threshold, RH), and $\delta$ is the tolerable false-negative rate.

Key parameters:

  • $\epsilon$: additive error fraction (e.g., set so $\epsilon N \approx \mathrm{RH}/4$ in DRAM applications)
  • $\delta$: failure probability
  • $t=\left\lceil \frac{1}{\epsilon}\ln \frac{1}{2\epsilon\delta} \right\rceil$: support-width constant, controlling counter compression frequency
  • $\text{window\_width}=2t$: number of updates before the first compression and halving of the sampling probability; the window is doubled after each compression

StickySampling combines a geometrically decreasing Bernoulli sampling probability with periodic counter decay (“Compress”) to bound the number of stored counters.
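As a rough illustration of how the two derived parameters follow from $\epsilon$ and $\delta$, here is a minimal Python sketch. The function name `sticky_params` is hypothetical and not taken from any library; it only evaluates the formulas listed above.

```python
import math

def sticky_params(eps: float, delta: float) -> tuple[int, int]:
    """Return (t, window_width) from the formulas in the parameter list.

    Hypothetical helper for illustration only.
    """
    if not (0.0 < eps < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("eps and delta must lie in (0, 1)")
    # t = ceil((1/eps) * ln(1 / (2 * eps * delta)))
    t = math.ceil((1.0 / eps) * math.log(1.0 / (2.0 * eps * delta)))
    # The first window spans 2t updates; later windows double in length.
    return t, 2 * t

t, window_width = sticky_params(eps=0.01, delta=0.01)
print(t, window_width)
```

With $\epsilon = \delta = 0.01$ this evaluates to $t = 852$ and an initial window of 1,704 updates.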

2. Algorithm: Annotated Pseudocode

A hardware-friendly version of the StickySampling algorithm maintains accuracy and memory efficiency:

procedure StickySampling(ε, δ)
  processed ← 0
  t ← ceil((1/ε) * ln(1/(2εδ)))
  window_width ← 2 * t
  P_sample ← 1.0      // initial sampling probability
  C ← empty map<row_address → count>

  for each activation S[i] do
    processed ← processed + 1

    // UPDATE: Frequency Count Maintenance
    if S[i] in C then
      C[S[i]] ← C[S[i]] + 1          // tracked rows are always counted
    else
      r ← uniform_random(0,1)
      if r ≤ P_sample then            // admit a new row with probability P_sample
        C[S[i]] ← 1
      end if
    end if

    // COMPRESS: Counter Decay and Window Update
    if processed = window_width then
      for each x in C do
        tails ← 0
        while coin_flip() = TAILS do  // fair coin: count tails before the first heads
          tails ← tails + 1
        end while
        C[x] ← C[x] - tails
        if C[x] ≤ 0 then
          remove x from C
        end if
      end for

      window_width ← 2 * window_width
      P_sample ← P_sample / 2
      processed ← 0
    end if
  end for

  return C // at any time, C[x] is est(x)
end procedure

This structure admits new items with a decreasing probability, so rare items are unlikely to enter the table and, if admitted, are eventually removed. The Compress step subtracts a geometrically distributed amount from each counter, which caps the table state and avoids growth linear in the stream length.
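The pseudocode can be mirrored fairly directly in software. The following is a minimal Python sketch, not a reference implementation: the class name `StickySampler`, its method names, and the use of Python's `random.Random` as the randomness source are all assumptions made for illustration, and a hardware design would use a different source of random bits.

```python
import math
import random

class StickySampler:
    """Illustrative software model of the pseudocode above (names are hypothetical)."""

    def __init__(self, eps, delta, rng=None):
        self.t = math.ceil((1.0 / eps) * math.log(1.0 / (2.0 * eps * delta)))
        self.window_width = 2 * self.t
        self.p_sample = 1.0          # initial sampling probability
        self.processed = 0           # updates seen in the current window
        self.counts = {}             # row address -> partial count
        self.rng = rng or random.Random()

    def update(self, item):
        self.processed += 1
        if item in self.counts:
            self.counts[item] += 1   # tracked items are always counted
        elif self.rng.random() < self.p_sample:
            self.counts[item] = 1    # admit a new item with probability p_sample
        if self.processed == self.window_width:
            self._compress()

    def _compress(self):
        for item in list(self.counts):
            tails = 0
            while self.rng.random() < 0.5:   # fair coin: tails before first heads
                tails += 1
            if self.counts[item] - tails <= 0:
                del self.counts[item]
            else:
                self.counts[item] -= tails
        self.window_width *= 2
        self.p_sample /= 2.0
        self.processed = 0

    def estimate(self, item):
        return self.counts.get(item, 0)

if __name__ == "__main__":
    sampler = StickySampler(eps=0.01, delta=0.01, rng=random.Random(1))
    for i in range(20_000):
        # Synthetic stream: one hot row interleaved with rows seen only once.
        sampler.update("hot" if i % 2 == 0 else f"cold-{i}")
    print(sampler.estimate("hot"), len(sampler.counts))
```

Because a counter is only ever incremented on a real occurrence and only ever decremented by Compress, `estimate` can never exceed the true count in this model, which matches the one-sided error described in the text.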

3. Accuracy and Space Complexity Guarantees

Let $N$ be the total number of processed items. Under the specified parameters:

  • The counter table maintains at most $M = \left\lceil \frac{1}{\epsilon} \ln \frac{1}{2\epsilon\delta} \right\rceil$ entries.
  • For any row address $a$:

    • Deterministic upper bound: $\operatorname{est}(a) \leq \operatorname{real}(a)$, since a counter is incremented only on an actual occurrence of its row.
    • Probabilistic lower bound: $\operatorname{est}(a) \geq \operatorname{real}(a) - \epsilon N$ with probability at least $1-\delta$, so that

    $$\Pr[\operatorname{real}(a) - \epsilon N \leq \operatorname{est}(a) \leq \operatorname{real}(a)] \geq 1 - \delta.$$

Space usage is thus

$$|C| \leq O\left(\frac{1}{\epsilon}\ln\frac{1}{\epsilon\delta}\right)$$

with each entry recording a row address and its partial count.
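To make the logarithmic dependence on $\delta$ concrete, consider tightening the failure probability by a factor of 1000. Using only the expression for $M$ stated above, and ignoring the ceiling, the table grows by a fixed additive amount:

$$
M(\delta/1000) - M(\delta)
= \frac{1}{\epsilon}\ln\frac{1000}{2\epsilon\delta} - \frac{1}{\epsilon}\ln\frac{1}{2\epsilon\delta}
= \frac{\ln 1000}{\epsilon}
\approx \frac{6.91}{\epsilon}
$$

For example, with $\epsilon = 1.5 \times 10^{-3}$ this is roughly 4,600 additional entries, regardless of the starting value of $\delta$.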

4. Security Guarantees for RowHammer Mitigation

For DRAM RowHammer detection, “critical” rows (potential aggressors) are defined as those with $\operatorname{real}(a) > \mathrm{RH}$ within a refresh window. To guarantee detection,

  • Set $\epsilon$ so that $\epsilon N < \mathrm{RH}$.
  • Any row with $\operatorname{real}(a) \geq \mathrm{RH} + \epsilon N$ will have $\operatorname{est}(a) \geq \mathrm{RH}$ with probability at least $1-\delta$, triggering mitigation.
  • No row with $\operatorname{real}(a) < \mathrm{RH}$ is falsely reported: because $\operatorname{est}(a) \leq \operatorname{real}(a)$ always holds, a report $\operatorname{est}(a) \geq \mathrm{RH}$ implies $\operatorname{real}(a) \geq \mathrm{RH}$, so there are no false positives.

With these settings, rows that exceed the hammer threshold by at least the $\epsilon N$ margin are detected and mitigated with probability at least $1-\delta$ before causing victim bit flips.
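The detection margins can be spelled out with a small sketch. The helper names below (`detection_margins`, `worst_case_estimate`, `is_flagged`) are hypothetical, and the flagging rule `est >= RH` is taken from the bullets above; `slack` stands for the error budget $\epsilon N$.

```python
def detection_margins(rh: int, slack: int) -> dict:
    """Summarise the guarantees for threshold rh and error budget slack = eps * N."""
    if not 0 <= slack < rh:
        raise ValueError("need 0 <= eps*N < RH")
    return {
        # Rows at or above this true count are flagged with probability >= 1 - delta.
        "guaranteed_detect_from": rh + slack,
        # Rows below this true count are never flagged, since est <= real.
        "never_flagged_below": rh,
    }

def worst_case_estimate(real: int, slack: int) -> int:
    """Smallest estimate consistent with real - slack <= est <= real."""
    return max(0, real - slack)

def is_flagged(est: int, rh: int) -> bool:
    return est >= rh

# Example with RH = 4000 and eps*N = RH/4 = 1000.
margins = detection_margins(rh=4000, slack=1000)
print(margins)
print(is_flagged(worst_case_estimate(5000, 1000), 4000))  # row at RH + eps*N
print(is_flagged(worst_case_estimate(4500, 1000), 4000))  # row inside the margin
```

The last line illustrates the band between $\mathrm{RH}$ and $\mathrm{RH} + \epsilon N$: a row there may or may not be flagged, which is why $\epsilon N$ is chosen as a small fraction of $\mathrm{RH}$.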

5. Comparison with Reservoir Sampling and Lossy Counting

| Algorithm | Space Complexity | Error Profile |
|---|---|---|
| Reservoir Sampling | $O(k)$ ($k \sim N/\mathrm{RH}$) | Probabilistic (detection by sampling) |
| Lossy Counting | $O\left(\frac{1}{\epsilon}\log(\epsilon N)\right)$ | One-sided, deterministic lower bound |
| StickySampling | $O\left(\frac{1}{\epsilon}\log\frac{1}{\epsilon\delta}\right)$ | One-sided additive $\epsilon N$ error with failure probability $\leq \delta$ |

Reservoir Sampling provides uniform sampling but does not yield frequency counts, and is relatively inefficient for high security as $k$ scales steeply. Lossy Counting provides one-sided error but its counter state grows with $\log(\epsilon N)$. StickySampling achieves a similar one-sided error guarantee to Lossy Counting (probabilistic rather than deterministic) with better asymptotic scaling: its state does not depend on the total stream length $N$, only on $\epsilon$ and $\delta$.
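The difference in scaling can be seen by evaluating the two asymptotic expressions from the table. This is only a sketch: constant factors hidden by the $O(\cdot)$ notation are dropped, so the absolute numbers are not meaningful, only the trend in $N$. The function names are hypothetical.

```python
import math

def lossy_counting_bound(eps: float, n: float) -> float:
    """(1/eps) * ln(eps * N), constants dropped; grows with stream length N."""
    return (1.0 / eps) * math.log(eps * n)

def sticky_sampling_bound(eps: float, delta: float) -> float:
    """(1/eps) * ln(1 / (eps * delta)), constants dropped; independent of N."""
    return (1.0 / eps) * math.log(1.0 / (eps * delta))

def crossover_length(eps: float, delta: float) -> float:
    """Stream length at which the two expressions coincide: eps*N = 1/(eps*delta)."""
    return 1.0 / (eps * eps * delta)

eps, delta = 1.5e-3, 1e-3
for n in (1e6, 1e8, 1e10):
    print(n, round(lossy_counting_bound(eps, n)), round(sticky_sampling_bound(eps, delta)))
print(crossover_length(eps, delta))
```

Under these simplified expressions the StickySampling term stays fixed while the Lossy Counting term keeps growing, and the two meet at $N = 1/(\epsilon^2 \delta)$. For shorter streams the Lossy Counting expression is the smaller of the two, so the advantage described above is an asymptotic one.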

6. Parameter Selection and Practical Deployment in DRAM Controllers

In practical DRAM systems with $\text{tREFW} = 32$ ms, $\text{tRC} = 48$ ns, and worst-case $N_{\max} \sim 666{,}000$ activations per window, set $\mathrm{RH} = 4{,}000$ (RowHammer threshold). To ensure $\epsilon N_{\max} = \mathrm{RH}/4$, select $\epsilon = 1.5 \times 10^{-3}$. With $\delta = 10^{-3}$, this yields:

  • $t = \left\lceil (1/\epsilon) \ln(1/(2\epsilon\delta)) \right\rceil \approx 8{,}466$
  • $\text{window\_width} = 2t \approx 17{,}000$ activations

The resulting counter table holds $\leq 8{,}466$ entries. After the first roughly 17,000 activations, the Compress step is triggered, halving $P_\text{sample}$ and doubling the next window. Such resource demands are moderate relative to DRAM controller capabilities and allow designer-controlled trade-offs by tuning $\epsilon$ and $\delta$: lowering $\epsilon$ increases tracking fidelity but raises memory usage, while decreasing $\delta$ reduces false negatives at only logarithmic space cost.
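The parameter derivation above can be reproduced with a few lines of arithmetic. The sketch below starts from the timing values given in the text; the variable names are illustrative, and the activation count is taken as the integer number of tRC intervals that fit in one refresh window, which is an assumption about how $N_{\max}$ was obtained.

```python
import math

T_REFW_NS = 32_000_000   # refresh window tREFW: 32 ms, expressed in nanoseconds
T_RC_NS = 48             # row cycle time tRC: 48 ns
RH = 4_000               # RowHammer threshold
DELTA = 1e-3             # tolerated failure probability

# Worst case: one activation every tRC for the whole refresh window.
n_max = T_REFW_NS // T_RC_NS

# Choose eps so that eps * n_max equals RH / 4.
eps = (RH / 4) / n_max

t = math.ceil((1.0 / eps) * math.log(1.0 / (2.0 * eps * DELTA)))
window_width = 2 * t

print(f"N_max = {n_max}, eps = {eps:.3e}, t = {t}, window_width = {window_width}")
```

Evaluated this way, $N_{\max}$ comes out at 666,666 and $t$ at 8,478, close to the approximate figure of 8,466 quoted above; the small gap presumably reflects rounding of $\epsilon$ or $N_{\max}$ in the original calculation. The initial window is then 16,956 activations, consistent with the rounded value of 17,000.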

7. Significance and Applicability

StickySampling combines a provable one-sided additive error ($\epsilon N$) with space logarithmic in $1/\delta$, enabled by the geometric decay and window-doubling mechanism. It is the first streaming method to provide these guarantees within the domain of architectural RowHammer defenses, detecting rows that surpass the hammer threshold with tunable confidence while avoiding false positives. The algorithm’s balanced security-performance trade-off surpasses both pure sampling and deterministic bucket-based schemes for this class of memory security problems. Practitioners should select $\epsilon = (\mathrm{RH}/4)/N_{\max}$ and choose $\delta$ to match system-level false-negative requirements, thereby right-sizing the counter table, update window, and sampling probability for resilient mitigation against aggressive RowHammer attacks.
