
EMOMA: Early Rejection via Lookup Table

Updated 2 January 2026
  • Early Rejection via Lookup Table is a technique that employs an on-chip counting block Bloom filter to rapidly eliminate non-existent keys before costly off-chip accesses.
  • It integrates a two-choice cuckoo hash table with a small on-chip stash, ensuring each lookup performs at most one off-chip access for high throughput.
  • The design maintains its invariant by relocating keys that would otherwise become Bloom filter false positives, achieving high load efficiency (≈95%) and predictable performance even when the table is densely filled.

Early rejection via lookup table refers to a mechanism that enables rapid elimination of non-existent keys in a lookup structure before off-chip memory accesses are performed, thus improving both the performance and efficiency of large-scale key-value storage systems. The EMOMA (Exact Match in One Memory Access) data structure operationalizes this principle using an on-chip counting block Bloom filter (CBBF) as a pre-filter, coupled with a two-choice cuckoo hash table in off-chip memory. This architecture ensures every lookup for a given key is resolved with at most one off-chip memory access—frequently rejecting absent keys without incurring such an access at all—by using the lookup table to disambiguate candidate memory locations and to guide early termination of unsuccessful searches (Pontarelli et al., 2017).

1. System Architecture and Lookup Path

EMOMA’s design segregates responsibilities between its on-chip and off-chip components:

  • On-chip elements: These include the counting block Bloom filter (CBBF) and a small “stash,” a content-addressable memory (CAM) for temporarily holding elements during insertions or overflow situations.
  • Off-chip elements: The primary storage is a two-choice cuckoo hash table $T$ of $M$ buckets, each containing $b$ slots, accessed via two hash functions $h_1$ and $h_2$.

The CBBF is structured as $M$ blocks of $k$ bits each, with each block corresponding to a bucket in the cuckoo table and indexed by $h_1(x)$. The stash, of bounded size $s$, ensures progress in insertions even amidst rare displacement failures. CBBF counters to support deletions or updates may be kept off-chip but do not participate in the fast-path lookup.

2. Data Organization and On-Chip CBBF

In EMOMA, each key-value pair $(x, v_x)$ is mapped to two potential buckets in $T$: $h_1(x)$ and $h_2(x)$. The CBBF is arranged with one block per $h_1$ bucket, within which $k$ positions are determined by bit-selection hash functions $g_1(x), \dots, g_k(x)$.
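As a rough illustration of the block-addressed layout described above, the following Python sketch models the CBBF as one array of counters per bucket. The class name `CountingBlockBloom`, the BLAKE2-based stand-ins for $h_1$ and $g_1, \dots, g_k$, and the separate `block_bits` parameter are assumptions made for this sketch; they are not details taken from Pontarelli et al. (2017).

```python
import hashlib


def _hash(key: str, salt: str, modulus: int) -> int:
    """Hypothetical salted hash, standing in for h_1 and g_1..g_k."""
    digest = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % modulus


class CountingBlockBloom:
    """One block of counters per bucket; a position counts as set if its counter > 0."""

    def __init__(self, num_blocks: int, block_bits: int, k: int) -> None:
        self.num_blocks = num_blocks
        self.block_bits = block_bits  # assumed block width, chosen by the caller
        self.k = k
        self.counters = [[0] * block_bits for _ in range(num_blocks)]

    def positions(self, key: str):
        """Return (block index from h_1, the k positions from g_1..g_k)."""
        block = _hash(key, "h1", self.num_blocks)
        bits = [_hash(key, f"g{i}", self.block_bits) for i in range(1, self.k + 1)]
        return block, bits

    def add(self, key: str) -> None:
        block, bits = self.positions(key)
        for pos in bits:
            self.counters[block][pos] += 1

    def remove(self, key: str) -> None:
        """Undo a previous add(); the caller must only remove keys it added."""
        block, bits = self.positions(key)
        for pos in bits:
            self.counters[block][pos] -= 1

    def test(self, key: str) -> bool:
        """True if all k positions in the key's block are set."""
        block, bits = self.positions(key)
        return all(self.counters[block][pos] > 0 for pos in bits)
```

Keeping counters rather than single bits is what allows `remove` to work; in hardware, only the derived bit vector would need to sit on the fast path.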

Organization Table

Component | Location | Function
Cuckoo table $T$ | Off-chip | Main key-value storage
CBBF | On-chip | Directs lookups, early rejection
Stash $S$ | On-chip | Assists insertions/failures

All present keys are stored in exactly one slot in $T$; the CBBF is configured and maintained during insertions so that it directs retrieval with a single decisive bit-vector answer (Pontarelli et al., 2017).

3. Lookup Procedure and Early Rejection Mechanism

The lookup starts by checking the stash for the queried key. If not present, the CBBF is queried using $h_1(x)$ to retrieve a block and test whether all $k$ corresponding bits are set. The outcome determines which bucket to probe in $T$:

  • CBBF negative: The key cannot be in $T[h_2(x)]$; $T[h_1(x)]$ is checked.
  • CBBF positive: The key may reside in $T[h_2(x)]$ (including possible false positives).

Only the indicated bucket is read off-chip. If the key is found there, its value is returned; otherwise, a miss is reported without probing the other bucket. Because the CBBF is maintained so that its answer is always decisive, no lookup needs a second off-chip access. For a non-existent key, the alternative bucket is thus rejected entirely on-chip, which is the “early rejection” step, and a stash hit requires no off-chip read at all (Pontarelli et al., 2017).
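The lookup path can be sketched in a few lines of Python. This is a minimal model under stated assumptions: the function `lookup`, the toy hash functions, and the set-based stand-in for the CBBF test are all hypothetical, and the second return value simply counts how many buckets were read "off-chip".

```python
from typing import Callable, Dict, List, Optional, Tuple

Bucket = Dict[str, int]


def lookup(
    key: str,
    stash: Dict[str, int],
    cbbf_positive: Callable[[str], bool],
    table: List[Bucket],
    h1: Callable[[str], int],
    h2: Callable[[str], int],
) -> Tuple[Optional[int], int]:
    """Return (value or None, number of off-chip bucket reads)."""
    if key in stash:                      # on-chip stash hit, no off-chip read
        return stash[key], 0
    # The CBBF answer selects exactly one candidate bucket.
    index = h2(key) if cbbf_positive(key) else h1(key)
    bucket = table[index]                 # the single off-chip read
    return bucket.get(key), 1


# Toy setup (assumed for illustration only).
NUM_BUCKETS = 4


def h1(key: str) -> int:
    return sum(key.encode()) % NUM_BUCKETS


def h2(key: str) -> int:
    return (sum(key.encode()) // NUM_BUCKETS) % NUM_BUCKETS


table: List[Bucket] = [dict() for _ in range(NUM_BUCKETS)]
table[h1("alpha")]["alpha"] = 1           # stored in its first-choice bucket
table[h2("beta")]["beta"] = 2             # stored in its second-choice bucket
stash = {"gamma": 3}
second_choice = {"beta"}                  # stand-in for an exact CBBF


def cbbf_positive(key: str) -> bool:
    return key in second_choice


for name in ("alpha", "beta", "gamma", "delta"):
    print(name, lookup(name, stash, cbbf_positive, table, h1, h2))
```

In this model the absent key `delta` costs one bucket read and the stashed key `gamma` costs none, which matches the access pattern described in the text.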

4. Insertion Procedure and Invariant Maintenance

The insertion strategy in EMOMA is designed to uphold the invariant:

“Any key that would false-positive in the CBBF if placed in $h_1$ is forced into $h_2$.”

The steps are as follows:

  • The new element is placed in the stash.
  • The CBBF is checked at $h_1(x)$. If positive, $(x, v)$ is inserted into $T[h_2(x)]$ and the appropriate CBBF bits are set; if negative and no false positives would be created for existing $T[h_1(x)]$ entries, it is inserted into $T[h_1(x)]$; otherwise, it is placed in $T[h_2(x)]$.
  • If the target bucket is full, an eviction and re-insertion chain is initiated, using the stash as intermediate storage, until all keys are placed or the stash overflows.

By always moving CBBF-conflicting keys to $h_2$, EMOMA ensures the one-bit bucket decision remains correct for all lookups (Pontarelli et al., 2017).
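The invariant can be made concrete with a simplified Python sketch. It is not the procedure from Pontarelli et al. (2017): the class `TinyEmoma`, its parameters, and the hash helper are hypothetical, the cuckoo eviction chain is omitted (a key whose second-choice bucket is full goes straight to an unbounded stash), and deletions are not modeled. What it does show is the relocation step: whenever setting CBBF bits turns a first-choice key of the same block into a false positive, that key is moved to its $h_2$ bucket.

```python
import hashlib


def _hash(key: str, salt: str, modulus: int) -> int:
    """Hypothetical salted hash used for h_1, h_2 and the bit selectors."""
    digest = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % modulus


class TinyEmoma:
    def __init__(self, num_buckets=64, slots=4, block_bits=64, k=3):
        self.m, self.slots, self.block_bits, self.k = num_buckets, slots, block_bits, k
        self.table = [dict() for _ in range(num_buckets)]   # "off-chip" buckets
        self.first = [set() for _ in range(num_buckets)]    # keys stored via h_1, per block
        self.cbbf = [[0] * block_bits for _ in range(num_buckets)]
        self.stash = {}                                      # unbounded in this sketch

    def h1(self, key): return _hash(key, "h1", self.m)
    def h2(self, key): return _hash(key, "h2", self.m)

    def _bits(self, key):
        return [_hash(key, f"g{i}", self.block_bits) for i in range(self.k)]

    def positive(self, key):
        block = self.cbbf[self.h1(key)]
        return all(block[p] > 0 for p in self._bits(key))

    def _place_second(self, key, value):
        """Store key in T[h_2] and set its CBBF bits, or stash it if the bucket is full."""
        bucket = self.table[self.h2(key)]
        if len(bucket) >= self.slots:
            self.stash[key] = value       # no eviction chain here (simplification)
            return
        bucket[key] = value
        block = self.cbbf[self.h1(key)]
        for p in self._bits(key):
            block[p] += 1

    def insert(self, key, value):
        i = self.h1(key)
        if not self.positive(key) and len(self.table[i]) < self.slots:
            self.table[i][key] = value    # safe: a lookup will be sent to T[h_1]
            self.first[i].add(key)
            return
        self._place_second(key, value)
        # Newly set bits may make first-choice keys of block i false positives;
        # move each of them to its second-choice bucket until none remain.
        while True:
            victim = next((y for y in self.first[i] if self.positive(y)), None)
            if victim is None:
                break
            self.first[i].discard(victim)
            self._place_second(victim, self.table[i].pop(victim))

    def lookup(self, key):
        if key in self.stash:
            return self.stash[key]
        index = self.h2(key) if self.positive(key) else self.h1(key)
        return self.table[index].get(key)


store = TinyEmoma()
for n in range(150):
    store.insert(f"key{n}", n)
print(store.lookup("key42"), store.lookup("missing"), len(store.stash))
```

After any sequence of inserts, every key held in a first-choice bucket tests negative and every key held in a second-choice bucket tests positive, so a single bucket read always suffices in this model.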

5. Probabilistic Analysis and CBBF Parameters

The false positive rate $p_{\rm fp}$ of the CBBF follows a standard Bloom filter bound: $p_{\rm fp} \approx \left(1 - e^{-kn/M}\right)^k$, where $n$ is the number of entries, $M$ the number of buckets, and $k$ the number of hash functions. With typical parameter choices ($n/M \leq 0.95$ and $k \approx 3$ or $4$), $p_{\rm fp}$ remains a few percent. Crucially, EMOMA's invariant ensures any false positive in the CBBF does not result in a misdirected lookup but rather at most a single off-chip access with a correct miss result. If the CBBF is negative and the key was not inserted, early rejection occurs entirely on-chip (Pontarelli et al., 2017).

6. Comparative Evaluation with Alternative Schemes

EMOMA’s effectiveness is benchmarked against classical cuckoo hashing and hybrid approaches (cuckoo plus Bloom filter):

  • Off-chip accesses: EMOMA guarantees exactly one for every lookup (hit or miss); standard cuckoo can require up to two, averaging $1 + p_{\rm miss}(h_1) \approx 1.1$ at high load; traditional Bloom filter hybrids average $1 + p_{\rm fp} \approx 1.05$, with a worst case of two.
  • Throughput: Given single-access per query, EMOMA enables pipelined designs—e.g., 50 ns DRAM latency supports ≈20 M lookups/s per port without stalls.
  • Hardware resource usage: The CBBF requires ≈4 bits per entry with simple shift/XOR logic, compared to >30 bits per entry and greater complexity for counting Bloom filter designs in pre-filtered Cuckoo table architectures.
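The throughput figure in the list above follows from simple arithmetic if one assumes that a port completes one off-chip access every 50 ns and that a lookup costs the average number of accesses quoted for each scheme. The helper `lookups_per_second` below is a hypothetical back-of-the-envelope model, not a measurement.

```python
def lookups_per_second(latency_ns: float, accesses_per_lookup: float = 1.0) -> float:
    """Lookups per second for one port that completes one access every latency_ns."""
    return 1e9 / (latency_ns * accesses_per_lookup)


# Average off-chip accesses per lookup as quoted in the comparison above.
schemes = [("EMOMA", 1.0), ("Bloom filter hybrid", 1.05), ("standard cuckoo", 1.1)]
for name, accesses in schemes:
    rate = lookups_per_second(50.0, accesses)
    print(f"{name}: {rate / 1e6:.1f} M lookups/s")
```

With one access per lookup and 50 ns per access, the model reproduces the ≈20 M lookups/s figure; the other two schemes come out lower in proportion to their average access counts.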

Empirical studies with millions of entries show EMOMA achieves ≈95% load, average 1.0 off-chip accesses per lookup, stash sizes <20, and maintains its efficiency across high-load scenarios (Pontarelli et al., 2017).

7. Insertion Costs and Limitations

Insertion in EMOMA completes in an expected constant number of relocations for load factors up to ≈0.95, slightly below the ≈0.97 threshold usually quoted for classical cuckoo hashing. At 95% load, average insertion chains involve ≈30–50 relocations, and excess keys temporarily overflow to the small stash.

The need to “lock” certain keys into $h_2$ based on CBBF-induced conflicts incurs some complexity in insertion logic, but this does not impact the single-access guarantee or the early rejection property of the lookup path (Pontarelli et al., 2017).

A plausible implication is that for applications with extremely high insertion-to-lookup ratios, EMOMA’s more involved insertion procedure may be less suitable, but for most packet processing workloads, where lookups dominate, the benefits for throughput and predictability are decisive.


For further details and empirical benchmarking, see "EMOMA: Exact Match in One Memory Access" (Pontarelli et al., 2017).
