EMOMA: Early Rejection via Lookup Table
- Early Rejection via Lookup Table is a technique that employs an on-chip counting block Bloom filter to rapidly eliminate non-existent keys before costly off-chip accesses.
- It integrates a two-choice cuckoo hash table with a small on-chip stash, ensuring each lookup performs at most one off-chip access for high throughput.
- The design maintains an invariant by relocating keys that would create false positives, achieving high load efficiency (≈95%) and predictable performance even at high occupancy.
Early rejection via lookup table refers to a mechanism that enables rapid elimination of non-existent keys in a lookup structure before off-chip memory accesses are performed, thus improving both the performance and efficiency of large-scale key-value storage systems. The EMOMA (Exact Match in One Memory Access) data structure operationalizes this principle using an on-chip counting block Bloom filter (CBBF) as a pre-filter, coupled with a two-choice cuckoo hash table in off-chip memory. This architecture ensures every lookup for a given key is resolved with at most one off-chip memory access—frequently rejecting absent keys without incurring such an access at all—by using the lookup table to disambiguate candidate memory locations and to guide early termination of unsuccessful searches (Pontarelli et al., 2017).
1. System Architecture and Lookup Path
EMOMA’s design segregates responsibilities between its on-chip and off-chip components:
- On-chip elements: These include the counting block Bloom filter (CBBF) and a small “stash,” a content-addressable memory (CAM) for temporarily holding elements during insertions or overflow situations.
- Off-chip elements: The primary storage is a two-choice cuckoo hash table made up of multi-slot buckets; two hash functions give each key a first and a second candidate bucket.
The CBBF is structured as fixed-width blocks of bits, with each block corresponding to a bucket in the cuckoo table; a key's block is selected by the same hash value that selects its first candidate bucket. The stash has a small, bounded size and ensures that insertions make progress even in the rare event of a displacement failure. The CBBF counters needed to support deletions or updates may be kept off-chip, since they do not participate in the fast-path lookup.
2. Data Organization and On-Chip CBBF
In EMOMA, each key-value pair is mapped to two candidate buckets in the cuckoo table, one per hash function. The CBBF is arranged with one block per bucket, and the positions a key occupies within its block are determined by a set of bit-selection hash functions.
Organization Table
| Component | Location | Function |
|---|---|---|
| Cuckoo | Off-chip | Main key-value storage |
| CBBF | On-chip | Directs lookups, early rejects |
| Stash | On-chip | Assists insertions/failures |
All present keys are stored in exactly one slot of the cuckoo table; the CBBF is configured and maintained during insertions so that it directs retrieval with a single decisive bit-vector answer (Pontarelli et al., 2017).
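The mapping from a key to its two candidate buckets and to its bit positions inside a CBBF block can be sketched as follows. This is a minimal illustration, not the authors' implementation: the constants, the function names, and the use of salted BLAKE2b as the hash family are all assumptions made for the example.

```python
import hashlib
from typing import List, Tuple

NUM_BUCKETS = 1024    # hypothetical number of buckets per hash choice
BLOCK_BITS = 64       # hypothetical width of one CBBF block, in bits
NUM_BIT_HASHES = 4    # hypothetical number of bit-selection hash functions


def _hash(key: bytes, salt: bytes) -> int:
    """64-bit hash from salted BLAKE2b (an illustrative stand-in)."""
    digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def candidate_buckets(key: bytes) -> Tuple[int, int]:
    """Indices of the key's first and second candidate buckets."""
    first = _hash(key, b"bucket1") % NUM_BUCKETS
    second = _hash(key, b"bucket2") % NUM_BUCKETS
    return first, second


def cbbf_positions(key: bytes) -> Tuple[int, List[int]]:
    """CBBF block index (shared with the first bucket) and bit positions in it."""
    block_index = candidate_buckets(key)[0]
    bits = [_hash(key, b"bit%d" % i) % BLOCK_BITS for i in range(NUM_BIT_HASHES)]
    return block_index, bits
```

Reusing the first-bucket index as the block index is what lets a single on-chip block read answer the question "could this key be in its first candidate bucket?".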
3. Lookup Procedure and Early Rejection Mechanism
The lookup starts by checking the stash for the queried key. If the key is not there, the CBBF is queried: the key's first hash function selects a block, and the lookup tests whether all of the key's bit positions in that block are set. The outcome determines which bucket of the cuckoo table to probe:
- CBBF negative: The key cannot be in its first candidate bucket, so the second candidate bucket is checked.
- CBBF positive: The key may reside in its first candidate bucket (the positive may also be a false positive).
Only the indicated bucket is read off-chip. If the key is found, its value is returned; otherwise, a miss is reported. The CBBF’s configuration avoids situations where multiple off-chip accesses would be necessary. Non-existent keys are frequently rejected entirely on-chip if the CBBF is negative, achieving “early rejection” (Pontarelli et al., 2017).
4. Insertion Procedure and Invariant Maintenance
The insertion strategy in EMOMA is designed to uphold the invariant:
“Any key that would produce a CBBF false positive if placed in its second candidate bucket is forced into its first candidate bucket.”
The steps are as follows:
- The new element is placed in the stash.
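- The CBBF is checked at the block selected by the key's first hash function. If the test is positive, the key is inserted into its first candidate bucket and the appropriate CBBF bits are set; if it is negative and setting those bits would create no false positives for existing entries, the key is likewise inserted into its first candidate bucket; otherwise, it is placed in its second candidate bucket.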
- If the target bucket is full, an eviction and re-insertion chain is initiated, using the stash as intermediate storage, until all keys are placed or the stash overflows.
By always moving CBBF-conflicting keys to their first candidate bucket, EMOMA ensures the one-bit bucket decision remains correct for all lookups (Pontarelli et al., 2017).
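The placement decision in the steps above can be illustrated with a small sketch that works on a single CBBF block. It is a simplified reading of the procedure, not the authors' algorithm: eviction chains, the stash, and the counters are omitted, and the function names and the representation of keys as bit masks are assumptions.

```python
from typing import Iterable


def is_positive(block: int, mask: int) -> bool:
    """True if every bit of `mask` is set in the CBBF block."""
    return (block & mask) == mask


def choose_bucket(block: int, key_mask: int, second_masks: Iterable[int]) -> str:
    """Decide where a new key goes.

    block        -- current bits of the CBBF block selected by the key
    key_mask     -- bits the new key would set in that block
    second_masks -- masks of existing keys that share this block but are
                    stored in their second candidate bucket
    """
    if is_positive(block, key_mask):
        # The key would be a false positive if stored in its second bucket.
        return "first"
    updated = block | key_mask
    if any(is_positive(updated, m) for m in second_masks):
        # Setting these bits would turn an existing second-bucket key into
        # a false positive, so the new key stays out of the first bucket.
        return "second"
    return "first"


print(choose_bucket(0b0000, 0b0011, []))        # first
print(choose_bucket(0b0011, 0b0011, []))        # first
print(choose_bucket(0b0001, 0b0010, [0b0011]))  # second
print(choose_bucket(0b0001, 0b0010, [0b0100]))  # first
```

In every branch the invariant is preserved: no key stored in its second candidate bucket tests positive in its CBBF block.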
5. Probabilistic Analysis and CBBF Parameters
The false positive rate of the CBBF follows a standard Bloom filter bound that depends on the number of entries, the number of buckets, and the number of bit-selection hash functions. With typical parameter choices, the rate remains a few percent. Crucially, EMOMA's invariant ensures any false positive in the CBBF does not result in a misdirected lookup but rather at most a single off-chip access with a correct miss result. If the CBBF is negative and the key was not inserted, early rejection occurs entirely on-chip (Pontarelli et al., 2017).
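As a rough illustration of how such a bound behaves, the textbook Bloom filter approximation can be applied to a single block. This is the generic formula, not necessarily the exact expression used by Pontarelli et al., and the parameter names and the example values below are hypothetical.

```python
import math


def block_false_positive_rate(keys_per_block: float,
                              block_bits: int,
                              num_hashes: int) -> float:
    """Textbook approximation (1 - exp(-k * n / b)) ** k for one block.

    keys_per_block -- average number of keys whose bits are set in a block (n)
    block_bits     -- width of the block in bits (b)
    num_hashes     -- bit-selection hash functions per key (k)
    """
    fill = 1.0 - math.exp(-num_hashes * keys_per_block / block_bits)
    return fill ** num_hashes


# Hypothetical comparison: doubling the keys per block raises the rate.
light = block_false_positive_rate(2, 64, 4)
heavy = block_false_positive_rate(4, 64, 4)
print(light < heavy)  # True
```

The rate grows with the number of keys recorded per block and shrinks as blocks get wider, which is the trade-off between on-chip memory and the number of keys that must be held in their first candidate bucket.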
6. Comparative Evaluation with Alternative Schemes
EMOMA’s effectiveness is benchmarked against classical cuckoo hashing and hybrid approaches (cuckoo plus Bloom filter):
- Off-chip accesses: EMOMA needs at most one for every lookup (hit or miss); standard cuckoo hashing can require up to two, with an average above one at high load; traditional Bloom filter hybrids also have a worst case of two.
- Throughput: Given single-access per query, EMOMA enables pipelined designs—e.g., 50 ns DRAM latency supports ≈20 M lookups/s per port without stalls.
- Hardware resource usage: The CBBF requires ≈4 bits per entry with simple shift/XOR logic, compared to >30 bits per entry and greater complexity for counting Bloom filter designs in pre-filtered Cuckoo table architectures.
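The throughput figure in the list above follows from simple arithmetic, shown here as a small sketch. It assumes one memory port that completes one access per latency period with no overlap between accesses; the function name and parameters are illustrative.

```python
def lookups_per_second(access_latency_ns: float,
                       accesses_per_lookup: float = 1.0) -> float:
    """Lookups per second for one port, given latency and accesses per lookup."""
    return 1e9 / (access_latency_ns * accesses_per_lookup)


print(lookups_per_second(50))       # 20000000.0 (one access per lookup)
print(lookups_per_second(50, 2.0))  # 10000000.0 (two accesses per lookup)
```

Under this simple model, a scheme that sometimes needs two accesses halves the rate for those lookups, which is why a fixed single-access bound makes throughput predictable.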
Empirical studies with millions of entries show EMOMA achieves ≈95% load, average 1.0 off-chip accesses per lookup, stash sizes <20, and maintains its efficiency across high-load scenarios (Pontarelli et al., 2017).
7. Insertion Costs and Limitations
Insertion in EMOMA proceeds in an expected constant number of relocations for load factors up to ≈0.95, slightly lower than the ≈0.97 threshold conventional for classical cuckoo hashing. At 95% load, average insertion chains involve ≈30–50 relocations, and excess keys temporarily overflow to the small stash.
The need to “lock” certain keys into their first candidate bucket because of CBBF-induced conflicts adds some complexity to the insertion logic, but this does not affect the single-access guarantee or the early rejection property of the lookup path (Pontarelli et al., 2017).
A plausible implication is that for applications with extremely high insertion-to-lookup ratios, EMOMA’s more involved insertion procedure may be less suitable, but for most packet processing workloads, where lookups dominate, the benefits for throughput and predictability are decisive.
For further details and empirical benchmarking, see "EMOMA: Exact Match in One Memory Access" (Pontarelli et al., 2017).