Kanva: Lock-Free Learned Search Framework
- Kanva is a framework for learned, lock-free search data structures that integrate piecewise-linear models with non-blocking synchronization.
- It organizes data into a shallow hierarchy of MNodes and dynamic bins, enabling rapid, unique, model-guided search paths.
- Evaluations show Kanva outperforming prior lock-free search structures such as E-ABT and C-IST in throughput and cache efficiency under diverse workloads.
Kanva is a framework for learned, lock-free search data structures designed to deliver high scalability and strong progress guarantees on multi-core architectures. It integrates piecewise-linear learned models for rapid key prediction with provably non-blocking synchronization, achieving significant throughput and cache efficiency improvements over prior non-blocking search structures. Kanva’s core innovation lies in its shallow hierarchy of lightweight modelled nodes (MNodes) and dynamic, lock-free bins that combine learned arithmetic search with fully linearizable concurrency semantics (Bhardwaj et al., 2023).
1. Structural Design and Search Pathways
Kanva organizes the key space as a shallow, unbalanced tree comprising two node types:
- MNodes (“modelled nodes”): Internal nodes, each containing one or more tiny linear models to predict the subrange for a given search key.
- Bins: Leaf nodes, initially implemented as sorted linked lists (one-level), which absorb all insert, delete, and update operations. When overloaded, bins upgrade to two-level arrays of linked-list pages and, upon reaching a fixed threshold, are “frozen” and converted into new MNodes via lock-free retraining.
The root MNode contains an array of piecewise-linear models approximating the cumulative distribution function (CDF) of the keys, together with the associated split points and child pointers (some null-initialized). Non-root MNodes hold a sorted array keys[], parallel versioned-value lists versions[], a single linear model (slope, intercept, and maximal error), and child pointers. Traversals proceed top-down, following a unique path per search from the root to a leaf bin, guided by model predictions rather than comparator-driven tree walks. Because ancestor nodes are never restructured after installation, each key's traversal path is unique and stable.
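The two-node hierarchy and the model-guided descent can be sketched as follows. This is an illustrative reconstruction, not the paper's exact layout: the struct fields, the `descend` helper, and the linear correction loop are assumptions; child pointers are atomics so later CAS-based installation works on them.

```cpp
#include <atomic>
#include <cassert>
#include <utility>
#include <vector>

// Sketch of Kanva's two node kinds (names and layout are illustrative).
struct Node {
    const bool isBin;
    explicit Node(bool b) : isBin(b) {}
    virtual ~Node() = default;
};

struct Bin : Node {
    std::vector<std::pair<long, long>> entries;   // sorted key/value pairs
    Bin() : Node(true) {}
};

struct MNode : Node {
    std::vector<long> keys;                       // sorted separator keys
    double slope = 0.0, icept = 0.0;              // per-node linear model
    std::vector<std::atomic<Node*>> children;     // keys.size() + 1 slots
    explicit MNode(std::vector<long> ks)
        : Node(false), keys(std::move(ks)), children(keys.size() + 1) {}
};

// Model-guided descent: predict a child slot from the linear model, then
// correct the prediction locally against the separator keys (the paper uses
// exponential search; a linear correction suffices for this sketch). Since
// installed nodes are immutable, every key follows one fixed root-to-bin path.
Bin* descend(MNode* node, long key) {
    for (;;) {
        long hi = static_cast<long>(node->keys.size());
        long ix = static_cast<long>(node->slope * key + node->icept);
        if (ix < 0) ix = 0;
        if (ix > hi) ix = hi;
        while (ix > 0 && key < node->keys[ix - 1]) --ix;   // child ix covers
        while (ix < hi && key >= node->keys[ix]) ++ix;     // [keys[ix-1], keys[ix])
        Node* child = node->children[ix].load(std::memory_order_acquire);
        if (child == nullptr || child->isBin) return static_cast<Bin*>(child);
        node = static_cast<MNode*>(child);
    }
}
```

With a zero model the correction loop alone routes keys, which is exactly the fallback behavior when the prediction is off by a few slots.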
2. Learned Query Formulation and Model Fitting
At each modelled node, Kanva’s rank-prediction mechanism approximates the true rank-CDF of a key k over the sorted dataset of size n by a fitted linear model, yielding a predicted rank

r̂(k) = a·k + b

such that, for all keys k in the segment,

|r̂(k) − rank(k)| ≤ ε,

where ε is the maximal regression error at training time.

Model parameters a and b are computed in one pass via standard linear-regression statistics:

a = (n·Σ kᵢrᵢ − Σ kᵢ · Σ rᵢ) / (n·Σ kᵢ² − (Σ kᵢ)²),  b = (Σ rᵢ − a·Σ kᵢ) / n,

with rᵢ = i the rank of key kᵢ in the sorted segment.
In practice, this single-pass, “fetch-and-add–free” scheme fits each node’s model within a few hundred nanoseconds even on 64-thread platforms, and the reported prediction error stays within 5–40 keys of the true rank on real datasets.
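The fitting step above can be written out concretely. This is a sketch under the stated formulas: one pass accumulates the regression sums, a second pass measures the maximal error ε; the name `fitLinearModel` matches the pseudocode in Section 4, but the exact accumulation the paper uses may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct LinearModel { double a, b, eps; };   // rank ≈ a*key + b, |error| ≤ eps

// Least-squares fit of rank i against key k_i over a sorted segment,
// plus the maximal regression error eps over the same segment.
LinearModel fitLinearModel(const std::vector<double>& keys) {
    const double n = static_cast<double>(keys.size());
    double sk = 0, sr = 0, skk = 0, skr = 0;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        sk  += keys[i];
        sr  += static_cast<double>(i);
        skk += keys[i] * keys[i];
        skr += keys[i] * static_cast<double>(i);
    }
    const double denom = n * skk - sk * sk;        // zero iff all keys equal
    const double a = (denom != 0.0) ? (n * skr - sk * sr) / denom : 0.0;
    const double b = (sr - a * sk) / n;
    double eps = 0.0;
    for (std::size_t i = 0; i < keys.size(); ++i)
        eps = std::max(eps, std::fabs(a * keys[i] + b - static_cast<double>(i)));
    return {a, b, eps};
}
```

On perfectly uniform keys the fit is exact and ε collapses to (numerically) zero; skewed segments inflate ε, which directly bounds the exponential-search window at query time.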
3. Concurrency Architecture and Linearizability
Kanva ensures non-blocking progress, leveraging only single-word compare-and-swap (CAS) operations for all MNode and child updates. When a bin surpasses its capacity threshold B, the first thread to observe this condition “freezes” it using atomic pointer tagging (bit-stealing), initiating a lock-free conversion process in which all threads encountering the frozen bin help promote it to an MNode. Reads (searches) never block: they traverse the tree following model predictions without participating in conversion or helping.
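A minimal sketch of the freeze step via pointer tagging, assuming the usual bit-stealing trick: the low bit of an aligned pointer is unused, so setting it with a single-word CAS atomically marks the bin as frozen. The tag layout and helper names are assumptions, not the paper's exact scheme.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t FROZEN = 1;   // stolen low bit of an aligned pointer

inline bool isFrozen(void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & FROZEN) != 0;
}
inline void* tagged(void* p) {
    return reinterpret_cast<void*>(reinterpret_cast<std::uintptr_t>(p) | FROZEN);
}
inline void* untagged(void* p) {
    return reinterpret_cast<void*>(reinterpret_cast<std::uintptr_t>(p) & ~FROZEN);
}

// The first thread whose CAS succeeds "wins" the freeze; any thread that
// later observes the tag helps with the bin-to-MNode conversion instead of
// blocking. Returns false if the bin was already frozen or the CAS raced;
// either way the caller re-reads the slot and helps, which keeps the scheme
// lock-free: some thread always makes progress.
inline bool tryFreeze(std::atomic<void*>& slot) {
    void* cur = slot.load(std::memory_order_acquire);
    if (isFrozen(cur)) return false;
    return slot.compare_exchange_strong(cur, tagged(cur));
}
```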
Key invariants established by the design:
- Unique Path: Once installed, models and node structures are immutable, preserving a fixed traversal route from root to bin for each search key.
- No Data Loss: Each inserted key–value remains accessible via versioned-value lists, even through bin–MNode conversions.
- Range-Scan Consistency: Range queries utilize versioned timestamps, guaranteeing snapshot isolation for all values committed before query start.
For each operation, there is a precise linearization point: inserts and deletes linearize at successful CAS steps, searches at the highest versioned-value read with timestamp ≤ invocation time, and ranges at atomic timestamp reads. Following Herlihy’s argument, lock-freedom is assured as threads either complete their operation or help another; thus, some operation always completes in finite steps.
4. Core Algorithms
Kanva’s primary algorithms are summarized below:
```
function Seek(key, node):
    ix ← searchInMNode(node, key)        // model prediction + exponential search
    if node.keys[ix] == key:
        return (node, ix, FOUND)         // ix indexes keys[] / versions[]
    cix ← ix + 1                         // candidate child slot
    if node.children[cix] == null:
        return (node, cix, NFOUND)
    if node.children[cix] is Bin:
        return (node, cix, MAYBE)
    return Seek(key, node.children[cix])

procedure Insert(key, val):
  retry:
    (node, ix, st) ← Seek(key, root)
    if st == FOUND:
        return writeValue(node, ix, val)
    else if st == NFOUND:
        newB ← Bin(key, val)
        if CAS(node.children[ix], null, newB): return true
        goto retry                       // another thread installed first
    else:                                // MAYBE
        b ← node.children[ix]
        if b.size ≥ B or b.frozen:
            helpConvert(node, ix, b); goto retry
        res ← b.insertBin(key, val)
        if res == HELP:
            helpConvert(node, ix, b); goto retry
        return res

procedure Delete(key):
  retry:
    (node, ix, st) ← Seek(key, root)
    if st == FOUND:
        return writeValue(node, ix, null)
    else if st == NFOUND:
        return false
    else:
        b ← node.children[ix]
        res ← b.deleteBin(key)
        if res == HELP:
            helpConvert(node, ix, b); goto retry
        return res

function Search(key):
    (node, ix, st) ← Seek(key, root)
    if st == FOUND:
        return readValue(node.versions[ix])
    else if st == NFOUND:
        return null
    else:
        b ← node.children[ix]
        entry ← searchBin(b, key)
        return (entry.key == key) ? readValue(entry.version) : null

procedure helpConvert(node, ix, b):
    (K[], V[]) ← freezeAndCollect(b)     // freeze bin, gather sorted keys/values
    model ← fitLinearModel(K)            // yields (slope, intercept, ε)
    newM ← MNode(K, V, model)
    CAS(node.children[ix], b, newM)      // losing helpers' CAS fails harmlessly
```
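The NFOUND path of Insert reduces to a single CAS that publishes a freshly built bin into a null child slot. A self-contained sketch of that step (the `Bin` stand-in and `installBin` name are illustrative, not the paper's code):

```cpp
#include <atomic>
#include <cassert>

struct Bin { long key, val; };   // minimal stand-in for a one-entry bin

// Publish a new bin into a null child slot with one CAS. On failure another
// thread won the race to fill the slot, so the new bin is discarded and the
// caller re-runs Seek and retries from the top, as in the Insert pseudocode.
bool installBin(std::atomic<Bin*>& slot, long key, long val) {
    Bin* fresh = new Bin{key, val};
    Bin* expected = nullptr;
    if (slot.compare_exchange_strong(expected, fresh)) return true;
    delete fresh;                // lost the race: clean up and let caller retry
    return false;
}
```

Because the loser deletes only its own never-published node, no reclamation scheme is needed on this particular path.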
Versioned-value arrays maintain singly-linked stacks of nodes, with a global atomic timestamp counter for range query snapshot isolation.
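The versioned-value mechanism can be sketched as a lock-free push onto a per-slot version stack, stamped from the global atomic timestamp; a snapshot read at time T returns the newest value committed at or before T. Field and function names here are assumptions for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

std::atomic<std::uint64_t> globalTs{0};   // global timestamp counter

struct Version {
    std::uint64_t ts;    // commit timestamp
    long val;
    Version* next;       // newer versions sit at the head (singly linked stack)
};

// Push a new version stamped with a fresh timestamp; CAS-loop on the head.
void writeVersion(std::atomic<Version*>& head, long val) {
    Version* v = new Version{globalTs.fetch_add(1) + 1, val, nullptr};
    do {
        v->next = head.load(std::memory_order_acquire);
    } while (!head.compare_exchange_weak(v->next, v));
}

// Snapshot read: newest value with ts <= t, as used by range scans.
bool readAt(std::atomic<Version*>& head, std::uint64_t t, long& out) {
    for (Version* v = head.load(std::memory_order_acquire); v; v = v->next)
        if (v->ts <= t) { out = v->val; return true; }
    return false;
}
```

A range query records `globalTs` once at its start and calls `readAt` with that value at every slot it visits, which yields the snapshot-isolation guarantee described above.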
5. Comparative Performance Evaluation
Kanva was evaluated against leading lock-free search data structures, specifically C-IST (lock-free interpolation search tree) and E-ABT (elimination (a,b)-tree), as well as ALEX, LFABT, and FineDex, on workloads up to 200 million keys across 128 threads of a dual-socket 32-core AMD EPYC server.
Throughput results in Mops/sec (read-heavy, 95% search):
| Threads | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|
| Kanva | 28.4 | 52.1 | 75.2 | 97.8 | 85.0 |
| E-ABT | 3.1 | 5.8 | 9.7 | 17.4 | 15.2 |
| C-IST | 1.4 | 2.6 | 4.1 | 6.2 | 5.8 |
- Under read-heavy workloads, Kanva delivers nearly 100 million operations per second (Mops/sec) on 64 threads, up to roughly 10× the throughput of E-ABT and 20× that of C-IST.
- For update-heavy workloads (30% search, 50% insert, 20% delete), Kanva achieves 60–80 Mops/sec, about 4× over E-ABT and 7× over C-IST.
- These gains persist under skewed Zipfian key distributions and on the real-world “Facebook,” “Amazon,” and “OSM” datasets.
- Kanva incurs 20–30% fewer LLC misses than C-IST or LFABT, benefiting from its fatter internal nodes and zero need for tree rebalancing.
- YCSB workloads further show Kanva’s throughput exceeds E-ABT by 1.2–1.4× and C-IST by 2–3× for various read/write mixes, maintaining performance even under high “hot-spot” contention.
6. Implications and Context
Kanva’s integration of learned arithmetic search with lock-free, non-blocking updates offers both the average-case speed advantages of learned indexes and the worst-case progress guarantees of concurrent non-blocking search structures. By avoiding global locks or compare-based traversals, Kanva achieves a combination of high multi-core scalability, provable linearizability, and cache-friendly access that surpasses prior state-of-the-art approaches by up to one to two orders of magnitude on practical datasets (Bhardwaj et al., 2023). This suggests the feasibility of deploying learned methods in concurrent primitives without sacrificing correctness or performance guarantees. The approach establishes a general paradigm for non-blocking, model-based data structures capable of efficient concurrent access patterns.