RadixGraph: Dynamic In-Memory Graph
- RadixGraph is a dynamic in-memory graph system that employs a space-optimized radix tree (SORT) for efficient vertex indexing and supports millions of concurrent operations per second.
- It uses a hybrid snapshot–log architecture to manage edge storage, enabling rapid edge updates and low-latency query processing.
- Empirical results show RadixGraph delivers up to 16x higher update throughput and 40% memory savings, highlighting its scalability and efficiency for dynamic workloads.
RadixGraph is a fully in-memory, dynamic graph data structure designed for high-throughput, space-efficient storage and updating of large-scale dynamic graphs. Its architecture is centered on two core innovations: a space-optimized canonical radix tree—SORT—for vertex indexing, and a hybrid snapshot–log layout per vertex for edge storage, which together enable fast vertex and edge updates, scalable concurrency, and compact memory usage. RadixGraph targets dynamic graph workloads in which both query latency and update throughput are critical, supporting millions of concurrent operations per second while achieving substantial memory reductions versus existing systems (Xie et al., 4 Jan 2026).
1. Formal Model and Components
A RadixGraph is maintained via two primary tables alongside specialized data structures:
- Vertex Table (VT): An extensible array of size $n$ that holds, for each vertex $v$, a unique ID, associated metadata, and a pointer to an adjacency (edge) array.
- SORT (Space-Optimized Radix Tree): A multi-layer radix tree with per-layer fan-out $2^{a_i}$, mapping vertex IDs (arbitrary, possibly non-contiguous 64-bit integers) to their corresponding byte offsets in VT.
- Edge Array per Vertex ($\mathrm{EA}_v$): For each vertex $v$, an array of capacity $C_v$, partitioned into a read-only snapshot segment $S_v$ (consolidated neighbor list) and a write-only log segment $L_v$ for incremental updates.
This organization enables efficient implementations of graph mutator and query operations while minimizing space overhead.
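The following C++ sketch illustrates how these components fit together; all field names and widths are illustrative assumptions rather than the actual RadixGraph layout:

```cpp
// Illustrative layout of the three core components (hypothetical
// field names and sizes; the paper's concrete encoding may differ).
#include <atomic>
#include <cstdint>
#include <vector>

struct EdgeArray;                 // hybrid snapshot-log array (Section 3)

struct VertexEntry {              // one slot of the Vertex Table (VT)
    uint64_t   id;                // user-visible, possibly sparse 64-bit ID
    uint64_t   delete_ts;         // MVCC deletion timestamp (0 = live)
    EdgeArray* edges;             // pointer to this vertex's adjacency array
};

struct SortNode {                 // internal node of the radix tree
    // children[seg] holds either a SortNode* (internal layers) or a byte
    // offset into VT (leaf layer); 2^{a_i} slots at layer i.
    std::vector<std::atomic<uintptr_t>> children;
    explicit SortNode(int fanout_exponent)
        : children(size_t{1} << fanout_exponent) {}
};

struct RadixGraph {
    std::vector<VertexEntry> vt;  // extensible vertex table
    SortNode*  sort_root;         // maps vertex IDs to VT offsets
    std::vector<int> fanouts;     // per-layer exponents a_0 .. a_{l-1}
};
```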
2. SORT: Space-Optimized Radix Tree for Vertex Indexing
SORT is a canonical $\ell$-layer radix tree where each layer $i$ (for $0 \le i < \ell$) splits the incoming vertex ID using a fan-out exponent $a_i$. Each internal SORT node at layer $i$ maintains a pointer array of $2^{a_i}$ entries, and leaf entries map directly to VT offsets. The assignment $(a_0, \dots, a_{\ell-1})$ is determined by an offline dynamic programming optimizer, minimizing expected pointer-array space subject to $\sum_{i=0}^{\ell-1} a_i = b$, where $b = \log_2 |U|$ for keyspace $U$.
2.1 Algorithmic Operations
Insertion, search, and deletion require $O(\ell)$ time and operate by segmenting the input ID’s binary representation into substrings of lengths $a_0, \dots, a_{\ell-1}$. Brief pseudocode for insertion is as follows:
```
function InsertVertex(node N, int depth, bits v_id_bits):
    seg  ← top a_depth bits of v_id_bits      // segment for this layer
    rest ← remaining bits of v_id_bits
    if depth == l-1:                          // leaf layer
        if N.children[seg] == NULL:
            allocate new VT entry at offset off
            N.children[seg] ← off             // leaf pointer into VT
        return N.children[seg]
    if N.children[seg] == NULL:               // grow the path lazily
        N.children[seg] ← new internal node with 2^{a_{depth+1}} slots
    return InsertVertex(N.children[seg], depth+1, rest)
```
Lookup returns “not found” if any pointer along the path is uninitialized. Deletion marks the corresponding VT entry with an MVCC deletion timestamp and recycles its offset via a lock-free freelist.
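A minimal sketch of this deletion path, assuming a Treiber-stack freelist (the paper specifies only that offsets are recycled lock-free; all identifiers here are hypothetical):

```cpp
// Vertex deletion: write an MVCC tombstone into the VT entry, then
// push its offset onto a lock-free (Treiber-stack) freelist.
#include <atomic>
#include <cstdint>

struct VertexEntry { uint64_t id; uint64_t delete_ts; };  // as in Section 1

struct FreeNode { uint64_t vt_offset; FreeNode* next; };

struct FreeList {                       // Treiber-stack freelist
    std::atomic<FreeNode*> head{nullptr};
    void push(uint64_t off) {
        FreeNode* n = new FreeNode{off, head.load(std::memory_order_relaxed)};
        // CAS loop: on failure, n->next is refreshed with the current head.
        while (!head.compare_exchange_weak(n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {}
    }
};

void delete_vertex(VertexEntry& e, uint64_t vt_offset,
                   uint64_t now, FreeList& fl) {
    e.delete_ts = now;   // MVCC tombstone: readers at t < now still see e
    fl.push(vt_offset);  // the slot can be recycled by later insertions
}
```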
2.2 Space Analysis
The expected space for SORT is given by

$$\mathbb{E}[S] \;=\; \sum_{i=0}^{\ell-1} \mathbb{E}[N_i]\, 2^{a_i},$$

where $a_i$ is the fan-out exponent at layer $i$ and $\mathbb{E}[N_i]$ is the expected number of non-empty nodes at layer $i$. A closed-form expression for $\mathbb{E}[N_i]$ under a uniform key distribution leads to the integer program

$$\min_{a_0, \dots, a_{\ell-1}} \; \sum_{i=0}^{\ell-1} \mathbb{E}[N_i]\, 2^{a_i} \quad \text{subject to} \quad \sum_{i=0}^{\ell-1} a_i = b,$$

with $b = \log_2 |U|$ for keyspace $U$. The optimizer solves this offline by dynamic programming over layers and bit budgets, yielding in practice an $O(n)$ memory profile except in pathologically sparse ID cases.
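The following self-contained C++ sketch shows one plausible form of this dynamic-programming optimizer under the uniform-key cost model above; the paper’s exact cost model and constants may differ:

```cpp
// Offline fan-out optimizer (hypothetical reconstruction). Under uniform
// keys, the expected number of non-empty nodes at bit-prefix depth p is
//   N(p) = 2^p * (1 - (1 - 2^-p)^n),
// and a layer with fan-out exponent a adds N(p) * 2^a pointer slots.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

std::vector<int> optimize_fanouts(double n, int b, int max_a = 16) {
    const double INF = 1e300;
    std::vector<double> dp(b + 1, INF);  // dp[p]: min slots to consume p bits
    std::vector<int> choice(b + 1, -1);  // exponent chosen for the last layer
    dp[0] = 0.0;
    auto nodes = [n](int p) {            // expected non-empty nodes at depth p
        double q = std::ldexp(1.0, p);   // 2^p, computed exactly
        return q * -std::expm1(n * std::log1p(-1.0 / q));
    };
    for (int p = 0; p < b; ++p) {
        if (dp[p] >= INF) continue;
        for (int a = 1; a <= max_a && p + a <= b; ++a) {
            double cost = dp[p] + nodes(p) * std::ldexp(1.0, a);
            if (cost < dp[p + a]) { dp[p + a] = cost; choice[p + a] = a; }
        }
    }
    std::vector<int> layers;             // recover a_0 .. a_{l-1}
    for (int p = b; p > 0; p -= choice[p]) layers.push_back(choice[p]);
    std::reverse(layers.begin(), layers.end());
    return layers;
}

int main() {
    for (int a : optimize_fanouts(/*n=*/1e8, /*b=*/64)) std::cout << a << ' ';
    std::cout << '\n';
}
```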
3. Hybrid Snapshot–Log Architecture for Edge Storage
In RadixGraph, every vertex’s adjacency list is realized as a composite array $\mathrm{EA}_v$ of capacity $C_v$. The first $|S_v|$ entries comprise the snapshot segment $S_v$, capturing the compacted, immutable neighbor set; the remaining $C_v - |S_v|$ slots form the write-log $L_v$, which accumulates insertions, deletions, and updates as tuples $(v', w, t)$ of neighbor, weight, and timestamp. When $L_v$ fills, a compaction phase merges $L_v$ into a new snapshot and resets $L_v$.
3.1 Edge Update and Neighbor Scan
Edge insertions, deletions, and weight updates are all appended to $L_v$ via an atomic increment of the edge-array size. Compactions acquire a per-vertex latch only when necessary. Neighbor-list queries perform a backward scan, emitting, for each neighbor, the latest entry that is valid (not deleted) as of the query timestamp. The following pseudocode formalizes insertion:
```
InsertEdge(u→v, w, t):
    off_u ← SORT.lookup(u)                   // source must already exist
    off_v ← SORT.lookup_or_insert(v)         // create destination on demand
    idx ← atomic_fetch_add(EA_u.Size, 1)     // claim the next log slot
    EA_u[idx] ← (off_v, w, t)                // append-only write
    if idx + 1 == EA_u.Capacity / 2:         // log segment has filled up
        compact(EA_u)                        // merge log into a new snapshot
```
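A C++ sketch of the corresponding backward neighbor scan and compaction follows; the `EdgeEntry`/`EdgeArray` types, the tombstone flag, and the capacity policy are illustrative assumptions rather than the system’s actual layout:

```cpp
// Hybrid snapshot-log edge array: backward scan and compaction (sketch).
#include <atomic>
#include <cstdint>
#include <unordered_set>
#include <vector>

struct EdgeEntry {
    uint64_t neighbor;    // VT offset of the destination vertex
    double   weight;
    uint64_t timestamp;   // commit time of this entry
    bool     deleted;     // tombstone for edge deletions
};

struct EdgeArray {
    std::vector<EdgeEntry> slots;  // [0, snapshot_size) snapshot, rest log
    size_t snapshot_size = 0;
    std::atomic<size_t> size{0};   // total entries; bumped by writers
};

// Backward scan: the newest entry per neighbor wins, so log entries
// shadow the snapshot and older log entries.
std::vector<EdgeEntry> get_neighbors(const EdgeArray& ea) {
    std::unordered_set<uint64_t> seen;
    std::vector<EdgeEntry> out;
    for (size_t i = ea.size.load(std::memory_order_acquire); i-- > 0; ) {
        const EdgeEntry& e = ea.slots[i];
        if (!seen.insert(e.neighbor).second) continue;  // shadowed entry
        if (!e.deleted) out.push_back(e);               // drop tombstones
    }
    return out;
}

// Compaction: fold the log into a fresh snapshot. In the real system a
// per-vertex latch excludes concurrent writers during this step.
void compact(EdgeArray& ea) {
    std::vector<EdgeEntry> snap = get_neighbors(ea);
    ea.slots.assign(snap.begin(), snap.end());
    ea.snapshot_size = snap.size();
    ea.slots.resize(2 * ea.snapshot_size + 1);  // room for a fresh log
    ea.size.store(ea.snapshot_size, std::memory_order_release);
}
```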
Amortized update cost is established as $O(1)$, since the cost of each compaction is bounded by the number of log insertions that precede it.
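Written out, the charging argument takes the standard amortization form (constants illustrative):

$$\underbrace{O(1)}_{\text{log append}} \;+\; \frac{\overbrace{O(C_v)}^{\text{one compaction}}}{\underbrace{\Theta(C_v)}_{\text{appends between compactions}}} \;=\; O(1)\ \text{amortized per edge update.}$$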
3.2 Complexity Guarantees
- Insert, Delete, Update (Edge): $O(1)$ amortized per operation.
- Get Neighbors: $O(d_v)$, where $d_v$ is the vertex degree.
- Vertex operations (via SORT): $O(\ell)$, where $\ell$ is the number of layers (at most $\log_2 |U|$ for ID space $U$).
4. Space and Performance Characteristics
Empirical and analytical results demonstrate the following properties:
- Update Throughput: Up to $16\times$ higher than the highest-performing baseline on the twitter-2010 dataset.
- Memory Efficiency: Achieves an average $40\%$ reduction in memory usage relative to the closest competing graph store.
- Analytic Query Speed: Delivers substantially faster 2-hop queries and BFS/SSSP operations than competing systems.
- Concurrent Scalability: Maintains stable latency under intense update and query loads, achieving linear scaling for multi-version concurrency control (MVCC).
- Total Space: $O(n + m)$, where $m$ is the edge count and $n$ the vertex count, comprising:
  - SORT: $O(n)$ in practice (larger only under extreme ID sparsity)
  - VT: a fixed-size entry per vertex, $O(n)$ bytes plus freelist overhead
  - Edges: $20m$ bytes ($8m$ for snapshot + $12m$ for log entries)
  - Duplicate checker: $O(T \cdot B)$ bytes (for $T$ threads, bitmap segment size $B$)
| Component | Practical Memory Usage | Asymptotic Bound |
|---|---|---|
| SORT | $O(n)$ in practice | larger in sparse-ID worst case |
| Vertex Table | fixed-size entry per vertex, plus freelist | $O(n)$ bytes |
| Edge Storage | $20m$ bytes total | $O(m)$ |
| Duplicate Check | $O(T \cdot B)$ bytes | — |
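As an illustrative back-of-the-envelope check (numbers chosen for convenience, not drawn from the paper), edge storage dominates the footprint: for $m = 1.5 \times 10^9$ edges and $n = 5 \times 10^7$ vertices,

$$20m \text{ bytes} = 20 \times 1.5 \times 10^9 \approx 30\ \text{GB}, \qquad \text{while VT and SORT contribute only } O(n) \text{ bytes.}$$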
5. Implementation and Concurrency Design
RadixGraph is realized using modern concurrency primitives and open-source libraries:
- Intel TBB concurrent_vector powers VT and SORT for efficient, thread-safe segment-doubling.
- ROWEX-style atomic bitmaps enable lock-free concurrent reads with CAS-synchronized writes (see the sketch after this list).
- Per-node and per-vertex latching are reserved for infrequent compactions; read operations require no lock acquisition.
- Multi-version edge arrays create a singly linked version chain for snapshot queries at timestamp $t$, supporting both read-committed and snapshot isolation levels in MVCC.
- Source code and technical documentation are publicly available at [https://github.com/ForwardStar/RadixGraph].
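A minimal C++ sketch of the ROWEX-style access pattern referenced above: readers issue plain atomic loads and never lock, while writers synchronize through CAS. The 64-bit word granularity and all names are assumptions, not the library’s actual API:

```cpp
// ROWEX-style bitmap sketch: lock-free reads, CAS-synchronized writes.
#include <atomic>
#include <cstdint>
#include <vector>

class AtomicBitmap {
    std::vector<std::atomic<uint64_t>> words_;
public:
    explicit AtomicBitmap(size_t bits) : words_((bits + 63) / 64) {}

    bool test(size_t i) const {             // reader path: no locks taken
        return (words_[i / 64].load(std::memory_order_acquire)
                >> (i % 64)) & 1u;
    }

    void set(size_t i) {                    // writer path: CAS retry loop
        std::atomic<uint64_t>& w = words_[i / 64];
        uint64_t old = w.load(std::memory_order_relaxed);
        while (!w.compare_exchange_weak(old, old | (uint64_t{1} << (i % 64)),
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {}
    }
};
```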
6. Limitations and Open Challenges
Several avenues for improvement and extension are identified:
- Transactional Semantics: Only MVCC with read-committed and snapshot isolation is provided; fully serializable transactions are not yet supported.
- Adaptivity to Skewed ID Distributions: Enhancements are possible via more localized re-optimization of SORT parameters under non-uniform ID assignment.
- Edge Array Deletion Overhead: Work remains on log-size tuning and space reclamation strategies for delete-heavy workloads.
- Persistent and Tiered Storage: Integration of an on-disk or hybrid storage tier for scaling beyond main memory is under exploration.
A plausible implication is that further adaptation of SORT to heterogeneous workload characteristics, along with deeper integration into tiered or distributed systems, could extend RadixGraph’s applicability to new domains within large-scale dynamic graph management (Xie et al., 4 Jan 2026).