
Mergeable Dictionaries

Updated 5 September 2025
  • Mergeable dictionaries are abstract data types that maintain disjoint sets of totally ordered data with overlapping key ranges, supporting efficient predecessor searches, splits, and merges.
  • They employ extended biased skip lists augmented with finger operations and gap-based weighting to achieve O(log n) amortized performance, even when splits and merges interleave.
  • This advancement overcomes classical methods by enabling arbitrary merges with optimal costs, offering practical benefits for dynamic search, text compression, and adaptive indexing.

A mergeable dictionary is an abstract data type and supporting data structure that maintains a collection of disjoint sets of totally ordered data, allowing for efficient predecessor search, split, and—distinctively—merge operations without restrictions on key intervals, even when sets are arbitrarily interleaved. Unlike classical join or merge operations in balanced search trees that require sets to occupy disjoint key ranges, mergeable dictionaries support union of sets with overlapping keyspace at optimal amortized costs. The canonical realization employs extended biased skip lists augmented with finger operations and a refined gap-based potential function for optimal performance.

1. Data Structure Design and Principles

The fundamental structure for mergeable dictionaries is a balanced dictionary implemented using an extended biased skip list. Each node in such a skip list is given an integral height derived from a weight function that depends on the sizes of the gaps immediately adjacent to the node. Formally, if x is the k-th element in set S, and g_S(k-1), g_S(k) are the gaps on either side, then the node's weight is

w(x) = g_S(k-1) + g_S(k)

Gap sizes are defined as the number of keys between consecutive positions within the fixed global universe. The skip list is further enriched with "finger" operations—finger search, finger split, finger join, and finger reweight—that efficiently localize both search and structural updates.
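Under one possible convention for an integer universe (gaps measured as key distances, with sentinel boundaries at 0 and U; the paper's exact boundary convention may differ), the weight rule can be sketched as:

```python
# Sketch: gap-based node weights for a set S drawn from an integer
# universe [0, U). Assumption: the gap g_S(k) is the key distance
# between consecutive elements, with sentinel boundaries at 0 and U.

def gaps(S, U):
    """Gap sizes around the sorted set S within universe [0, U)."""
    ext = [0] + sorted(S) + [U]          # sentinel boundaries
    return [ext[i + 1] - ext[i] for i in range(len(ext) - 1)]

def weights(S, U):
    """w(x) = g_S(k-1) + g_S(k) for the k-th element x of S."""
    g = gaps(S, U)
    return {x: g[k] + g[k + 1] for k, x in enumerate(sorted(S))}

print(weights([3, 7, 8], 16))   # {3: 7, 7: 5, 8: 9}
```

Elements flanked by large gaps receive large weights and hence tall skip-list towers, which is what biases search toward the sparse regions of the keyspace.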

A critical innovation is the introduction of a potential function aggregating logarithms of gap sizes:

\Phi(D_i) = c \cdot \sum_{S \in \mathcal{S}^{(i)}} \phi(S)

where

\phi(S) = \sum_{x \in S} \big( \log g_S(\mathrm{pred}_S(x)) + \log g_S(\mathrm{succ}_S(x)) \big)
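The potential terms can be computed directly under the same assumed gap convention (key-distance gaps with sentinel boundaries); phi and Phi below are illustrative helpers, not the paper's implementation:

```python
import math

# Illustrative sketch of the gap-based potential: phi(S) sums log-gap
# terms over the elements of S, and Phi scales the sum over all sets by
# a constant c. Assumes strictly positive gaps (log of a zero gap is
# undefined under this toy boundary convention).

def phi(S, U):
    ext = [0] + sorted(S) + [U]
    g = [ext[i + 1] - ext[i] for i in range(len(ext) - 1)]
    # element k has predecessor gap g[k] and successor gap g[k + 1]
    return sum(math.log2(g[k]) + math.log2(g[k + 1]) for k in range(len(S)))

def Phi(sets, U, c=1.0):
    return c * sum(phi(S, U) for S in sets)

print(round(phi([3, 7, 8], 16), 3))
```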

This design overcomes the principal limitation of previous approaches: classical mergeable dictionary implementations, e.g., 2-4 trees, permitted merges only when inputs occupied disjoint key ranges, and degraded to suboptimal bounds (e.g., \Omega(n)) when splits were interleaved with merges.

2. Supported Operations and Algorithmic Strategies

Predecessor Search

Finger searches descend through the skip-list levels to locate the largest element less than or equal to the query key. When initiated near a known finger, the cost is discounted to the logarithm of the distance from the finger to the target.
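The query semantics (largest element less than or equal to the key) can be sketched over a flat sorted array; this sketch does not model the finger discount, which requires the skip-list structure itself:

```python
import bisect

# Minimal predecessor search: largest element <= query. A biased skip
# list answers the same query, but starting from a finger its cost is
# logarithmic in the distance to the finger; this flat-array sketch
# only illustrates the query semantics, not the finger discount.

def predecessor(S, q):
    i = bisect.bisect_right(S, q)
    return S[i - 1] if i > 0 else None

S = [2, 5, 9, 13, 21]
print(predecessor(S, 10))   # 9
print(predecessor(S, 1))    # None
```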

Split

Finger split disconnects pointers at the split location, then traverses the affected borders to restore skip-list invariants. Restoration is localized and completed in O(\log n) time.

Pseudocode sketch:

Function Finger_Split(f):
    Let A = { x ∈ S : x ≤ f }
    Let B = { x ∈ S : x > f }
    Disconnect right profile of f from left profile of succ(f)
    Restore invariants in A and B via localized promotions/demotions
    Return (A, B)
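The pseudocode above can be reduced to a runnable list-based sketch; note that slicing is O(n), whereas the skip-list version achieves O(\log n) through local pointer surgery:

```python
import bisect

# List-based sketch of Finger_Split(f): everything <= f goes to A, the
# rest to B. The skip-list version instead disconnects pointers at f
# and restores invariants locally in O(log n); slicing here is O(n).

def finger_split(S, f):
    i = bisect.bisect_right(S, f)
    return S[:i], S[i:]

A, B = finger_split([1, 4, 6, 9, 12], 6)
print(A, B)   # [1, 4, 6] [9, 12]
```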

Merge (Novel Operation)

The merge process proceeds through four phases:

  • Segment Identification: Partition sets A, B into alternately ordered maximal “segments” such that within each segment, elements from one set occur consecutively.
  • Segment Extraction: For each segment, use finger searches and splits to isolate minimum and maximum nodes.
  • Weight Updates: Reweight boundary nodes via finger reweight, conforming to w(x) = g_S(k-1) + g_S(k).
  • Segment Gluing: Sequential finger-joins recombine extracted segments into a unified skip list.

Segment partitioning and interleaved merging constitute the algorithmic breakthrough: by exploiting the new potential function, even merges producing many small segments can be processed in only O(\log n) amortized time. This elimination of the extraneous O(\log^2 n) factor marks a significant advance.
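The segment phases can be illustrated on plain sorted lists (the weight-update phase is omitted, since flat lists carry no node weights):

```python
# Sketch of the merge phases on plain sorted lists. Segment
# identification finds maximal single-source runs of A and B in key
# order; gluing concatenates them. Extraction and reweighting are
# skip-list-specific and omitted in this flat sketch.

def segments(A, B):
    """Split the merge of A and B into maximal single-source runs."""
    merged = sorted([(x, 'A') for x in A] + [(x, 'B') for x in B])
    segs = []
    for x, src in merged:
        if segs and segs[-1][0] == src:
            segs[-1][1].append(x)      # extend the current run
        else:
            segs.append((src, [x]))    # start a new run
    return segs

def merge(A, B):
    return [x for _, seg in segments(A, B) for x in seg]

print(segments([1, 2, 8, 9], [4, 5, 6]))
```

The cost of the real merge is governed by the number of runs produced, which is exactly what the potential argument bounds in the amortized sense.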

3. Performance Analysis and Complexity Guarantees

Each principal operation (predecessor search, split, join) runs in O(\log n) worst-case time. For the merge operation, the worst-case cost is expressed as:

O\left( \log n + \sum_i F(A_i) + \sum_j F(B_j) \right)

where F(\cdot) denotes the cost to process each segment. Potential analysis shows that while a single segment may cost \Omega(\log n) to process in the worst case, the aggregate decrease in the potential \Phi amortizes this cost across the entire operation, yielding O(\log n) amortized time for merges, even under arbitrary interleaving.

The critical insight is that the tailored potential function releases a large amount of potential precisely when a merge produces many small segments, so the aggregate processing cost is paid by the potential drop rather than accumulating, breaking previous barriers.
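A small numeric check (under the same assumed gap convention as above) shows the potential strictly dropping under an interleaved merge; the released potential is what pays for the many small segments:

```python
import math

# Numeric check that the potential drops under an interleaved merge,
# using an assumed gap convention (key-distance gaps, sentinel
# boundaries at 0 and U). The released potential amortizes the cost
# of processing the many small segments the merge produces.

def phi(S, U):
    ext = [0] + sorted(S) + [U]
    g = [ext[i + 1] - ext[i] for i in range(len(ext) - 1)]
    return sum(math.log2(g[k]) + math.log2(g[k + 1]) for k in range(len(S)))

U, A, B = 16, [1, 5, 9, 13], [3, 7, 11, 15]   # fully interleaved sets
before = phi(A, U) + phi(B, U)
after = phi(sorted(A + B), U)
print(round(before - after, 3))   # positive: potential released by the merge
```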

4. Comparative Analysis with Prior Structures

  • Brown and Tarjan: Arbitrary merges in O(\log n) if splits are forbidden; splits interleaved with merges cause degeneration to \Omega(n).
  • Farach and Thorup: O(\log^2 n) amortized bound for both merges and splits via a segment-based balanced search tree methodology.
  • Current Work: Concurrent splits and arbitrary (interleaved) merges in O(\log n) amortized time, matching the known lower bound with no additional logarithmic penalty.

The tradeoff involves increased complexity—extended finger operations and meticulous local rebalancing—but the amortized optimality and support for arbitrarily overlapping sets represent a substantial theoretical advance.

5. Theoretical Consequences

The result refutes the previous conjecture that supporting arbitrary merges and splits must necessarily incur an O(\log^2 n) penalty. The lower bound, based on dynamic connectivity arguments, is \Omega(\log n) per operation, so the new structure is optimal. The methodology, especially the refined potential-function analysis and weight adjustments, may transfer to other domains requiring efficient merging of interleaved ordered sets.

This data structure now forms a reference point for the optimal solution to the Mergeable Dictionary abstract data type, opening further investigation into extensions for dynamic universe management and broader applications in adaptive search structures.

6. Applications and Implementation Considerations

Potential applications include:

  • Dynamic search sequences in version control contexts, event streaming analysis, and interactive data analysis where frequent re-partitioning and merging of data occur.
  • String processing and text compression (notably in Lempel–Ziv compressed text search), where mergeable dictionaries play a pivotal role.
  • Mergeable trees and union-split-find problems, wherein the capacity to merge arbitrarily interleaved subtrees is essential.
  • Database indexing schemes requiring frequent merges of dynamic, small sorted token sets.

While the asymptotic bounds are optimal, maintaining detailed gap-based weights and performing localized finger operations may introduce constant-factor overheads. Nevertheless, in scenarios demanding worst-case performance and substantial interleaving, the structure provides unique advantages.


In sum, mergeable dictionaries as introduced in (Iacono et al., 2010) combine extended biased skip lists, gap-based weighting schemes, and refined potential-function analysis to support predecessor search, split, and interleaved merge simultaneously in O(\log n) amortized time. This constitutes a practically and theoretically significant advance beyond previous approaches and sets the stage for further developments in dynamic set data structures.
