Tiering Merge Policies

Updated 16 September 2025
  • Tiering merge policies are strategies that defer merging until a tier accumulates a set threshold, reducing write amplification in systems like LSM-trees.
  • Analytical models quantify cost trade-offs by evaluating merge, read, and space amplification to ensure efficient performance even under heavy workloads.
  • Adaptive algorithms use techniques such as EWMA-based hotness tracking and hardware-assisted profiling to manage data migration dynamically across heterogeneous memory systems.

A tiering merge policy is a strategy for consolidating multiple data or memory components across several storage or memory tiers, commonly applied in persistent storage engines (notably LSM-trees in NoSQL databases) and emerging heterogeneous memory systems. Tiering policies determine when, which, and how many components—files, runs, or memory pages—should be merged or migrated across levels to balance operational costs such as write amplification, read latency, bandwidth usage, and space overhead. These policies are critical in systems where overlapping or non-contiguous (interleaved) sets must be merged, and where static or heuristic-based approaches can incur significant inefficiencies.

1. Structural Foundations of Tiering Merge Policies

Tiering merge policies were first formalized in the context of LSM-trees and related data structures. In standard LSM-tree storage engines, two primary merge (“compaction”) strategies are observed: leveling and tiering. “Leveling” retains only one component (typically a sorted run or SSTable) per level, merging newly-flushed or compacted data immediately; “tiering,” by contrast, defers merging and allows each level (or tier) to accumulate up to a configurable maximum T components. Once T is reached, these components are merged and promoted to the next level (Luo et al., 2018).
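
A minimal sketch of this size-tiered trigger, assuming in-memory runs and a cascading overflow rule (the names and the merge-everything-on-overflow behavior are illustrative, not tied to any particular engine):

```python
import heapq

# Size-tiered merge trigger: each level holds up to T sorted runs; once a
# level reaches T runs, they are merged into a single run that is promoted
# to the next level, possibly cascading. Runs are plain sorted lists here.

T = 4  # per-level component threshold

def merge_runs(runs):
    """k-way merge of sorted runs; a real engine streams this from disk."""
    return list(heapq.merge(*runs))

def flush(levels, run):
    """Add a freshly flushed run to level 0, cascading merges on overflow."""
    levels.setdefault(0, []).append(sorted(run))
    level = 0
    while len(levels.get(level, [])) >= T:
        merged = merge_runs(levels[level])
        levels[level] = []
        levels.setdefault(level + 1, []).append(merged)
        level += 1
```

Under leveling, the loop body would instead merge each incoming run immediately with the level's single resident run; deferring that work until T runs accumulate is what buys tiering its lower write amplification, discussed below.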

In the memory domain, tiering merges control the migration of data pages between heterogeneous memory types (e.g., DRAM, NVM, CXL-extended memory). Newer systems increasingly rely on hardware/software co-design or adaptive heuristics to decide when to promote or demote memory pages between tiers according to access heat, bandwidth availability, and recent usage patterns (Zhou et al., 27 Mar 2024, Yadalam et al., 6 Aug 2025).

2. Analytical Models and Cost Trade-offs

A tiering merge policy is primarily evaluated in terms of operational costs: write amplification (the number of times data is rewritten before it stabilizes on disk), read amplification (extra effort required to find or scan entries), and space amplification (redundant or overlapping storage). Analytical models—such as those in “Bigtable Merge Compaction” (Mathieu et al., 2014)—use a cost function over time, combining merge costs (the total size of files or data to be merged) with read costs (which are a function of the post-merge stack depth or number of tiers/files per level).
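
As a rough illustration of that cost structure, the following sketch charges each merge the total size of the files it rewrites and charges reads through an abstract stack-depth function; the schedule encoding and read_cost_f are placeholders, not the paper's exact formulation:

```python
# Sketch of a BMC_f-style objective: the total cost of a schedule is the
# sum of merge work (bytes rewritten at each merge) plus read cost, which
# grows with the number of files (stack depth) present between merges.

def schedule_cost(schedule, accesses_per_step, read_cost_f):
    total = 0
    stack = []                        # sizes of files currently on disk
    for new_file_size, merge_top_k in schedule:
        stack.append(new_file_size)
        if merge_top_k > 1:           # merge the newest merge_top_k files
            merged = sum(stack[-merge_top_k:])
            total += merged           # merge cost = bytes rewritten
            stack = stack[:-merge_top_k] + [merged]
        total += accesses_per_step * read_cost_f(len(stack))
    return total
```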

For example, the capacity of level i in tiering, assuming block size B, entries per block P, and per-level component threshold T, can be bounded as

$$|C_i| \leq T^{i+1} \cdot B \cdot P, \quad i \ge 0,$$

with the total number of levels

$$L = \left\lceil \log_T\left( \frac{N}{B \cdot P} \cdot \frac{T}{T+1} \right) \right\rceil,$$

where N is the total key count and P is the number of entries per block.

Tiering reduces write amplification by a factor of T compared to leveling (write cost O(L/B) for tiering versus O(T·L/B) for leveling), but increases query cost by up to a factor of T for point lookups and range queries, since more overlapping components may need to be searched (Luo et al., 2018).
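
Plugging illustrative numbers into the formulas above confirms both the level count and the factor-of-T write-amplification gap (all parameter values here are chosen purely for illustration):

```python
import math

# Illustrative parameters only: N total keys, P entries per block,
# B = 1 so that B*P is the entries per flushed component, T threshold.
N, B, P, T = 10**9, 1, 100, 10

L = math.ceil(math.log(N / (B * P) * T / (T + 1), T))
print(L)  # -> 7 levels for these values

# Each entry is rewritten about once per level under tiering and up to
# T times per level under leveling, giving the factor-of-T gap.
write_amp_tiering = L           # ~ O(L/B) per entry
write_amp_leveling = T * L      # ~ O(T*L/B) per entry
print(write_amp_leveling // write_amp_tiering)  # -> 10 == T
```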

3. Algorithms and Online Policy Design

Several algorithms have been established for tiering merge policies, moving beyond ad hoc rules. Rent-or-buy recurrences are used to decide whether to merge immediately (pay upfront) or defer to later (rent), balancing merge and future read costs. Formal policies operate without foreknowledge of future workloads, guaranteeing performance bounds even in adversarial settings.
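
The flavor of a rent-or-buy rule can be shown with a ski-rental-style sketch: deferral "rents" extra read overhead per query, merging "buys" it off, and the merge fires once accumulated rent matches the one-time merge price (a simplification for illustration, not the exact policy from the papers):

```python
# Ski-rental-style merge deferral: "rent" is the extra read cost paid per
# query while runs stay unmerged; "buy" is the one-time merge cost.
# Merging once accumulated rent reaches the merge price keeps total cost
# within a constant factor of the clairvoyant optimum in the classic analysis.

def should_merge(pending_runs, queries_since_last_merge, read_penalty_per_run):
    merge_cost = sum(r["size"] for r in pending_runs)   # bytes to rewrite
    rent_paid = (queries_since_last_merge
                 * read_penalty_per_run * len(pending_runs))
    return rent_paid >= merge_cost
```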

In Bigtable-like systems, the BMC₍f₎ model captures merge (file size) and read (depth-based) costs. Optimally competitive online policies (e.g., K-competitive for fixed-tier count K) are constructed using recursive phase partitioning or via a bijection to binary search trees, where left/right depths correspond to merge/read penalties (Mathieu et al., 2014).

In memory tiering, algorithms like ARMS remove static thresholds and instead use short- and long-term moving averages per page (EWMA) to compute a relative hotness score. Migrations (promotions/demotions) are delayed until the benefit exceeds the cost, as determined by recent access patterns and application bandwidth (Yadalam et al., 6 Aug 2025). NeoMem leverages real-time Count-Min Sketch hardware statistics for migration thresholding (Zhou et al., 27 Mar 2024).

Table: Key Algorithmic Features of Tiering Merge Policies

Policy/System | Merge Trigger | Hot/Cold Identification | Migration Scheduling
------------- | ------------- | ------------------------ | --------------------
LSM-tiering | T components per level | N/A | Immediate on overflow
Bigtable-BMC | Recurrence/tree-based | N/A | Cost-balanced, phased
ARMS | Adaptive | EWMA-based, threshold-free | Bandwidth-aware batching
NeoMem | Hardware-driven | Sketch-based hot count, software threshold | Migration quota and dynamic threshold

4. Strategies for Hot/Cold Identification and Adaptive Decisions

In modern systems, hot/cold data identification underpins tiering migrations. ARMS maintains per-page short-term and long-term EWMAs (αₛ = 0.7, αₗ = 0.1), combining them into a weighted hotness score. The top-k pages (fast-tier capacity) are marked as “hot,” and migration is filtered through multi-round candidate verification and cost/benefit analysis. Only if the expected gain (proportional to the hot/cold score difference and hot-age) exceeds the combined migration latency is promotion scheduled (Yadalam et al., 6 Aug 2025).
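
A minimal sketch of this scoring and filtering pipeline, assuming the α values quoted above; the blend weight, the top-k cutoff rule, and the scalar cost comparison are illustrative stand-ins (the multi-round verification and hot-age term are omitted for brevity):

```python
# Per-page hotness tracking in the style described above: short-term and
# long-term EWMAs of per-epoch access counts are blended into one score.
# ALPHA_S and ALPHA_L follow the text; other constants are assumptions.

ALPHA_S, ALPHA_L = 0.7, 0.1

class PageStats:
    def __init__(self):
        self.short_ewma = 0.0
        self.long_ewma = 0.0

    def update(self, accesses_this_epoch):
        self.short_ewma = ALPHA_S * accesses_this_epoch + (1 - ALPHA_S) * self.short_ewma
        self.long_ewma  = ALPHA_L * accesses_this_epoch + (1 - ALPHA_L) * self.long_ewma

    def hotness(self, w_short=0.5):
        # Weighted blend of recent spikes vs. sustained heat (weight assumed).
        return w_short * self.short_ewma + (1 - w_short) * self.long_ewma

def promote_candidates(pages, fast_capacity, migration_cost):
    """Rank pages by hotness; promote a top-k page only if its expected
    gain (score gap over the promotion cutoff) exceeds the migration cost."""
    ranked = sorted(pages.items(), key=lambda kv: kv[1].hotness(), reverse=True)
    if len(ranked) <= fast_capacity:
        return [pid for pid, _ in ranked]
    cutoff = ranked[fast_capacity][1].hotness()   # hottest page left behind
    return [pid for pid, st in ranked[:fast_capacity]
            if st.hotness() - cutoff > migration_cost]
```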

NeoMem implements hardware-based profiling using a customized Count-Min Sketch in the CXL device’s NeoProf unit, reporting per-page access frequencies and supporting OS-side threshold adjustment algorithms. The OS dynamically computes a hotness threshold using quantile functions over observed access histograms, adapting the promoted fraction p in response to statistical error, bandwidth, and “ping-pong” migration rates (Zhou et al., 27 Mar 2024).
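
A pure-software sketch of the same mechanism (NeoMem does the counting in device hardware; the sketch dimensions and the quantile rule below are illustrative assumptions):

```python
import random

# Count-Min Sketch for per-page access counting, plus a quantile-based
# hotness threshold computed over the observed counts.

W, D = 4096, 4                      # sketch width/depth (illustrative)
SEEDS = [random.Random(i).getrandbits(32) for i in range(D)]
table = [[0] * W for _ in range(D)]

def record_access(page_id):
    for row, seed in enumerate(SEEDS):
        table[row][hash((seed, page_id)) % W] += 1

def estimated_count(page_id):
    # CMS estimate: taking the minimum over rows bounds the overcount error.
    return min(table[row][hash((seed, page_id)) % W]
               for row, seed in enumerate(SEEDS))

def hotness_threshold(counts, promote_fraction):
    """Pick the threshold so that roughly `promote_fraction` of pages fall
    above it -- a quantile over the observed access histogram."""
    ordered = sorted(counts)
    idx = int(len(ordered) * (1 - promote_fraction))
    return ordered[min(idx, len(ordered) - 1)]
```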

Cluster-scale systems extend these policies by using incremental ML models (e.g., XGBoost) to predict file “temperature” and determine promotion/demotion actions for file blocks in distributed storage (Herodotou et al., 2019).
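
A hedged sketch of such a temperature predictor using the xgboost Python package; the features and labels below are synthetic placeholders, not the cited framework's actual feature set:

```python
import numpy as np
import xgboost as xgb

# Toy file-"temperature" regressor: per-block features might include recent
# access counts, age, and size; the label is a future access rate. All data
# below is synthetic, purely to show the train/predict/placement loop.

rng = np.random.default_rng(0)
X = rng.random((1000, 3))            # [recent_accesses, age, size] (scaled)
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.05, 1000)

model = xgb.XGBRegressor(n_estimators=50, max_depth=4)
model.fit(X, y)

def placement(features, hot_cutoff=0.8):
    """Promote a block to the fast tier when predicted temperature is high."""
    temp = float(model.predict(np.asarray([features]))[0])
    return "fast-tier" if temp >= hot_cutoff else "capacity-tier"
```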

5. Practical Implications and System Implementations

Tiering merge policies have been broadly adopted in open-source NoSQL engines (RocksDB, HBase, Cassandra), frequently as alternatives to the default leveling policy. RocksDB provides tiering with tunable parameters K and T, enabling users to trade between merge frequency and read query latency. Partitioned tiering (vertical and horizontal grouping) further generalizes the approach to better accommodate non-uniform data distribution and workload variability (Luo et al., 2018).

Memory tiering systems (e.g., ARMS, NeoMem) have been deployed on platforms with DRAM, NVM, and CXL-extended memory. NeoMem’s FPGA-based evaluation reports 32–67% geomean speedup over prior solutions, attributed to its accurate, low-overhead hardware profiling and dynamic software thresholds. ARMS demonstrates performance within 3% of manually-tuned baseline systems and up to 2.3× better than untuned ones, reducing wasteful migrations and improving bandwidth utilization (Yadalam et al., 6 Aug 2025, Zhou et al., 27 Mar 2024).

Distributed environments for Hadoop/Spark use ML-based frameworks for tiering, reporting decreased job completion times (18–27% for large jobs) and improved cluster efficiency (up to 41%) when replacing static caching policies with adaptive, access-prediction-driven migration (Herodotou et al., 2019).

6. Theoretical Insights and Future Directions

The mathematical treatment of tiering merge policies—covering amortized analysis, competitive ratio bounds, and explicit recurrences—demonstrates that provably efficient online policies outperform trial-and-error tuning. Open challenges include minimizing variance in performance (e.g., merge spikes), developing hybrid schemes such as “lazy leveling” (tiering at low levels, leveling at the largest level), and advancing auto-tuning techniques that jointly optimize merge thresholds, data structure parameters, and memory/bandwidth allocations (Luo et al., 2018).

A plausible implication is that future systems will further integrate hardware and software adaptation, employing real-time feedback and ML-driven or sketch-based profiling at scale, while abstracting policy configuration away from operators.

7. Comparative Analysis and Distinctions

Tiering merge policies stand apart by deferring merges to bundle more updates into fewer, larger compactions, in contrast to immediate or incremental merging. This approach yields substantial reductions in write/merge amplification and improves ingestion throughput. The primary disadvantage is an increase in query path length and possible space amplification, as more overlapping components must be searched or stored prior to each merge (Luo et al., 2018). Memory tiering systems face analogous trade-offs: aggressive migration improves access to hot data but may incur wasted bandwidth and unnecessary movement unless sophisticated identification and cost/benefit logic are applied (Yadalam et al., 6 Aug 2025).

Empirical results from recent systems confirm that robust, adaptive tiering policies—whether using EWMA hotness, sketch-based hardware profiling, or ML prediction—substantially outperform static, threshold-based or purely heuristic alternatives in a variety of workloads and hardware configurations. This underpins their central role in state-of-the-art data management engines and hardware-aware memory managers.


Overall, tiering merge policies encode a family of strategies that optimize the timing and scope of merges or migrations across storage or memory hierarchies, supported by a spectrum of algorithmic, analytical, and practical advances in both software and integrated hardware/software systems.
