Dynamic Data Placement in HH-PIM
- Dynamic Data Placement Optimization (HH-PIM) is a methodology for optimizing data location in hierarchical, hybrid memory-processing systems through real-time, adaptive policies.
- It employs algorithmic strategies such as dynamic programming and reinforcement learning to manage data replication, migration, and load balancing across diverse memory types.
- Evaluations report up to 60% energy savings and 54% latency reductions, highlighting the approach's impact on energy efficiency and overall system performance.
Dynamic Data Placement Optimization (HH-PIM) refers to the suite of algorithms, architectural strategies, and system-level mechanisms developed to optimize where and when data is placed within hierarchical and hybrid memory-processing systems, particularly in the context of advanced Processing-in-Memory (PIM) architectures. The essential aim of dynamic data placement is to minimize overall access latency, bandwidth usage, and energy consumption by actively deciding, often in real time or per workload phase, how to replicate, partition, or migrate data objects across heterogeneous memory and compute modules. In HH-PIM systems, these choices are non-trivially influenced by network topology, memory hierarchy (SRAM, DRAM, MRAM, ReRAM), diverse compute resources, workload locality, and evolving system constraints.
1. Architectures and System Models
In HH-PIM, system architectures typically incorporate heterogeneous PIM modules, each specialized for either high-performance (e.g., high-voltage SRAM/MRAM clusters) or energy-efficient operation (e.g., low-voltage MRAM clusters) (Jeon et al., 2 Apr 2025). Controllers orchestrate a FETCH–DECODE–LOAD–EXECUTE–STORE pipeline, managing data movement between memory layers. The architecture may involve hybrid devices (SRAM+MRAM, DRAM+ReRAM), 3D-stacked memories (HMC, HBM) with in-layer or near-layer compute units, and multiple processing tiles interconnected via arbitrary or non-metric topologies (Angel et al., 2010, Tian et al., 9 Oct 2025).
A critical architectural feature is the flexibility to assign and reassign both weights and intermediate data dynamically across these resources while regulating the number of replicas, physical address mapping, and data residency in ephemeral versus non-volatile banks. The system often comprises dual-controller schemes for fine-grained adaptation to workload bursts and idle phases, with mechanisms for rapid weight transfers between HP- and LP-PIM clusters.
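The following is a minimal sketch of this adaptive assignment, assuming a single utilization signal and hypothetical per-cluster parameters; the cited design uses richer workload-phase detection and dedicated controllers for the actual weight transfers:

```python
from dataclasses import dataclass

@dataclass
class PimCluster:
    name: str                  # e.g., "HP" (SRAM-based) or "LP" (MRAM-based)
    energy_per_access: float   # relative per-access energy
    latency_per_access: float  # relative per-access latency

# Hypothetical threshold; a real controller would derive this from
# profiled workload phases and the application's latency bound.
BURST_UTILIZATION = 0.75

def place_weights(utilization: float, hp: PimCluster, lp: PimCluster) -> PimCluster:
    """Pick the target cluster for the next batch of weights: favor the
    high-performance cluster during bursts, the low-power cluster otherwise."""
    return hp if utilization >= BURST_UTILIZATION else lp

hp = PimCluster("HP", energy_per_access=5.0, latency_per_access=1.0)
lp = PimCluster("LP", energy_per_access=1.0, latency_per_access=4.0)
print(place_weights(0.9, hp, lp).name)   # "HP": burst phase favors speed
```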
2. Algorithmic Frameworks for Data Placement
Optimal algorithms for data placement (DP) and page placement (PP) problems in these systems generate configurations—nonempty subsets of processing/memory modules—where each data object can be replicated (Angel et al., 2010). For networks with a constant number of clients (or PIM modules), the DP and PP problems are captured via dynamic programming recurrences:
- For uniform-length objects, a dynamic-programming recurrence is evaluated over states that pair the vector of remaining cache capacities with an indicator of the configuration chosen for each object; the explicit recurrence is given in (Angel et al., 2010).
- For non-uniform lengths, object sizes are first rounded, introducing a bounded capacity slack, and the rounded instance is then solved with the same recurrence.
PP introduces additional load vectors tracking the allowed number of clients served per module, and a history pattern accounting for prior connections. These dynamic programming strategies efficiently incorporate cache size, load balancing, installation and access costs, as well as support for page size heterogeneity.
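To make the recurrence's structure concrete, the following is a minimal sketch on a toy two-module instance, assuming uniform-length objects and the simplification that each object is served from its cheapest replica; the full recurrence in (Angel et al., 2010) additionally tracks per-client assignments and, for PP, load vectors and history patterns:

```python
from functools import lru_cache
from itertools import chain, combinations

# Toy instance: 2 modules, 3 uniform-length objects (all values assumed).
MODULES = (0, 1)
CACHE = (2, 2)                        # per-module cache capacity, in objects
INSTALL = {0: 1.0, 1: 1.5}            # cost to install an object on a module
ACCESS = [{0: 0.5, 1: 2.0},           # ACCESS[o][m]: cost to serve object o from module m
          {0: 2.0, 1: 0.5},
          {0: 1.0, 1: 1.0}]

def configurations():
    """Nonempty subsets of modules on which an object may be replicated."""
    return chain.from_iterable(
        combinations(MODULES, r) for r in range(1, len(MODULES) + 1))

@lru_cache(maxsize=None)
def dp(obj: int, cache: tuple) -> float:
    """Minimum total cost of placing objects obj..end, given the remaining
    cache-capacity vector (the DP state described above)."""
    if obj == len(ACCESS):
        return 0.0
    best = float("inf")
    for cfg in configurations():
        if any(cache[m] == 0 for m in cfg):
            continue                                   # violates capacity
        new_cache = tuple(c - (m in cfg) for m, c in enumerate(cache))
        cost = sum(INSTALL[m] for m in cfg)            # installation cost
        cost += min(ACCESS[obj][m] for m in cfg)       # cheapest-replica access
        best = min(best, cost + dp(obj + 1, new_cache))
    return best

print(dp(0, CACHE))   # optimal placement cost for the toy instance
```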
In systems modeled as distributed clusters or data centers, the data placement problem can also be reduced to weighted graph partitioning (with nodes representing relations/objects and queries, and edge-weights encoding communication costs), or mapped onto combinatorial optimization formulations such as mixed knapsack or quadratic assignment problems (Ibrahim et al., 29 Mar 2024, Golab et al., 2013, Jeon et al., 2 Apr 2025).
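As a minimal sketch of the graph-partitioning view, with assumed relation/query names and edge weights, a placement is a node partition whose quality is the total weight of cut edges:

```python
# Nodes are relations and queries; weighted edges encode the communication
# cost incurred when the two endpoints land on different nodes/banks.
edges = {                        # (node_a, node_b): weight -- toy values
    ("Q1", "R_users"): 4.0,
    ("Q1", "R_orders"): 1.0,
    ("Q2", "R_orders"): 3.0,
}

def cut_cost(partition: dict) -> float:
    """Total weight of edges crossing the partition (data movement cost)."""
    return sum(w for (a, b), w in edges.items() if partition[a] != partition[b])

placement = {"Q1": 0, "R_users": 0, "Q2": 1, "R_orders": 1}
print(cut_cost(placement))   # 1.0: only the Q1-R_orders edge is cut
```

A partitioner (e.g., multilevel graph partitioning or a knapsack-style solver) would then search over such partitions under per-node capacity constraints.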
3. Dynamic and Adaptive Data Migration Mechanisms
Recent directions emphasize not just initial placement but also on-demand migration and dynamic reallocation in response to workload skew, phase changes, or detected bottlenecks:
- DL-PIM (Tian et al., 9 Oct 2025) dynamically migrates "subscribed" data blocks from remote to local vaults (channels/banks) when latency penalties (network hop count, queue delays) exceed thresholds. A distributed subscription table provides indirection and ensures updated address resolution, while an adaptive policy toggles migration based on measured performance deltas.
- A multi-agent reinforcement learning (RL) fabric (Nadig et al., 26 Mar 2025) divides data management into a placement policy (for incoming data) and a migration policy (for in-place pages), with each RL agent trained online for optimal device selection and page movement based on reward structures (reward inversely proportional to latency, penalized for excessive migrations). This allows coordinated and adaptive migration policies that self-tune for hybrid memory/storage environments.
Adaptive mechanisms employ local and global metrics (average request latency, per-vault utilization, hop-based savings, coefficient of variation of memory accesses) to decide whether to activate migration, replicate data, or cancel previous migrations.
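A minimal sketch of subscription-based migration in the spirit of DL-PIM, assuming a single accumulated penalty score and a hypothetical threshold (the actual design derives its decisions from hop counts, queue delays, and measured performance deltas):

```python
from collections import defaultdict

MIGRATE_THRESHOLD = 8.0                 # hypothetical tunable

subscription_table = {}                 # block_id -> subscribed local vault
remote_penalty = defaultdict(float)     # block_id -> accumulated latency penalty

def record_access(block_id: int, hops: int, queue_delay: float) -> None:
    """Accumulate the latency penalty of serving a block from a remote vault."""
    remote_penalty[block_id] += hops + queue_delay

def maybe_migrate(block_id: int, local_vault: int, migration_enabled: bool) -> None:
    """Subscribe a block to the local vault once its penalty justifies the move.
    The migration_enabled flag stands in for the adaptive on/off policy."""
    if migration_enabled and remote_penalty[block_id] >= MIGRATE_THRESHOLD:
        subscription_table[block_id] = local_vault   # later lookups are redirected
        remote_penalty[block_id] = 0.0

def resolve(block_id: int, home_vault: int) -> int:
    """Address resolution through the subscription table's indirection."""
    return subscription_table.get(block_id, home_vault)
```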
4. Cost Models and Formulations
Cost functions in HH-PIM placement frameworks encapsulate both local and global objectives. At the module/object level, cost terms typically include:
- Access cost: a function of data path length (network hop count, DRAM row accesses), latency, and bandwidth consumption, typically growing with the connectivity distance between the requesting client and the nearest replica.
- Installation cost: overhead to place or replicate a data object (e.g., energy or time to program weights into a new bank).
- Replication/load constraint penalties: imposed by hard or soft limits on how many modules can serve as replicas and the maximum number of clients they may serve.
- Capacity violation slack: in non-uniform object settings, an optimal solution may allow a small, bounded overflow per cache.
Objective functions are often formalized for both placement and migration, e.g.,

$$\min_{\{x_i\}} \; \sum_i E_i\, x_i \quad \text{subject to} \quad \sum_i T_i\, x_i \le T_{\mathrm{lat}}, \qquad \sum_i x_i = D_{\mathrm{total}},$$

where $x_i$ encodes the quantity of data assigned to storage type $i$, $E_i$ is its per-access energy, $T_i$ its access time, $T_{\mathrm{lat}}$ the application-imposed latency bound, and $D_{\mathrm{total}}$ the total data to place (Jeon et al., 2 Apr 2025). Hierarchical dynamic programming, knapsack solvers, and integer programs are the prevalent solution strategies.
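Under the linear form written above (an assumption; the cited work also considers discrete knapsack variants), the placement reduces to a small linear program. A sketch using SciPy's linprog, with assumed HP/LP parameter values:

```python
from scipy.optimize import linprog

# Toy HP/LP instance mirroring the formulation above (all values assumed).
E = [5.0, 1.0]        # per-access energy: HP-PIM (SRAM), LP-PIM (MRAM)
T = [1.0, 4.0]        # access time per unit of data
CAP = [64.0, 256.0]   # per-cluster capacity
T_LAT = 600.0         # application-imposed latency bound
D_TOTAL = 192.0       # total data to place

# minimize E.x  s.t.  T.x <= T_LAT,  sum(x) == D_TOTAL,  0 <= x_i <= CAP_i
res = linprog(c=E,
              A_ub=[T], b_ub=[T_LAT],
              A_eq=[[1.0, 1.0]], b_eq=[D_TOTAL],
              bounds=list(zip([0.0, 0.0], CAP)))
print(res.x)   # ~[56, 136]: just enough data on HP to meet the latency bound
```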
Table: Representative Cost Terms and Constraints
| Cost/Constraint | Formalization | Context |
|---|---|---|
| Access + installation cost | Per-client, per-object DP recurrence | Uniform and non-uniform DP/PP |
| Time/energy tradeoff | $\min \sum_i E_i x_i$ s.t. $\sum_i T_i x_i \le T_{\mathrm{lat}}$ | Energy-latency balancing |
| Replication slack | Bounded capacity overflow per cache | Replication/load policies |
5. Data Placement in Hierarchical Memory and Compute Systems
HH-PIM leverages both vertical (hierarchical) and horizontal (distributed) placement strategies:
- In hierarchical configurations, data (e.g., DNN weights) are dynamically assigned to faster SRAM or slower, but energy-efficient MRAM, changing as a function of workload intensity or latency constraints (Jeon et al., 2 Apr 2025).
- In distributed clusters, data objects (e.g., database tables, hash buckets) are partitioned and mapped to memory banks or nodes via dynamic programming or graph-based algorithms that balance load and query access costs while minimizing communication (Golab et al., 2013, Shekar et al., 2023).
- In 3D-stacked PIM, dynamic block subscription enables popular or frequently accessed data to be migrated closer to active compute units, with adaptive control based on measured reuse, balancing the costs of additional indirection (Tian et al., 9 Oct 2025).
Dynamic page and region reallocation occurs through lightweight bookkeeping mechanisms (on-page structures, distributed indirection tables) rather than centralized memory management, enabling fast reaction to load spikes and consistently low latency.
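As a small illustration of one reallocation trigger, the coefficient of variation of per-vault accesses (one of the adaptive metrics listed in Section 3) can gate rebalancing; the threshold value here is an assumption:

```python
import statistics

def load_imbalance(accesses_per_vault):
    """Coefficient of variation of per-vault access counts, a common
    skew signal for deciding when to reallocate hot regions."""
    mean = statistics.fmean(accesses_per_vault)
    return statistics.stdev(accesses_per_vault) / mean if mean > 0 else 0.0

REBALANCE_CV = 0.5   # hypothetical trigger threshold
print(load_imbalance([120, 80, 400, 95]) > REBALANCE_CV)   # True -> rebalance
```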
6. Performance Metrics and System Impact
HH-PIM dynamic data placement mechanisms yield substantial gains in system-level metrics:
- Energy efficiency is improved by up to 60.43% over conventional designs when dynamic allocation between HP- and LP-PIM modules is performed (Jeon et al., 2 Apr 2025).
- Average memory latency per request is reduced by 54% in HMC and 50% in HBM 3D-stacked memories; representative workloads exhibit aggregate performance speedups of 6% (HMC) and 3% (HBM) (Tian et al., 9 Oct 2025).
- In hybrid storage/memory environments, multi-agent RL-driven migration yields up to 49.5% performance gains (IOPS, throughput) over earlier state-of-the-art policies, with only 240 ns inference latency and 206 KiB DRAM overhead for the control agents (Nadig et al., 26 Mar 2025).
- Small, controlled storage-capacity violations are permitted in certain non-uniform object cases, with the violation bound proven to be asymptotically tight (Angel et al., 2010).
7. Applicability, Limitations, and Future Directions
The principles of dynamic data placement optimization in HH-PIM are highly general—applicable to a variety of settings, including edge AI (battery-constrained TinyML), large-scale DNN inference, distributed object stores, and storage systems needing real-time adaptation. Strengths include topology-agnostic solutions, formal bounds for optimality and slack, and adaptability to time-varying or bursty load patterns.
Limitations arise in environments with highly skewed reuse (where migration/adaptive policies can introduce excessive traffic), in application scenarios not well characterized by the modeled cost functions, or when network/DRAM topology is highly irregular and indirection latency becomes non-negligible. Adaptive policies are thus essential to prevent performance drops by dynamically toggling between migration-ready and migration-disabled modes.
Open topics include extending current frameworks to hybrid storage-class memory, optimizing for multiple simultaneous objectives (latency, energy, write endurance), and integrating placement/migration with application-level scheduling for workload-specific optimization.
Conclusion
Dynamic Data Placement Optimization in HH-PIM systems encompasses a collection of algorithmic, architectural, and system-level methods for joint optimization of data locality, resource utilization, and energy efficiency. By leveraging dynamic programming, combinatorial optimization, adaptive and reinforcement-learning-based mechanisms, and topology-agnostic design, these methods provide a rigorous foundation for minimizing data movement bottlenecks and maximizing performance across diverse, hierarchical, and energy-constrained memory-processing systems. The broad relevance and proven gains in latency, energy, and throughput establish dynamic data placement as a core design pillar in continued PIM system evolution.