
Improving DRAM Performance by Parallelizing Refreshes with Accesses (1712.07754v1)

Published 21 Dec 2017 in cs.AR

Abstract: Modern DRAM cells are periodically refreshed to prevent data loss due to leakage. Commodity DDR DRAM refreshes cells at the rank level. This degrades performance significantly because it prevents an entire rank from serving memory requests while being refreshed. DRAM designed for mobile platforms, LPDDR DRAM, supports an enhanced mode, called per-bank refresh, that refreshes cells at the bank level. This enables a bank to be accessed while another in the same rank is being refreshed, alleviating part of the negative performance impact of refreshes. However, there are two shortcomings of per-bank refresh. First, the per-bank refresh scheduling scheme does not exploit the full potential of overlapping refreshes with accesses across banks because it restricts the banks to be refreshed in a sequential round-robin order. Second, accesses to a bank that is being refreshed have to wait. To mitigate the negative performance impact of DRAM refresh, we propose two complementary mechanisms, DARP (Dynamic Access Refresh Parallelization) and SARP (Subarray Access Refresh Parallelization). The goal is to address the drawbacks of per-bank refresh by building more efficient techniques to parallelize refreshes and accesses within DRAM. First, instead of issuing per-bank refreshes in a round-robin order, DARP issues per-bank refreshes to idle banks in an out-of-order manner. Furthermore, DARP schedules refreshes during intervals when a batch of writes are draining to DRAM. Second, SARP exploits the existence of mostly-independent subarrays within a bank. With minor modifications to DRAM organization, it allows a bank to serve memory accesses to an idle subarray while another subarray is being refreshed. Extensive evaluations show that our mechanisms improve system performance and energy efficiency compared to state-of-the-art refresh policies and the benefit increases as DRAM density increases.

Citations (216)

Summary

  • The paper introduces DARP and SARP, two innovative techniques that enable concurrent DRAM refreshes and accesses to alleviate performance bottlenecks.
  • It leverages dynamic scheduling and subarray partitioning to overlap refresh operations with memory accesses, reducing refresh-induced latency.
  • Comprehensive evaluations demonstrate up to 15.2% performance improvements in high-density DRAM systems, highlighting enhanced throughput and energy efficiency.

An Examination of DRAM Performance Optimization through Parallelizing Refreshes and Accesses

The paper "Improving DRAM Performance by Parallelizing Refreshes with Accesses" presents a detailed exploration and proposal of advanced techniques aimed at alleviating the performance degradation associated with DRAM refresh operations. This research addresses an existing performance bottleneck caused by the necessity of periodically refreshing DRAM cells to avoid data loss due to charge leakage. Traditional approaches, particularly in DDR and LPDDR DRAM systems, refresh cells at the rank level, which obstructs memory requests from being processed during refresh intervals. The proposed methodologies in this work provide a nuanced approach to DRAM operation, implementing two novel mechanisms: Dynamic Access Refresh Parallelization (DARP) and Subarray Access Refresh Parallelization (SARP).

Overview of DRAM Refresh Challenges

At the core of the performance challenge is the refresh operation's interference with memory access. As DRAM density escalates, the latency of each refresh operation, and the corresponding performance degradation, is projected to become even more substantial. Specifically, the paper documents an average performance reduction of 8.2% and 19.9% for 8Gb and 32Gb chips, respectively.
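To make the scaling trend concrete, the back-of-the-envelope calculation below estimates the fraction of time a rank is unavailable under all-bank refresh. The timing values are illustrative assumptions (roughly in line with DDR3/DDR4-era parts), not the paper's exact parameters.

```python
# Fraction of time a rank is blocked by all-bank refresh: every tREFI,
# the rank is unavailable for tRFC. The timing values below are illustrative
# assumptions, not the paper's exact parameters.
T_REFI_NS = 7800  # average refresh interval, ~7.8 us at normal temperature

for density, t_rfc_ns in [("8Gb", 350), ("16Gb", 550), ("32Gb", 1000)]:
    blocked_fraction = t_rfc_ns / T_REFI_NS
    print(f"{density}: rank unavailable ~{blocked_fraction:.1%} of the time")
```

The fraction grows with density because the per-refresh latency (tRFC) increases while the refresh interval (tREFI) stays roughly constant.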

The traditional methods for managing refresh are all-bank and per-bank refresh. The former refreshes all banks in a rank collectively, leaving the entire rank inaccessible, whereas the latter, employed by LPDDR DRAM, refreshes one bank at a time in a fixed round-robin order, allowing some parallelism across banks. However, limitations remain: the sequential scheduling order is inflexible, and a bank cannot serve accesses while it is being refreshed. A simplified sketch of the baseline scheduling appears below.
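The following sketch models baseline per-bank refresh as a strict round-robin pointer over the banks. The class and method names are hypothetical simplifications; a real controller also enforces refresh timing constraints.

```python
class RoundRobinPerBankRefresh:
    """Baseline LPDDR-style per-bank refresh, heavily simplified: banks are
    refreshed in a fixed sequential order, regardless of which banks are idle."""

    def __init__(self, num_banks: int = 8):
        self.num_banks = num_banks
        self.next_bank = 0  # strict round-robin pointer

    def pick_bank_to_refresh(self) -> int:
        bank = self.next_bank
        self.next_bank = (self.next_bank + 1) % self.num_banks
        return bank  # may pick a bank that currently has pending demand requests
```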

Proposed Mechanisms

Dynamic Access Refresh Parallelization (DARP): This mechanism introduces an intelligent refresh scheduling policy that tackles two primary issues. First, it issues per-bank refreshes out of order, targeting banks that are currently idle rather than following the fixed round-robin order; the scheduler monitors each bank's request queue to maximize the parallelization of refreshes with accesses. Second, DARP proactively schedules refreshes while a batch of writes is being drained to DRAM, exploiting the fact that writes are not latency-critical (the processor does not stall waiting for them to complete). Together, these two techniques significantly reduce refresh-induced latency.
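A minimal sketch of the DARP scheduling idea follows, under simplifying assumptions: the class, method, and argument names are hypothetical, and a real memory controller must additionally respect per-bank refresh timing constraints and the standard's limit on postponed refreshes.

```python
class DarpLikeRefreshScheduler:
    """Simplified sketch of DARP-style scheduling (not the paper's exact logic).

    pending_refreshes: set of bank IDs that still owe a refresh this interval.
    request_queues:    dict mapping bank ID -> list of outstanding demand requests.
    draining_writes:   True while the controller is flushing a batch of writes.
    """

    def pick_bank_to_refresh(self, pending_refreshes, request_queues, draining_writes):
        # Out-of-order refresh: prefer a bank that owes a refresh and is idle,
        # rather than whichever bank the round-robin pointer happens to name.
        idle_banks = [b for b in pending_refreshes if not request_queues[b]]
        if idle_banks:
            return idle_banks[0]
        # Write-refresh parallelization: while writes are draining, the processor
        # is not stalled on them, so refresh latency can be hidden behind the drain.
        if draining_writes and pending_refreshes:
            return min(pending_refreshes, key=lambda b: len(request_queues[b]))
        # Otherwise postpone the refresh (DRAM permits a bounded number of postponements).
        return None
```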

Subarray Access Refresh Parallelization (SARP): By leveraging the largely independent subarray structure within a DRAM bank, SARP allows memory accesses to proceed in idle subarrays while another subarray in the same bank is being refreshed. This technique requires modest modifications to the DRAM architecture, chiefly partitioning the global address path and enabling subarrays to be activated independently. Because a refresh confined to one subarray no longer monopolizes the bank's shared structures, the bank can continue serving data even during refresh cycles.
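The condition SARP enforces can be summarized as a subarray-conflict check. The row-to-subarray mapping and subarray size below are illustrative assumptions; the real design also depends on the per-subarray control and partitioned global address paths described above.

```python
def can_serve_during_refresh(access_row: int, refreshing_subarray: int,
                             rows_per_subarray: int = 512) -> bool:
    """SARP-style check (simplified): an access may proceed while its bank is
    refreshing, provided it targets a different subarray than the one under refresh."""
    target_subarray = access_row // rows_per_subarray
    return target_subarray != refreshing_subarray
```

For example, with 512-row subarrays, a read to row 1030 (subarray 2) could be served while subarray 0 is being refreshed, whereas a read to row 100 would have to wait.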

Evaluation and Implications

Through comprehensive evaluations involving diverse workloads and configurations, the combination of DARP and SARP, termed DSARP, demonstrates superior performance compared to state-of-the-art refresh policies. The paper reports average system performance improvements of up to 15.2% for workloads running on 32Gb DRAM, with the benefit over existing policies growing as chip density rises.

The implications of this research are substantial, forecasting enhanced DRAM throughput and energy efficiency. Practically, these techniques appear well-positioned for deployment in high-density environments where refresh cycles threaten to compromise performance integrity. Theoretically, this work challenges existing paradigms of DRAM operation, encouraging further exploration into exploiting DRAM’s internal architecture for enhanced parallelism.

Future Directions

While the paper establishes a solid foundation, the prospects for subsequent exploration include refinement of DRAM architecture to further reduce area and power overheads and extensions to future memory technologies beyond DRAM. Additionally, integrating these innovations into industry standards will be crucial for wide adoption.

In conclusion, this research articulates a targeted response to DRAM refresh overhead, showcasing how tailored hardware mechanisms can tackle systemic inefficiencies, paving the way for future advancements in memory technology performance.