An Empirical Guide to the Behavior and Use of Scalable Persistent Memory (1908.03583v1)

Published 9 Aug 2019 in cs.DC and cs.PF

Abstract: After nearly a decade of anticipation, scalable nonvolatile memory DIMMs are finally commercially available with the release of Intel's 3D XPoint DIMM. This new nonvolatile DIMM supports byte-granularity accesses with access times on the order of DRAM, while also providing data storage that survives power outages. Researchers have not idly waited for real nonvolatile DIMMs (NVDIMMs) to arrive. Over the past decade, they have written a slew of papers proposing new programming models, file systems, libraries, and applications built to exploit the performance and flexibility that NVDIMMs promised to deliver. Those papers drew conclusions and made design decisions without detailed knowledge of how real NVDIMMs would behave or how industry would integrate them into computer architectures. Now that 3D XPoint NVDIMMs are actually here, we can provide detailed performance numbers, concrete guidance for programmers on these systems, reevaluate prior art for performance, and reoptimize persistent memory software for the real 3D XPoint DIMM. In this paper, we explore the performance properties and characteristics of Intel's new 3D XPoint DIMM at the micro and macro level. First, we investigate the basic characteristics of the device, taking special note of the particular ways in which its performance is peculiar relative to traditional DRAM or other past methods used to emulate NVM. From these observations, we recommend a set of best practices to maximize the performance of the device. With our improved understanding, we then explore the performance of prior art in application-level software for persistent memory, taking note of where their performance was influenced by our guidelines.

Citations (375)

Summary

  • The paper demonstrates that Intel’s 3D XPoint behaves differently from DRAM for reads and writes: read latencies are 2–3× higher, while write latencies stay closer to DRAM's, particularly when stores are paired with cache flush instructions.
  • The paper reveals significant tail latency and non-monotonic bandwidth variability driven by access patterns and concurrency, challenging traditional emulation methods.
  • The paper advises optimizing system design by minimizing small random accesses, leveraging non-temporal stores, and managing thread concurrency for improved persistent memory performance.

An Empirical Guide to the Behavior and Use of Scalable Persistent Memory

The paper "An Empirical Guide to the Behavior and Use of Scalable Persistent Memory" investigates the characteristics and performance of Intel's Optane DC Persistent Memory Module (referred to as 3D XPoint) as a scalable nonvolatile memory (NVM) technology. This research focuses on understanding the nuances of this NVM at both micro and macro levels, providing guidance on its optimal use and revisiting prior research assumptions based on emulation.

Key Findings

The paper identifies performance characteristics of Intel's 3D XPoint memory that set it apart from conventional DRAM. It shows that the common assumption that nonvolatile DIMMs behave like slower DRAM is an oversimplification: performance depends heavily on access size, access type (read or write), access pattern, and concurrency.

  1. Latency and Bandwidth: The paper reports that 3D XPoint read latencies are approximately 2-3 times higher than DRAM's, while write latencies are closer to DRAM's, particularly when cache flush instructions such as clwb are used. 3D XPoint memory is also more sensitive to access patterns than DRAM, with a substantial performance gap between sequential and random accesses.
  2. Tail Latency: 3D XPoint memory shows rare but significant latency outliers, particularly under heavy access to a small, localized region, which suggests that internal mechanisms such as wear leveling may be at play.
  3. Bandwidth Variability: The paper indicates that 3D XPoint bandwidth is non-monotonic in the number of concurrent threads, particularly when threads are not spread evenly across interleaved memory channels.
  4. Comparison to Emulation Methods: Empirical evidence shows that traditional emulation methods fail to capture the real behavior of 3D XPoint memory, so performance assessments in prior work that relied on them may be misleading.
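
The point about thread placement in item 3 can be illustrated with a small address-mapping model. This is only a sketch under stated assumptions: the 4096-byte interleave granularity, the count of six DIMMs, and the function names are illustrative choices that do not come from the summary above, and real address mapping may differ.

```python
from collections import Counter

INTERLEAVE_BYTES = 4096  # assumed interleave granularity (not stated in the summary)
NUM_DIMMS = 6            # assumed number of interleaved DIMMs (not stated in the summary)


def dimm_for_address(addr, interleave=INTERLEAVE_BYTES, dimms=NUM_DIMMS):
    """Return the index of the DIMM that serves a byte address in this model."""
    return (addr // interleave) % dimms


def threads_per_dimm(start_addrs, interleave=INTERLEAVE_BYTES, dimms=NUM_DIMMS):
    """Count how many threads begin on each DIMM, given each thread's first address."""
    counts = Counter(dimm_for_address(a, interleave, dimms) for a in start_addrs)
    return [counts.get(d, 0) for d in range(dimms)]


# Six threads whose regions start one interleave unit apart begin on six different DIMMs.
spread = threads_per_dimm([t * INTERLEAVE_BYTES for t in range(6)])

# Six threads whose regions start one full stripe apart all begin on DIMM 0.
stripe = INTERLEAVE_BYTES * NUM_DIMMS
clustered = threads_per_dimm([t * stripe for t in range(6)])

print(spread)     # [1, 1, 1, 1, 1, 1]
print(clustered)  # [6, 0, 0, 0, 0, 0]
```

In this model the second layout sends every thread to the same DIMM at the start, which is the kind of uneven spread the finding associates with reduced bandwidth. The model only looks at starting addresses and says nothing about how bandwidth actually degrades.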

Implications for System Design and Development

The authors derive a set of best practices from their findings to guide developers in designing and tuning systems that use 3D XPoint memory:

  • Minimize Random, Small Accesses: Avoid random accesses smaller than 256 bytes, or ensure high spatial locality when small accesses are unavoidable.
  • Leverage Non-temporal Stores: Use non-temporal stores for larger data transfers, since they bypass the cache hierarchy and suit sequential access patterns.
  • Manage Concurrency: Limit the number of threads that access a single DIMM concurrently to reduce contention and maximize throughput.
  • Avoid NUMA Accesses: Minimize accesses to remote NUMA nodes, especially for applications with frequent read-modify-write operations, to avoid severe performance penalties.
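
The first guideline can be made concrete with a little arithmetic. The sketch below assumes that the 256-byte threshold corresponds to a block size the device operates on internally, so that any access moves whole 256-byte blocks; that interpretation, and the function names, are assumptions for illustration rather than statements from the summary.

```python
BLOCK_BYTES = 256  # threshold from the guideline, treated here as an internal block size (assumption)


def blocks_touched(offset, size, block=BLOCK_BYTES):
    """Number of block-sized units that an access of `size` bytes at `offset` overlaps."""
    if size <= 0:
        return 0
    first = offset // block
    last = (offset + size - 1) // block
    return last - first + 1


def amplification(offset, size, block=BLOCK_BYTES):
    """Bytes moved in whole blocks divided by bytes the program asked for."""
    return blocks_touched(offset, size, block) * block / size


print(amplification(0, 256))   # 1.0: an aligned 256-byte access moves exactly one block
print(amplification(0, 64))    # 4.0: a 64-byte access still moves a whole block
print(amplification(224, 64))  # 8.0: a 64-byte access straddling two blocks moves both
```

Under this model, small accesses with high spatial locality are cheaper because several of them fall in a block that has to be moved anyway, which is consistent with the exception the guideline allows.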

Revisitation of Prior Research

The paper revisits prior studies that relied on emulation and repeats their experiments on real hardware. In the case of RocksDB, for instance, a comparison of fine-grained persistence against coarser-grained logging strategies produced the opposite result on actual 3D XPoint modules from the one seen under emulation, which demonstrates the need for real-hardware assessment rather than reliance on emulation.

The assessment of the NOVA file system further shows the value of optimizations targeted at the specifics of persistent memory: performance improves significantly when small updates are written into the metadata log rather than going through the main data path.

Conclusions and Future Directions

Intel's 3D XPoint memory adds a new tier to the persistent memory landscape, one that provides high-density, byte-addressable persistence. However, the research emphasizes the complexity of the technology and the need for software tailored to its peculiarities. The findings highlight the need to reoptimize existing software and to reevaluate previous research results on actual hardware, particularly because persistent memory technologies like 3D XPoint continue to evolve.

Future work will involve broader adoption and adaptation of software systems to fully exploit the potential of scalable nonvolatile memory, ensuring that both performance and consistency can be optimally balanced in real-world applications.
