- The paper demonstrates that Intel's 3D XPoint memory behaves differently from DRAM: read latencies are 2–3× higher, while write latencies are much closer to DRAM's, particularly when stores are persisted with cache-line flush instructions.
- The paper reveals significant tail-latency outliers and bandwidth that varies non-monotonically with access pattern and concurrency, behaviors that traditional emulation methods fail to capture.
- The paper advises optimizing system design by minimizing small random accesses, leveraging non-temporal stores, and managing thread concurrency for improved persistent memory performance.
An Empirical Guide to the Behavior and Use of Scalable Persistent Memory
The paper "An Empirical Guide to the Behavior and Use of Scalable Persistent Memory" investigates the characteristics and performance of Intel's Optane DC Persistent Memory Module (referred to as 3D XPoint) as a scalable nonvolatile memory (NVM) technology. This research focuses on understanding the nuances of this NVM at both micro and macro levels, providing guidance on its optimal use and revisiting prior research assumptions based on emulation.
Key Findings
The paper identifies performance characteristics that distinguish Intel's 3D XPoint memory from conventional DRAM. It shows that the common assumption that nonvolatile DIMMs behave like DRAM, only slower, is an oversimplification: performance depends heavily on access size, access type (read or write), access pattern, and concurrency.
- Latency and Bandwidth: The paper reports that 3D XPoint read latencies are approximately 2–3 times higher than DRAM's, while write latencies are much closer to DRAM's, particularly when cache flush instructions such as clwb are used. 3D XPoint memory is also more sensitive to access patterns than DRAM, with large performance differences between sequential and random accesses.
- Tail Latency: 3D XPoint memory exhibits rare but large tail-latency outliers, particularly when accesses are concentrated on a small region of memory, which suggests that internal mechanisms such as wear leveling may be at play.
- Bandwidth Variability: 3D XPoint bandwidth does not increase monotonically with concurrency; adding threads can reduce throughput, particularly when threads are not spread evenly across the interleaved DIMMs.
- Comparison to Emulation Methods: Measurements show that traditional emulation methods do not accurately capture 3D XPoint's real behavior, so performance conclusions drawn from emulation in prior work may be misleading.
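The link between thread placement and bandwidth can be pictured with a simple address-to-DIMM mapping. The sketch below assumes 4 KB interleaving across six DIMMs in one socket (illustrative values; check the actual platform configuration) and counts how many threads' current accesses land on each DIMM. The function names are hypothetical. Threads whose starting offsets differ by a multiple of the full interleave set all hit the same DIMM, which is one way added concurrency can concentrate contention instead of spreading it.

```python
from collections import Counter

# Assumed platform parameters (illustrative; check your system's configuration).
INTERLEAVE_BYTES = 4096  # bytes mapped to one DIMM before moving to the next
NUM_DIMMS = 6            # DIMMs interleaved within one socket


def dimm_for_address(offset: int) -> int:
    """Return the DIMM index serving a byte offset in the interleaved region."""
    return (offset // INTERLEAVE_BYTES) % NUM_DIMMS


def threads_per_dimm(thread_offsets: list[int]) -> Counter:
    """Count how many threads are currently accessing each DIMM."""
    return Counter(dimm_for_address(off) for off in thread_offsets)


def max_contention(thread_offsets: list[int]) -> int:
    """Largest number of threads hitting any single DIMM (0 if no threads)."""
    counts = threads_per_dimm(thread_offsets)
    return max(counts.values()) if counts else 0


# Twelve threads starting 4 KB apart: spread evenly, two per DIMM.
even = [i * 4096 for i in range(12)]
# Twelve threads starting 24 KB apart: every thread lands on DIMM 0.
skewed = [i * 6 * 4096 for i in range(12)]

print(max_contention(even))    # 2
print(max_contention(skewed))  # 12
```

This is a static snapshot; real threads move through memory over time, but a stride that is a multiple of the interleave set keeps them synchronized on the same DIMM.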
Implications for System Design and Development
The authors derive a set of best practices from their findings to guide developers in designing and tuning systems that use 3D XPoint memory:
- Minimize Random, Small Accesses: Avoid random accesses smaller than 256 bytes, the device's internal access granularity; when small accesses are unavoidable, ensure high spatial locality so that neighboring accesses can be combined inside the DIMM.
- Leverage Non-temporal Stores: Use them for large transfers. They bypass the cache hierarchy, so data reaches memory in program order rather than in the unpredictable order of cache evictions, which helps preserve sequential access patterns at the device.
- Manage Concurrency: Limiting the number of threads that concurrently access each DIMM reduces contention inside the DIMM and the memory controller, and balancing threads evenly across DIMMs helps maximize throughput.
- NUMA Avoidance: Minimize accesses to persistent memory on remote NUMA nodes, especially for workloads that mix reads and writes (such as frequent read-modify-write operations), since remote accesses of this kind suffer severe performance penalties.
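The first three guidelines can be expressed as simple checks. The sketch below is illustrative only: the 256-byte line size comes from the guideline above, while `NT_THRESHOLD`, `MAX_THREADS_PER_DIMM`, and all function names are hypothetical placeholders that would need tuning by measurement. The amplification model assumes a random access with no combining inside the DIMM, i.e. the worst case the first guideline warns about.

```python
XPLINE = 256              # internal access granularity of a 3D XPoint DIMM (bytes)
NT_THRESHOLD = 4096       # hypothetical size at which non-temporal stores are used
MAX_THREADS_PER_DIMM = 2  # hypothetical per-DIMM concurrency cap


def media_bytes(offset: int, size: int) -> int:
    """Bytes of media touched: every 256-byte line the access overlaps.

    Assumes no combining inside the DIMM, i.e. a random access.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    first_line = offset // XPLINE
    last_line = (offset + size - 1) // XPLINE
    return (last_line - first_line + 1) * XPLINE


def amplification(offset: int, size: int) -> float:
    """Ratio of media bytes to requested bytes (1.0 is ideal)."""
    return media_bytes(offset, size) / size


def choose_store(size: int) -> str:
    """Pick an x86 persistence sequence by transfer size."""
    return "ntstore+sfence" if size >= NT_THRESHOLD else "store+clwb+sfence"


def thread_budget(requested: int, num_dimms: int) -> int:
    """Cap threads so that, if spread evenly, no DIMM exceeds the per-DIMM limit."""
    return min(requested, num_dimms * MAX_THREADS_PER_DIMM)


print(amplification(0, 64))     # 4.0: a 64 B random access costs a full line
print(amplification(0, 256))    # 1.0: aligned, line-sized access
print(amplification(128, 256))  # 2.0: misaligned access spans two lines
print(choose_store(64))         # store+clwb+sfence
print(choose_store(1 << 20))    # ntstore+sfence
print(thread_budget(32, 6))     # 12
```

Note that alignment matters as much as size: a 256-byte access that straddles a line boundary costs as much as a 512-byte one under this model.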
Revisitation of Prior Research
The paper revisits prior studies that relied on emulation and re-evaluates them on real hardware. For instance, in RocksDB, a comparison of fine-grained persistence against coarser-grained logging produced the opposite result on actual 3D XPoint modules from the one obtained under emulation, demonstrating the need for real-hardware evaluation rather than reliance on emulation.
The NOVA file system case study further shows the value of optimizations tailored to persistent memory: embedding small writes directly in the metadata log, rather than writing them through the main data path, yields significant performance improvements.
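A rough illustration of why embedding small writes in the log helps: NOVA writes file data by copy-on-write at page granularity, so a small update rewrites a whole 4 KB page and then appends a log entry pointing to it, while an embedded update writes only the data plus a log-entry header. The sketch below counts bytes written under both paths. `ENTRY_HEADER`, `EMBED_LIMIT`, and the function names are hypothetical, and the model ignores reads, allocation, and the device's 256-byte internal granularity.

```python
PAGE = 4096         # copy-on-write unit (bytes)
ENTRY_HEADER = 64   # hypothetical size of a log entry without inline data
EMBED_LIMIT = 1024  # hypothetical cutoff for embedding data in the log


def cow_write_bytes(size: int) -> int:
    """Copy-on-write path: write whole new pages, then a log entry pointing at them.

    Assumes the write starts on a page boundary.
    """
    pages = -(-size // PAGE)  # ceiling division
    return pages * PAGE + ENTRY_HEADER


def embedded_write_bytes(size: int) -> int:
    """Embedded path: the data travels inside the log entry itself."""
    return ENTRY_HEADER + size


def bytes_written(size: int) -> int:
    """Embed small writes in the log; use copy-on-write for larger ones."""
    if size <= EMBED_LIMIT:
        return embedded_write_bytes(size)
    return cow_write_bytes(size)


for size in (64, 512, 8192):
    print(size, cow_write_bytes(size), bytes_written(size))
# 64 4160 128
# 512 4160 576
# 8192 8256 8256
```

For large writes both paths converge, which is why a size cutoff, rather than embedding everything, is the natural shape for such a policy.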
Conclusions and Future Directions
Intel's 3D XPoint memory is a transformative step in the persistent memory landscape, providing a new tier of high-density, byte-addressable persistence. However, the research emphasizes the complexity of this technology and the need for software tailored to its peculiarities. The findings highlight the need to reoptimize existing software and to reevaluate earlier research results on actual hardware, especially as persistent memory technologies like 3D XPoint continue to evolve.
Future work will involve broader adoption and adaptation of software systems to fully exploit the potential of scalable nonvolatile memory, ensuring that both performance and consistency can be optimally balanced in real-world applications.