I/O Transit Caching for PMem-based Block Device (2403.06120v1)
Abstract: Byte-addressable non-volatile memory (NVM) sitting on the memory bus is used as persistent memory (PMem) for data storage in both general-purpose computing systems and embedded systems. Researchers have developed software drivers such as the block translation table (BTT) to build block devices on PMem, so programmers can keep using the mature and reliable conventional storage stack while expecting high performance from fast PMem. However, our quantitative study shows that BTT underutilizes PMem and yields inferior performance due to the absence of an in-device cache, which is imperative. We add a conventional I/O staging cache made of DRAM space to BTT. As DRAM and PMem have comparable access latency, the I/O staging cache is likely to fill up over time. Continual cache evictions and fsyncs then cause on-demand flushes with severe stalls, making the I/O staging cache unappealing in practice for PMem-based block devices. We accordingly propose an algorithm named Caiti with novel I/O transit caching. Caiti eagerly evicts buffered data to PMem using the CPU's multiple cores. It also conditionally bypasses a full cache and writes data directly into PMem to further alleviate I/O stalls. Experiments confirm that Caiti boosts the performance of BTT by up to 3.6x without loss of block-level write atomicity.
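The two ideas named in the abstract, eager eviction by background workers and bypassing a full cache, can be illustrated with a toy model. The sketch below is not Caiti's implementation: the class name `TransitCache`, its methods, the dictionary that stands in for PMem, and the single-lock design are all assumptions made for illustration, and it models neither BTT nor block-level write atomicity.

```python
import queue
import threading


class TransitCache:
    """Toy model (hypothetical) of a DRAM transit cache in front of PMem."""

    def __init__(self, capacity, workers=2):
        self.capacity = capacity        # max number of buffered blocks
        self.cache = {}                 # DRAM buffer: block number -> data
        self.pmem = {}                  # stand-in for PMem blocks
        self.lock = threading.Lock()
        self.pending = queue.Queue()    # blocks waiting to be evicted
        self.bypassed = 0
        # Eager eviction: workers drain the buffer as soon as data arrives,
        # instead of waiting for the cache to fill or for an fsync.
        for _ in range(workers):
            threading.Thread(target=self._evict_loop, daemon=True).start()

    def _evict_one(self, lba):
        with self.lock:
            data = self.cache.pop(lba, None)
            if data is not None:        # None: an earlier token already evicted it
                self.pmem[lba] = data

    def _evict_loop(self):
        while True:
            lba = self.pending.get()
            self._evict_one(lba)
            self.pending.task_done()

    def write(self, lba, data):
        with self.lock:
            if lba in self.cache or len(self.cache) < self.capacity:
                self.cache[lba] = data
                self.pending.put(lba)
                return "cached"
            # Cache is full: bypass it and write straight to PMem
            # rather than stalling until a slot frees up.
            self.pmem[lba] = data
            self.bypassed += 1
            return "bypassed"

    def read(self, lba):
        with self.lock:
            return self.cache.get(lba, self.pmem.get(lba))

    def flush(self):
        """fsync-like barrier: return once every buffered block is in PMem."""
        while True:
            try:
                lba = self.pending.get_nowait()
            except queue.Empty:
                break
            self._evict_one(lba)
            self.pending.task_done()
        self.pending.join()             # wait for evictions still in flight


if __name__ == "__main__":
    dev = TransitCache(capacity=2, workers=0)   # no workers: deterministic demo
    print([dev.write(i, bytes([i])) for i in range(3)])
    dev.flush()
    print(sorted(dev.pmem))
```

With `workers=0` the buffer never drains on its own, so the third write finds a full cache and takes the bypass path. With workers enabled, whether a given write is cached or bypassed depends on how quickly the background threads keep up.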