
The Design and Implementation of a High-Performance Log-Structured RAID System for ZNS SSDs (2402.17963v1)

Published 28 Feb 2024 in cs.DC

Abstract: Zoned Namespace (ZNS) defines a new abstraction that lets host software flexibly manage storage in flash-based SSDs as append-only zones. It also provides a Zone Append primitive that further boosts the write performance of ZNS SSDs by exploiting intra-zone parallelism. However, making Zone Append effective for reliable and scalable storage, in the form of a RAID array of multiple ZNS SSDs, is non-trivial: Zone Append offloads address management to the ZNS SSDs, so the host must explicitly manage RAID stripes across multiple drives. We propose ZapRAID, a high-performance log-structured RAID system for ZNS SSDs that carefully exploits Zone Append to achieve high write parallelism and lightweight stripe management. ZapRAID adopts a group-based data layout with a coarse-grained ordering across multiple groups of stripes, so that it can manage stripes with small metadata on a per-group basis under Zone Append. It further adopts hybrid data management that achieves both intra-zone and inter-zone parallelism through a careful combination of the Zone Append and Zone Write primitives. We evaluate ZapRAID using microbenchmarks, trace-driven experiments, and real-application experiments. Our evaluation results show that ZapRAID achieves high write throughput and maintains high performance in normal reads, degraded reads, crash recovery, and full-drive recovery.
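The stripe-management problem described in the abstract can be illustrated with a toy Python model. This is a minimal sketch under simplifying assumptions, not ZapRAID's implementation. The names `Zone`, `write_group`, and `degraded_read` are hypothetical. Offsets are counted in whole chunks. Each stripe holds k data chunks plus one XOR parity chunk spread over k+1 zones, one zone per drive. Shuffling the append order stands in for in-flight requests completing out of order. Under Zone Write the host must name the exact write-pointer offset, whereas under Zone Append the device picks the offset and reports it back, so one stripe's chunks can land at different offsets on different drives. The sketch also assumes that "coarse-grained ordering" means each group of stripes completes before the next group starts. Under that assumption every offset stays inside a fixed per-group range, and the host only needs to record a small in-group slot index per chunk. The paper's hybrid mix of Zone Write and Zone Append is not modeled.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Toy append-only zone; offsets are counted in whole chunks (an assumption)."""
    write_pointer: int = 0
    chunks: dict = field(default_factory=dict)  # offset -> chunk payload

    def zone_write(self, offset: int, payload: int) -> int:
        # Zone Write: the host names the offset, which must equal the write pointer.
        if offset != self.write_pointer:
            raise ValueError("Zone Write must target the current write pointer")
        return self._place(payload)

    def zone_append(self, payload: int) -> int:
        # Zone Append: the device chooses the offset and reports it to the host.
        return self._place(payload)

    def _place(self, payload: int) -> int:
        offset = self.write_pointer
        self.chunks[offset] = payload
        self.write_pointer += 1
        return offset


def xor_all(values):
    """XOR parity over integer chunks."""
    acc = 0
    for v in values:
        acc ^= v
    return acc


def write_group(zones, stripes, rng):
    """Append one group of stripes: zone z < k gets data chunk z, the last zone gets parity.

    Returns each zone's base offset for the group and, per zone, a map from
    stripe index (within the group) to its slot relative to that base.
    """
    k = len(zones) - 1
    bases = [zone.write_pointer for zone in zones]
    slot_of = [{} for _ in zones]
    for z, zone in enumerate(zones):
        order = list(range(len(stripes)))
        rng.shuffle(order)  # appends to this zone complete in an arbitrary order
        for i in order:
            chunk = stripes[i][z] if z < k else xor_all(stripes[i])
            offset = zone.zone_append(chunk)
            slot_of[z][i] = offset - bases[z]  # always in [0, len(stripes))
    return bases, slot_of


def degraded_read(zones, bases, slot_of, i, lost):
    """Rebuild stripe i's chunk on the failed zone `lost` by XOR-ing all survivors."""
    return xor_all(
        zone.chunks[bases[z] + slot_of[z][i]]
        for z, zone in enumerate(zones)
        if z != lost
    )


if __name__ == "__main__":
    rng = random.Random(7)
    k = 3  # data chunks per stripe
    zones = [Zone() for _ in range(k + 1)]
    stripes = [[rng.randrange(256) for _ in range(k)] for _ in range(4)]
    bases, slots = write_group(zones, stripes, rng)
    assert degraded_read(zones, bases, slots, 2, lost=1) == stripes[2][1]
    print("slot maps:", slots)
```

With a group of G stripes, each slot index needs only about log2(G) bits. This conveys why per-group metadata can stay small. The metadata format used in the paper itself may differ.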
