
TCAM-SSD: A Framework for Search-Based Computing in Solid-State Drives (2403.06938v1)

Published 11 Mar 2024 in cs.AR

Abstract: As the amount of data produced in society continues to grow at an exponential rate, modern applications are incurring significant performance and energy penalties due to high data movement between the CPU and memory/storage. While processing in main memory can alleviate these penalties, it is becoming increasingly difficult to keep large datasets entirely in main memory. This has led to a recent push for in-storage computation, where processing is performed inside the storage device. We propose TCAM-SSD, a new framework for search-based computation inside the NAND flash memory arrays of a conventional solid-state drive (SSD), which requires lightweight modifications to only the array periphery and firmware. TCAM-SSD introduces a search manager and link table, which can logically partition the NAND flash memory's contents into search-enabled regions and standard storage regions. Together, these light firmware changes enable TCAM-SSD to seamlessly handle block I/O operations, in addition to new search operations, thereby reducing end-to-end execution time and total data movement. We provide an NVMe-compatible interface that provides programmers with the ability to dynamically allocate data on and make use of TCAM-SSD, allowing the system to be leveraged by a wide variety of applications. We evaluate three example use cases of TCAM-SSD to demonstrate its benefits. For transactional databases, TCAM-SSD can mitigate the performance penalties for applications with large datasets, achieving a 60.9% speedup over a conventional system that retrieves data from the SSD and computes using the CPU. For database analytics, TCAM-SSD provides an average speedup of 17.7x over a conventional system for a collection of analytical queries. For graph analytics, we combine TCAM-SSD's associative search with a sparse data structure, speeding up graph computing for larger-than-memory datasets by 14.5%.
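The abstract does not spell out how a ternary search behaves, so the following is a generic software sketch of ternary content-addressable matching, not the paper's in-flash implementation. The names `TernaryEntry`, `tcam_search`, and the 8-bit example table are hypothetical and chosen only for illustration. Each stored word carries a "care" mask; a key matches when it agrees with the stored value on every bit the mask marks as significant.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TernaryEntry:
    """One stored ternary word (hypothetical model, not the paper's layout)."""
    value: int  # bit pattern to compare against
    care: int   # bit mask: 1 = bit must match, 0 = don't care


def tcam_search(entries: List[TernaryEntry], key: int) -> List[int]:
    """Return the indices of every entry that matches `key`.

    A real TCAM compares all entries in parallel; this loop only
    models the result of that comparison.
    """
    return [
        i for i, e in enumerate(entries)
        if (key ^ e.value) & e.care == 0
    ]


# Hypothetical 8-bit table; X marks a don't-care bit.
table = [
    TernaryEntry(value=0b1010_0000, care=0b1111_0000),  # 1010XXXX
    TernaryEntry(value=0b1010_1100, care=0b1111_1111),  # 10101100
    TernaryEntry(value=0b0000_0001, care=0b0000_0001),  # XXXXXXX1
]

matches = tcam_search(table, 0b1010_1100)
print(matches)  # indices of all matching entries
```

The search returns the locations of matching entries rather than the entries themselves. That is the property a search-based design can exploit to reduce data movement: only the match results need to leave the device.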

