Understanding Bulk-Bitwise Processing In-Memory Through Database Analytics (2203.10486v2)
Abstract: Bulk-bitwise processing-in-memory (PIM), where large bitwise operations are performed in parallel by the memory array itself, is an emerging form of computation with the potential to mitigate the memory wall problem. This paper examines the capabilities of bulk-bitwise PIM by constructing PIMDB, a fully digital system based on memristive stateful logic that relies on in-memory bulk-bitwise operations and is designed to accelerate a real-life workload: analytical processing of relational databases. We introduce a host processor programming model to support bulk-bitwise PIM in virtual memory, develop techniques to efficiently perform in-memory filtering and aggregation operations, and adapt the layout of the application data set to the memory. To understand bulk-bitwise PIM, we compare it to an equivalent in-memory database on the same host system. We show that bulk-bitwise PIM substantially lowers the number of required memory read operations, thus accelerating TPC-H filter operations by 1.6$\times$--18$\times$ and full queries by 56$\times$--608$\times$, while reducing the energy consumption by 1.7$\times$--18.6$\times$ and 0.81$\times$--12$\times$ for these benchmarks, respectively. Our extensive evaluation uses the gem5 full-system simulation environment. The simulations also evaluate cell endurance, showing that the required endurance is within the endurance range of existing RRAM devices.
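To make the idea of in-memory filtering with bulk-bitwise operations concrete, the following is a minimal software sketch, not PIMDB's implementation. It assumes a bit-sliced layout in which each bit position of a column is stored as one long bit vector (one bit per record), and it evaluates the predicate `value < constant` for all records at once using only AND, OR, and NOT. The function names `to_bit_planes` and `bulk_less_than` and the sample data are hypothetical, chosen for illustration. Python integers stand in for the wide rows that a memory array would operate on in parallel.

```python
def to_bit_planes(values, width):
    """Transpose records into bit-planes.

    planes[b] is an integer whose bit r is set iff bit b of values[r] is set.
    (Assumed layout for this sketch; the paper's actual layout may differ.)
    """
    planes = []
    for b in range(width):
        plane = 0
        for r, v in enumerate(values):
            if (v >> b) & 1:
                plane |= 1 << r
        planes.append(plane)
    return planes


def bulk_less_than(planes, constant, n_records):
    """Return a bit vector whose bit r is set iff record r < constant.

    Only bulk AND / OR / NOT over whole bit-planes are used, scanning from
    the most significant bit down to the least significant bit.
    """
    mask = (1 << n_records) - 1
    lt = 0     # records already known to be smaller than the constant
    eq = mask  # records that match the constant on all higher bits so far
    for b in reversed(range(len(planes))):
        x = planes[b]
        if (constant >> b) & 1:
            # Constant bit is 1: a record with 0 here (and equal so far) is smaller.
            lt |= eq & ~x & mask
            eq &= x
        else:
            # Constant bit is 0: a record with 1 here is larger, so drop it from eq.
            eq &= ~x & mask
    return lt


def selected_rows(bitmap, n_records):
    """Decode a result bit vector into a list of matching record indices."""
    return [r for r in range(n_records) if (bitmap >> r) & 1]


values = [5, 12, 7, 0, 9, 15]        # hypothetical 4-bit column
planes = to_bit_planes(values, width=4)
result = bulk_less_than(planes, constant=9, n_records=len(values))
print(selected_rows(result, len(values)))  # prints [0, 2, 3]
```

In this sketch the host only needs to read the final result bit vector rather than every column value, which is the kind of reduction in memory reads the abstract describes.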
Authors: Ben Perach, Ronny Ronen, Benny Kimelfeld, Shahar Kvatinsky