Accelerating Relational Database Analytical Processing with Bulk-Bitwise Processing-in-Memory (2307.00658v1)
Abstract: Online Analytical Processing (OLAP) for relational databases is a business decision support application. It receives queries about the business database, usually requests to summarize many database records, and produces few results. Existing OLAP systems transfer large amounts of data between the memory and the CPU, perform only a few operations per datum, and produce a small output. Hence, OLAP is a good candidate for processing-in-memory (PIM), where computation is performed where the data is stored, accelerating applications by reducing data movement between the memory and the CPU. In particular, bulk-bitwise PIM, where the memory array itself serves as a bit-vector processing unit, seems a good match for OLAP. With the extensive inherent parallelism and minimal data movement of bulk-bitwise PIM, OLAP applications can process the entire database in parallel in memory, transferring only the results to the CPU. This paper presents a full-stack adaptation of bulk-bitwise PIM, from compiling SQL to hardware implementation, for supporting OLAP applications. Evaluated on the Star Schema Benchmark (SSB), bulk-bitwise PIM achieves a 4.65× speedup over MonetDB, a standard database system.
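To make the bit-vector view of a filter concrete, the following is a minimal software sketch, not the paper's implementation. It assumes a column is stored bit-sliced, with one bit-vector per bit position and one bit per record, and it emulates each bulk operation with a Python integer. The function names (`to_bit_slices`, `bulk_equals`, `bulk_less_than`, `selected_rows`) and the sample `discount` and `quantity` columns are hypothetical and chosen only for illustration. Each AND/OR/NOT in the loops stands in for one operation that a bulk-bitwise PIM array would apply to all records at once.

```python
def to_bit_slices(values, width):
    """Transpose a column into `width` bit-vectors.

    Bit `row` of slices[b] holds bit `b` of values[row].
    """
    slices = [0] * width
    for row, value in enumerate(values):
        for b in range(width):
            if (value >> b) & 1:
                slices[b] |= 1 << row
    return slices


def bulk_equals(slices, constant, n_rows):
    """Bit-vector with bit `row` set where the column value equals `constant`."""
    all_rows = (1 << n_rows) - 1
    match = all_rows
    for b, s in enumerate(slices):
        # Keep rows whose bit b agrees with bit b of the constant.
        match &= s if (constant >> b) & 1 else (~s & all_rows)
    return match


def bulk_less_than(slices, constant, n_rows):
    """Bit-vector with bit `row` set where the column value is < `constant`.

    Scans from the most significant bit down, tracking rows whose
    higher-order bits still equal those of the constant.
    """
    all_rows = (1 << n_rows) - 1
    less, equal_so_far = 0, all_rows
    for b in reversed(range(len(slices))):
        s = slices[b]
        if (constant >> b) & 1:
            # Constant bit is 1 and row bit is 0: the row is smaller.
            less |= equal_so_far & ~s & all_rows
            equal_so_far &= s
        else:
            # Constant bit is 0: rows with a 1 here are larger and drop out.
            equal_so_far &= ~s & all_rows
    return less


def selected_rows(mask, n_rows):
    """Row indices whose bit is set in `mask`."""
    return [row for row in range(n_rows) if (mask >> row) & 1]


# Hypothetical columns, loosely in the spirit of an SSB-style filter:
#   WHERE discount = 3 AND quantity < 25
discount = [1, 3, 2, 3, 0, 3]
quantity = [10, 30, 24, 5, 40, 20]
n = len(discount)

mask = (bulk_equals(to_bit_slices(discount, 2), 3, n)
        & bulk_less_than(to_bit_slices(quantity, 6), 25, n))
print(selected_rows(mask, n))  # rows 3 and 5 satisfy both predicates
```

In this sketch the number of bitwise steps depends on the column width rather than on the number of records, and only the final mask (or an aggregate computed from it) would need to reach the CPU. That is the property the abstract attributes to bulk-bitwise PIM.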