X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs (2304.01285v3)
Abstract: Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine at extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we focus on an overall analog-digital architecture implementing a novel increased-precision analog CAM and a programmable network-on-chip that enables inference of state-of-the-art tree-based ML models, such as XGBoost and CatBoost. Results evaluated for a single chip in 16 nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU, with a 19 W peak power consumption.
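The core idea the abstract builds on, mapping a tree ensemble onto CAM rows, can be illustrated with a minimal sketch. In the prior-work scheme, each root-to-leaf path of a decision tree becomes one CAM row storing a per-feature match range (a wildcard for features the path does not test); inference applies the input vector to all rows in parallel, and the matching row selects the leaf. The tiny hand-built tree and the software lookup below are purely illustrative, not the paper's hardware implementation:

```python
# Illustrative sketch: a decision tree's root-to-leaf paths as CAM rows of
# per-feature (low, high) match ranges. Hand-built example tree over two
# features x0, x1:
#   x0 <= 0.5 ? (x1 <= 0.3 ? leaf "A" : leaf "B") : leaf "C"
NEG, POS = float("-inf"), float("inf")

# One row per root-to-leaf path; untested features are implicit wildcards.
cam_rows = [
    ({0: (NEG, 0.5), 1: (NEG, 0.3)}, "A"),
    ({0: (NEG, 0.5), 1: (0.3, POS)}, "B"),
    ({0: (0.5, POS)},                "C"),
]

def cam_lookup(x):
    """Return the leaf of the row whose ranges all contain x.

    In hardware all rows are compared in parallel; this loop emulates
    the same match semantics sequentially.
    """
    for ranges, leaf in cam_rows:
        if all(lo < x[f] <= hi for f, (lo, hi) in ranges.items()):
            return leaf
    return None  # unreachable for a complete tree

print(cam_lookup([0.2, 0.1]))  # "A"
print(cam_lookup([0.9, 0.7]))  # "C"
```

Because every path of a complete tree produces exactly one row and the ranges along a path are disjoint across rows, exactly one row matches any input, which is what makes the parallel in-memory search well suited to tree inference.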