Data-Oblivious ML Accelerators using Hardware Security Extensions (2401.16583v1)
Abstract: Outsourced computation can put client data confidentiality at risk. Existing solutions are either inefficient or insufficiently secure: cryptographic techniques such as fully homomorphic encryption incur significant overheads, even with hardware assistance, while the complexity of hardware-assisted trusted execution environments has been exploited to leak secret data. Recent proposals such as BliMe and OISA show how dynamic information flow tracking (DIFT) enforced in hardware can protect client data efficiently, but they are designed to protect CPU-only workloads, whereas many outsourced computing applications, such as machine learning, make extensive use of accelerators. We address this gap with Dolma, which applies DIFT to the Gemmini matrix multiplication accelerator, efficiently guaranteeing client data confidentiality even in the presence of malicious or vulnerable software and side-channel attacks on the server. We show that accelerators can allow DIFT logic optimizations that significantly reduce area overhead compared with general-purpose processor architectures. Dolma is integrated with the BliMe framework to achieve end-to-end security guarantees. We evaluate Dolma on an FPGA using a ResNet-50 DNN model and show that it incurs low overheads for large configurations ($4.4\%$, $16.7\%$, and $16.5\%$ for performance, resource usage, and power, respectively, with a $32 \times 32$ configuration).
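As a rough illustration of the DIFT idea the abstract refers to, the following sketch models one-bit taint tags propagating through a matrix multiplication in software. It is a toy model under stated assumptions, not Dolma's hardware design or BliMe's actual interface: the names `Tagged`, `mac`, `matmul`, and `release` are hypothetical, and the policy shown (an output is tainted if any contributing input is tainted, and tainted values may not be released in the clear) is only the generic conservative DIFT rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value paired with a one-bit taint tag (True = derived from client data)."""
    value: int
    tainted: bool = False


def mac(acc: Tagged, a: Tagged, b: Tagged) -> Tagged:
    """Multiply-accumulate; the result is tainted if any input is tainted."""
    return Tagged(acc.value + a.value * b.value,
                  acc.tainted or a.tainted or b.tainted)


def matmul(A, B):
    """Dense matrix multiply over Tagged values.

    Loop bounds depend only on matrix shapes, never on element values,
    so the sequence of operations is the same for any data.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = []
    for i in range(n):
        row = []
        for j in range(m):
            acc = Tagged(0)
            for p in range(k):
                acc = mac(acc, A[i][p], B[p][j])
            row.append(acc)
        C.append(row)
    return C


def release(x: Tagged) -> int:
    """Hypothetical observable output: refuse to expose tainted values."""
    if x.tainted:
        raise PermissionError("tainted value cannot be released in the clear")
    return x.value


# Only A[0][0] carries client data; every other element is public.
A = [[Tagged(1, True), Tagged(2)], [Tagged(3), Tagged(4)]]
B = [[Tagged(5), Tagged(6)], [Tagged(7), Tagged(8)]]
C = matmul(A, B)
```

In this example the first row of `C` is tainted because it depends on `A[0][0]`, while the second row is untainted and can be passed to `release`. The rule is deliberately conservative: a tainted operand taints the result even when it is multiplied by zero.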
- N. Samardzic, A. Feldmann, A. Krastev, N. Manohar, N. Genise, S. Devadas, K. Eldefrawy, C. Peikert, and D. Sanchez, “CraterLake: A hardware accelerator for efficient unbounded computation on encrypted data,” in Proceedings of the International Symposium on Computer Architecture, 2022, pp. 173–187.
- V. Costan and S. Devadas, “Intel SGX explained,” Cryptology ePrint Archive, Report 2016/086, 2016.
- S. Pinto and N. Santos, “Demystifying Arm TrustZone: A comprehensive survey,” ACM Computing Surveys, vol. 51, no. 6, pp. 1–36, 2019.
- NVIDIA. (2023) Confidential Compute on NVIDIA Hopper H100. [Online]. Available: https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/HCC-Whitepaper-v1.0.pdf
- J. Yu, L. Hsiung, M. El Hajj, and C. W. Fletcher, “Data oblivious ISA extensions for side channel-resistant and high performance computing,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, 2019.
- H. ElAtali, L. J. Gunn, H. Liljestrand, and N. Asokan, “BliMe: Verifiably secure outsourced computation with hardware-enforced taint tracking,” in Proceedings of the Network and Distributed System Security Symposium, 2024.
- J. Zhao, B. Korpan, A. Gonzalez, and K. Asanovic, “SonicBOOM: The 3rd generation Berkeley out-of-order machine,” in Proceedings of the Workshop on Computer Architecture Research with RISC-V, 2020. [Online]. Available: https://carrv.github.io/2020/papers/CARRV2020_paper_15_Zhao.pdf
- M. Tiwari, H. M. Wassel, B. Mazloom, S. Mysore, F. T. Chong, and T. Sherwood, “Complete information flow tracking from the gates up,” in Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, New York, NY, USA, 2009, pp. 109–120.
- H. Genc, S. Kim, A. Amid, A. Haj-Ali, V. Iyer, P. Prakash, J. Zhao, D. Grubb, H. Liew, H. Mao, A. Ou, C. Schmidt, S. Steffl, J. Wright, I. Stoica, J. Ragan-Kelley, K. Asanovic, B. Nikolic, and Y. S. Shao, “Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration,” in Proceedings of the 58th Annual Design Automation Conference, 2021.
- A. Amid, D. Biancolin, A. Gonzalez, D. Grubb, S. Karandikar, H. Liew, A. Magyar, H. Mao, A. Ou, N. Pemberton, P. Rigge, C. Schmidt, J. Wright, J. Zhao, Y. S. Shao, K. Asanović, and B. Nikolić, “Chipyard: Integrated design, simulation, and implementation framework for custom SoCs,” IEEE Micro, vol. 40, no. 4, pp. 10–21, 2020.
- P. C. Kocher, “Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems,” in Advances in Cryptology: CRYPTO ’96, 16th Annual International Cryptology Conference, Santa Barbara, California, USA, August 18–22, 1996, Proceedings. Springer, 1996, pp. 104–113.
- D. A. Osvik, A. Shamir, and E. Tromer, “Cache attacks and countermeasures: The case of AES,” in Proceedings of the Cryptographers’ Track at the RSA Conference on Topics in Cryptology, Berlin, Heidelberg, 2006, pp. 1–20.
- J. Bachrach, H. Vo, B. Richards, Y. Lee, A. Waterman, R. Avižienis, J. Wawrzynek, and K. Asanović, “Chisel: Constructing hardware in a Scala embedded language,” in Proceedings of the Design Automation Conference, New York, NY, USA, 2012, pp. 1216–1225.
- N. Swamy, C. Hriţcu, C. Keller, A. Rastogi, A. Delignat-Lavaud, S. Forest, K. Bhargavan, C. Fournet, P.-Y. Strub, M. Kohlweiss, J.-K. Zinzindohoue, and S. Zanella-Béguelin, “Dependent types and multi-monadic effects in F*,” in Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, New York, NY, USA, 2016, pp. 256–270.
- V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proceedings of the IEEE, vol. 105, pp. 2295–2329, 2017. [Online]. Available: https://api.semanticscholar.org/CorpusID:3273340
- N. P. Jouppi et al., “In-datacenter performance analysis of a tensor processing unit,” in Proceedings of the Annual International Symposium on Computer Architecture, ser. ISCA ’17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 1–12. [Online]. Available: https://doi.org/10.1145/3079856.3080246
- NVIDIA. (2023) NVIDIA H100 Tensor Core GPU architecture. [Online]. Available: https://resources.nvidia.com/en-us-tensor-core
- W. Hu, A. Ardeshiricham, and R. Kastner, “Hardware information flow tracking,” ACM Computing Surveys, vol. 54, no. 4, pp. 1–39, 2021.
- J. Devietti, C. Blundell, M. M. K. Martin, and S. Zdancewic, “Hardbound: Architectural support for spatial safety of the C programming language,” in Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008, pp. 103–114. [Online]. Available: https://doi.org/10.1145/1346281.1346295
- U. Dhawan, N. Vasilakis, R. Rubin, S. Chiricescu, J. M. Smith, T. F. Knight, B. C. Pierce, and A. DeHon, “PUMP: A programmable unit for metadata processing,” in Proceedings of the 3rd Workshop on Hardware and Architectural Support for Security and Privacy, 2014, pp. 1–8. [Online]. Available: https://doi.org/10.1145/2611765.2611773
- M. Dalton, H. Kannan, and C. Kozyrakis, “Raksha: A flexible information flow architecture for software security,” in Proceedings of the Annual International Symposium on Computer Architecture, ser. ISCA ’07. New York, NY, USA: Association for Computing Machinery, 2007, pp. 482–493. [Online]. Available: https://doi.org/10.1145/1250662.1250722
- A. Ferraiuolo, M. Zhao, A. C. Myers, and G. E. Suh, “HyperFlow: A Processor Architecture for Nonmalleable, Timing-Safe Information Flow Security,” in Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2018, pp. 1583–1600. [Online]. Available: https://doi.org/10.1145/3243734.3243743