From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks (2405.06038v1)
Abstract: Deep neural networks (DNNs) are widely used in many AI tasks, but deploying them is challenging due to their large memory, energy, and computation costs. To address these challenges, researchers have developed model compression techniques such as model quantization and model pruning. Recently, there has been a surge in research on compression methods that achieve model efficiency while retaining performance. Furthermore, a growing number of works focus on customizing DNN hardware accelerators to better exploit these model compression techniques. In addition to efficiency, preserving security and privacy is critical for deploying DNNs. However, the vast and diverse body of related work can be overwhelming. This motivates us to conduct a comprehensive survey of recent research toward high-performance, cost-efficient, and safe deployment of DNNs. Our survey first covers mainstream model compression techniques, including model quantization, model pruning, knowledge distillation, and optimizations of non-linear operations. We then introduce recent advances in hardware accelerators designed to exploit these model compression approaches. Additionally, we discuss how homomorphic encryption can be integrated to secure DNN deployment. Finally, we discuss open issues such as hardware evaluation, generalization, and the integration of various compression approaches. Overall, we aim to provide a big picture of efficient DNNs, from algorithms to hardware accelerators and security.
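To make the quantization theme above concrete, the sketch below shows uniform symmetric post-training weight quantization in plain NumPy. It is a minimal illustration only: the function names, the per-tensor scaling scheme, and the 8-bit setting are assumptions chosen for clarity, not the specific methods surveyed in the paper.

```python
import numpy as np

def quantize_symmetric(weights: np.ndarray, num_bits: int = 8):
    """Uniform symmetric quantization: map float weights to signed integers with one per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8-bit
    scale = np.max(np.abs(weights)) / qmax      # per-tensor scale (illustrative choice)
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 64)).astype(np.float32)   # toy weight matrix
    q, scale = quantize_symmetric(w, num_bits=8)
    w_hat = dequantize(q, scale)
    # Quantization error shrinks as the bit-width grows.
    print("mean abs error:", np.mean(np.abs(w - w_hat)))
```

Per-tensor symmetric scaling is the most basic variant; the non-uniform, mixed-precision, and Hessian-aware schemes covered in the survey refine this idea with finer granularity and data- or hardware-aware bit allocation.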
Authors: Xue Geng, Zhe Wang, Chunyun Chen, Qing Xu, Kaixin Xu, Chao Jin, Manas Gupta, Xulei Yang, Zhenghua Chen, Mohamed M. Sabry Aly, Jie Lin, Min Wu, Xiaoli Li