Lightweight Deep Learning for Resource-Constrained Environments: A Survey (2404.07236v2)
Abstract: Over the past decade, deep learning has come to dominate many areas of artificial intelligence, including natural language processing, computer vision, and biomedical signal processing. While model accuracy has improved remarkably, deploying these models on lightweight devices, such as mobile phones and microcontrollers, is constrained by limited resources. In this survey, we provide comprehensive design guidance tailored to such devices, covering the careful design of lightweight models, compression methods, and hardware acceleration strategies. The principal goal of this work is to explore methods and concepts for working within hardware constraints without compromising model accuracy. We also examine two notable future directions for lightweight deep learning: deployment techniques for TinyML and for large language models (LLMs). While both directions undoubtedly hold potential, they also present significant challenges that invite research into as-yet unexplored areas.
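As one concrete illustration of the compression methods the survey covers, the sketch below applies post-training dynamic-range quantization with TensorFlow Lite, a common route for shrinking a trained model before mobile or microcontroller deployment. The toy Keras model is a placeholder of our own, not a model from the paper.

```python
# Minimal sketch: post-training dynamic-range quantization with TensorFlow Lite.
# The tiny model below is a stand-in; any trained Keras model can be substituted.
import tensorflow as tf

# Placeholder model standing in for a trained network.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

# Convert with dynamic-range quantization: weights are stored in 8 bits,
# which typically cuts model size roughly 4x with little accuracy loss.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the flatbuffer for on-device inference (e.g., with the TFLite interpreter).
with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```

Full integer quantization (8-bit weights and activations) additionally requires a representative calibration dataset; the dynamic-range variant shown here needs no calibration data, which is why it is often the first compression step tried.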
Authors: Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Wen-Huang Cheng, Yung-Hui Li