
DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics (2403.14353v3)

Published 21 Mar 2024 in cs.AR, cs.LG, and cs.RO

Abstract: Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots. However, real-world deployment faces challenges due to these systems' limited computational resources and battery power. To tackle these challenges, continuous learning exploits a lightweight "student" model at deployment (inference), leverages a larger "teacher" model for labeling sampled data (labeling), and continuously retrains the student model to adapt to changing scenarios (retraining). This paper highlights the limitations of state-of-the-art continuous learning systems: (1) they focus on computations for retraining while overlooking the compute needs for inference and labeling, (2) they rely on power-hungry GPUs, unsuitable for battery-operated autonomous systems, and (3) they are located on a remote centralized server intended for multi-tenant scenarios, again unsuitable for autonomous systems due to privacy, network availability, and latency concerns. We propose a hardware-algorithm co-designed solution for continuous learning, DaCapo, that enables autonomous systems to perform concurrent executions of inference, labeling, and training in a performant and energy-efficient manner. DaCapo comprises (1) a spatially-partitionable and precision-flexible accelerator enabling parallel execution of kernels on sub-accelerators at their respective precisions, and (2) a spatiotemporal resource allocation algorithm that strategically navigates the resource-accuracy tradeoff space, facilitating optimal decisions for resource allocation to achieve maximal accuracy. Our evaluation shows that DaCapo achieves 6.5% and 5.5% higher accuracy than the state-of-the-art GPU-based continuous learning systems Ekya and EOMU, respectively, while consuming 254x less power.
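The inference/labeling/retraining loop described in the abstract can be sketched in a few lines. This is a minimal illustration of the control flow only, not DaCapo's actual implementation or API: the model classes, the fixed sampling and retraining intervals, and the replay buffer are all hypothetical stand-ins.

```python
from collections import deque

class StudentModel:
    """Stand-in for the lightweight deployed DNN (hypothetical)."""
    def infer(self, frame):
        return f"pred({frame})"   # placeholder prediction

def teacher_label(frame):
    """Stand-in for the larger teacher model's expensive labeling."""
    return f"label({frame})"

def continuous_learning(frames, sample_every=4, retrain_every=8):
    """Run the three concurrent continuous-learning tasks sequentially
    for illustration: inference on every frame, teacher labeling on
    sampled frames, and periodic retraining on the labeled buffer."""
    student = StudentModel()
    replay = deque(maxlen=32)     # labeled samples kept for retraining
    retrain_count = 0
    for i, frame in enumerate(frames):
        _ = student.infer(frame)                  # inference: every frame
        if i % sample_every == 0:                 # labeling: sampled frames
            replay.append((frame, teacher_label(frame)))
        if i > 0 and i % retrain_every == 0:      # retraining: periodic
            retrain_count += 1                    # (real system: SGD steps)
    return retrain_count, len(replay)
```

On a hardware accelerator like DaCapo's, these three tasks run concurrently on spatially partitioned sub-accelerators at different precisions, rather than time-sliced as in this sequential sketch; the allocation algorithm decides how much of the accelerator each task receives.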

References (84)
  1. S. Ahmad, S. Subramanian, V. Boppana, S. Lakka, F.-H. Ho, T. Knopp, J. Noguera, G. Singh, and R. Wittig, “Xilinx first 7nm device: Versal ai core (vc1902).” in HotChips, 2019.
  2. J. Bang, H. Kim, Y. Yoo, J.-W. Ha, and J. Choi, “Rainbow memory: Continual learning with a memory of diverse samples,” in CVPR, 2021.
  3. S. Bateni and C. Liu, “NeuOS: A Latency-Predictable Multi-Dimensional optimization framework for DNN-driven autonomous systems,” in USENIX ATC, 2020.
  4. R. Bhardwaj, Z. Xia, G. Ananthanarayanan, J. Jiang, Y. Shu, N. Karianakis, K. Hsieh, P. Bahl, and I. Stoica, “Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers,” in NSDI, 2022.
  2. Z. Borsos, M. Mutny, and A. Krause, “Coresets via bilevel optimization for continual learning and streaming,” NeurIPS, 2020.
  6. I. Bozcan and E. Kayacan, “AU-AIR: A Multi-modal Unmanned Aerial Vehicle Dataset for Low Altitude Traffic Surveillance,” in ICRA, 2020.
  7. I. Bozcan and E. Kayacan, “Context-dependent anomaly detection for low altitude traffic surveillance,” in ICRA, 2021.
  8. B. Brown, M. Broth, and E. Vinkhuyzen, “The halting problem: Video analysis of self-driving cars in traffic,” in CHI, 2023.
  9. N. Chen, Y. Chen, Y. You, H. Ling, P. Liang, and R. Zimmermann, “Dynamic urban surveillance video stream processing using fog computing,” in BigMM, 2016.
  10. T. Y.-H. Chen, L. Ravindranath, S. Deng, P. Bahl, and H. Balakrishnan, “Glimpse: Continuous, real-time object recognition on mobile devices,” in SenSys, 2015.
  11. M. Courbariaux, Y. Bengio, and J.-P. David, “Binaryconnect: Training deep neural networks with binary weights during propagations,” NIPS, 2015.
  12. M. Courbariaux, Y. Bengio, and J. David, “Low precision arithmetic for deep learning,” in ICLR, 2015.
  13. T. Dao, A. Roy-Chowdhury, N. Nasrabadi, S. V. Krishnamurthy, P. Mohapatra, and L. M. Kaplan, “Accurate and timely situation awareness retrieval from a bandwidth constrained camera network,” in MASS, 2017.
  14. B. Darvish Rouhani, D. Lo, R. Zhao, M. Liu, J. Fowers, K. Ovtcharov, A. Vinogradsky, S. Massengill, L. Yang, R. Bittner et al., “Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point,” in NeurIPS, 2020.
  15. B. Darvish Rouhani, R. Zhao, V. Elango, R. Shafipour, M. Hall, M. Mesmakhosroshahi, A. More, L. Melnick, M. Golub, G. Varatkar et al., “With shared microexponents, a little shifting goes a long way,” in ISCA, 2023.
  16. U. Drolia, K. Guo, and P. Narasimhan, “Precog: Prefetching for image recognition applications at the edge,” in SEC, 2017.
  17. U. Drolia, K. Guo, J. Tan, R. Gandhi, and P. Narasimhan, “Cachier: Edge-caching for recognition applications,” in ICDCS, 2017.
  18. M. Drumond, T. Lin, M. Jaggi, and B. Falsafi, “Training DNNs with Hybrid Block Floating Point,” in NeurIPS, 2018.
  19. S. Dutta and C. Ekenna, “Air-to-Ground Surveillance Using Predictive Pursuit,” in ICRA, 2019.
  20. S. Fang, Z. Wang, Y. Zhong, J. Ge, S. Chen, and Y. Wang, “TBP-Former: Learning Temporal Bird’s-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving,” in CVPR, 2023.
  21. S. Ghodrati, B. H. Ahn, J. K. Kim, S. Kinzer, B. R. Yatham, N. Alla, H. Sharma, M. Alian, E. Ebrahimi, N. S. Kim et al., “Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks,” in MICRO, 2020.
  22. Y. Ghunaim, A. Bibi, K. Alhamoud, M. Alfarra, H. A. A. K. Hammoud, A. Prabhu, P. H. Torr, and B. Ghanem, “Real-time evaluation in online continual learning: A new hope,” in CVPR, 2023.
  23. Google, “Edge tpu,” https://cloud.google.com/edge-tpu/, 2018.
  24. Google, “Bfloat16: The secret to high performance on cloud tpus,” https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus?hl=en, 2019.
  25. P. Guo, B. Hu, R. Li, and W. Hu, “Foggycache: Cross-device approximate computation reuse,” in MobiCom, 2018.
  26. Y. Guo, B. Zou, J. Ren, Q. Liu, D. Zhang, and Y. Zhang, “Distributed and efficient object detection via interactions among devices, edge, and cloud,” ToMM, 2019.
  27. Hailo, “Hailo expands Hailo-8 AI accelerator portfolio,” https://hailo.ai/hailo-8l-entry-level-ai-accelerator-announcement/, 2023.
  28. N.-M. Ho and W.-F. Wong, “Exploiting half precision arithmetic in nvidia gpus,” in HPEC, 2017.
  29. J. Houyon, A. Cioppa, Y. Ghunaim, M. Alfarra, A. Halin, M. Henry, B. Ghanem, and M. V. Droogenbroeck, “Online Distillation With Continual Learning for Cyclic Domain Shifts,” in CVPR, 2023.
  30. S. Hu, L. Chen, P. Wu, H. Li, J. Yan, and D. Tao, “ST-P3: End-to-End Vision-Based Autonomous Driving via Spatial-Temporal Feature Learning,” in ECCV, 2022.
  31. S. Huai, D. Liu, H. Kong, X. Luo, W. Liu, R. Subramaniam, C. Makaya, and Q. Lin, “Collate: Collaborative Neural Network Learning for Latency-Critical Edge Systems,” in ICCD, 2022.
  32. S. Huai, L. Zhang, D. Liu, W. Liu, and R. Subramaniam, “ZeroBN: Learning Compact Neural Networks For Latency-Critical Edge Systems,” in DAC, 2021.
  33. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized neural networks,” NIPS, 2016.
  34. C.-C. Hung, G. Ananthanarayanan, P. Bodik, L. Golubchik, M. Yu, P. Bahl, and M. Philipose, “Videoedge: Processing camera streams using hierarchical clusters,” in SEC, 2018.
  35. S. Jung, H. Ahn, S. Cha, and T. Moon, “Continual learning with node-importance based adaptive group sparse regularization,” NeurIPS, 2020.
  36. Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, and L. Tang, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,” in ASPLOS, 2017.
  37. M. Khani, G. Ananthanarayanan, K. Hsieh, J. Jiang, R. Netravali, Y. Shu, M. Alizadeh, and V. Bahl, “RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics,” in NSDI, 2023.
  38. J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska et al., “Overcoming catastrophic forgetting in neural networks,” PNAS, 2017.
  39. H. Kong, D. Liu, X. Luo, S. Huai, R. Subramaniam, C. Makaya, Q. Lin, and W. Liu, “Towards Efficient Convolutional Neural Network for Embedded Hardware via Multi-Dimensional Pruning,” in DAC, 2023.
  40. Y. Kong, P. Yang, and Y. Cheng, “Edge-Assisted On-Device Model Update for Video Analytics in Adverse Environments,” in MM, 2023.
  41. U. Köster, T. Webb, X. Wang, M. Nassar, A. K. Bansal, W. Constable, O. Elibol, S. Gray, S. Hall, L. Hornof et al., “Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks,” in NeurIPS, 2017.
  42. J. Lee, H. G. Hong, D. Joo, and J. Kim, “Continual learning with extended kronecker-factored approximate curvature,” in CVPR, 2020.
  43. J. Lee, J. Choi, J. Kim, J. Lee, and Y. Kim, “Dataflow mirroring: Architectural support for highly efficient fine-grained spatial multitasking on systolic-array npus,” in DAC, 2021.
  44. S. Lee, M. Weerakoon, J. Choi, M. Zhang, D. Wang, and M. Jeon, “CarM: Hierarchical episodic memory for continual learning,” in DAC, 2022.
  45. D. D. Lewis, “A Sequential Algorithm for Training Text Classifiers: Corrigendum and Additional Data,” in ACM Sigir Forum, 1995.
  46. D. Li, T. Salonidis, N. V. Desai, and M. C. Chuah, “Deepcham: Collaborative edge-mediated adaptive deep learning for mobile object recognition,” in SEC, 2016.
  47. Y. Li, D. Yuan, M. Sun, H. Wang, X. Liu, and J. Liu, “Generalized UAV Object Detection via Frequency Domain Disentanglement,” in CVPR, 2023.
  48. D. Lin, S. Talathi, and S. Annapureddy, “Fixed point quantization of deep convolutional networks,” in ICML, 2016.
  49. P. Liu, B. Qi, and S. Banerjee, “Edgeeye: An edge service framework for real-time intelligent video analytics,” in EdgeSys, 2018.
  50. S. Liu, X. Li, H. Lu, and Y. He, “Multi-Object Tracking Meets Moving UAV,” in CVPR, 2022.
  51. Z. Liu, Y. Wang, K. Han, W. Zhang, S. Ma, and W. Gao, “Post-Training Quantization for Vision Transformer,” in NeurIPS, 2021.
  52. P. Micikevicius, S. Narang, J. Alben, G. Diamos, E. Elsen, D. Garcia, B. Ginsburg, M. Houston, O. Kuchaiev, G. Venkatesh et al., “Mixed Precision Training,” in ICLR, 2018.
  53. R. T. Mullapudi, S. Chen, K. Zhang, D. Ramanan, and K. Fatahalian, “Online Model Distillation for Efficient Video Inference,” in ICCV, 2019.
  54. S.-H. Noh, J. Koo, S. Lee, J. Park, and J. Kung, “FlexBlock: A Flexible DNN Training Accelerator with Multi-Mode Block Floating Point Support,” arXiv preprint, 2022.
  55. NVIDIA, “Getting immediate speedups with nvidia a100 tf32,” https://developer.nvidia.com/blog/getting-immediate-speedups-with-a100-tf32/, 2020.
  56. NVIDIA, “Nvidia jetson,” https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/, 2023.
  57. I. Paik, S. Oh, T. Kwak, and I. Kim, “Overcoming catastrophic forgetting by neuron-level plasticity control,” in AAAI, 2020.
  58. A. Prabhu, H. A. A. K. Hammoud, P. Dokania, P. H. Torr, S.-N. Lim, B. Ghanem, and A. Bibi, “Computationally budgeted continual learning: What does matter?” in CVPR, 2023.
  59. S. Qian Zhang, B. McDanel, and H. T. Kung, “FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding,” in HPCA, 2022.
  60. E. Qin, A. Samajdar, H. Kwon, V. Nadella, S. Srinivasan, D. Das, B. Kaul, and T. Krishna, “SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training,” in HPCA, 2020.
  61. R. Razani, G. Morin, E. Sari, and V. P. Nia, “Adaptive binary-ternary quantization,” in CVPR, 2021.
  62. S.-A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert, “icarl: Incremental classifier and representation learning,” in CVPR, 2017.
  63. A. Samajdar, Y. Zhu, P. Whatmough, M. Mattina, and T. Krishna, “Scale-sim: Systolic cnn accelerator simulator,” arXiv preprint arXiv:1811.02883, 2018.
  64. S. S. Sarwar, A. Ankit, and K. Roy, “Incremental learning in deep convolutional neural networks using partial network sharing,” PAML, 2019.
  65. H. Sharma, J. Park, N. Suda, L. Lai, B. Chau, V. Chandra, and H. Esmaeilzadeh, “Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural networks,” in ISCA, 2018.
  66. D. Shim, Z. Mai, J. Jeong, S. Sanner, H. Kim, and J. Jang, “Online class-incremental continual learning with adversarial shapley value,” in AAAI, 2021.
  67. M. K. Shirkoohi, P. Hamadanian, A. Nasr-Esfahany, and M. Alizadeh, “Real-Time Video Inference on Edge Devices via Adaptive Model Streaming,” in CVPR, 2021.
  68. A. Skillman and T. Edso, “A technical overview of cortex-m55 and ethos-u55: Arm’s most capable processors for endpoint ai,” in HotChips, 2020.
  69. X. Sun, J. Choi, C.-Y. Chen, N. Wang, S. Venkataramani, V. V. Srinivasan, X. Cui, W. Zhang, and K. Gopalakrishnan, “Hybrid 8-bit floating point (hfp8) training and inference for deep neural networks,” NeurIPS, 2019.
  70. A. Suprem, J. Arulraj, C. Pu, and J. E. Ferreira, “ODIN: Automated Drift Detection and Recovery in Video Analytics,” in VLDB, 2020.
  71. E. Talpes, D. D. Sarma, G. Venkataramanan, P. Bannon, B. McGee, B. Floering, A. Jalote, C. Hsiong, S. Arora, A. Gorti et al., “Compute solution for tesla’s full self-driving computer,” MICRO, 2020.
  72. N. Tijtgat, W. Van Ranst, T. Goedeme, B. Volckaert, and F. De Turck, “Embedded Real-Time Object Detection for a UAV Warning System,” in ICCV, 2017.
  73. J. Wang, Z. Feng, Z. Chen, S. George, M. Bala, P. Pillai, S.-W. Yang, and M. Satyanarayanan, “Bandwidth-efficient live video analytics for drones via edge computing,” in SEC, 2018.
  74. L. Wang, M. Zhang, Z. Jia, Q. Li, C. Bao, K. Ma, J. Zhu, and Y. Zhong, “Afec: Active forgetting of negative transfer in continual learning,” NeurIPS, 2021.
  75. N. Wang, J. Choi, D. Brand, C.-Y. Chen, and K. Gopalakrishnan, “Training deep neural networks with 8-bit floating point numbers,” NeurIPS, 2018.
  76. S. Yi, Z. Hao, Q. Zhang, Q. Zhang, W. Shi, and Q. Li, “Lavea: Latency-aware video analytics on edge computing platform,” in SEC, 2017.
  77. J. Yoon, D. Madaan, E. Yang, and S. J. Hwang, “Online coreset selection for rehearsal-based continual learning,” in ICLR, 2022.
  78. F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, “Bdd100k: A diverse driving dataset for heterogeneous multitask learning,” in CVPR, 2020.
  79. Z. Yuan, C. Xue, Y. Chen, Q. Wu, and G. Sun, “PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization,” in ECCV, 2022.
  80. C. Zhang, Q. Cao, H. Jiang, W. Zhang, J. Li, and J. Yao, “Ffs-va: A fast filtering system for large-scale video analytics,” in ICPP, 2018.
  81. L. Zhang, G. Gao, and H. Zhang, “Towards Data-Efficient Continuous Learning for Edge Video Analytics via Smart Caching,” in SenSys, 2022.
  82. P. Zhang, J. Zhao, D. Wang, H. Lu, and X. Ruan, “Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline,” in CVPR, 2022.
  83. S. Zhou, Z. Ni, X. Zhou, H. Wen, Y. Wu, and Y. Zou, “Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients,” CoRR, 2016.
  84. C. Zhu, S. Han, H. Mao, and W. J. Dally, “Trained ternary quantization,” in ICLR, 2017.
Authors (9)
  1. Yoonsung Kim (7 papers)
  2. Changhun Oh (45 papers)
  3. Jinwoo Hwang (47 papers)
  4. Wonung Kim (1 paper)
  5. Seongryong Oh (1 paper)
  6. Yubin Lee (3 papers)
  7. Hardik Sharma (7 papers)
  8. Amir Yazdanbakhsh (38 papers)
  9. Jongse Park (14 papers)
Citations (4)