
OneAdapt: Fast Configuration Adaptation for Video Analytics Applications via Backpropagation (2310.02422v2)

Published 3 Oct 2023 in cs.LG, cs.AI, cs.DC, cs.MM, and cs.NI

Abstract: Deep learning inference on streaming media data, such as object detection in video or LiDAR feeds and text extraction from audio waves, is now ubiquitous. To achieve high inference accuracy, these applications typically require significant network bandwidth to gather high-fidelity data and extensive GPU resources to run deep neural networks (DNNs). While the high demand for network bandwidth and GPU resources could be substantially reduced by optimally adapting the configuration knobs, such as video resolution and frame rate, current adaptation techniques fail to meet three requirements simultaneously: adapt configurations (i) with minimum extra GPU or bandwidth overhead; (ii) to reach near-optimal decisions based on how the data affects the final DNN's accuracy; and (iii) do so for a range of configuration knobs. This paper presents OneAdapt, which meets these requirements by leveraging a gradient-ascent strategy to adapt configuration knobs. The key idea is to embrace DNNs' differentiability to quickly estimate the gradient of accuracy with respect to each configuration knob, called AccGrad. Specifically, OneAdapt estimates AccGrad by multiplying two gradients: InputGrad (i.e., how each configuration knob affects the input to the DNN) and DNNGrad (i.e., how the DNN input affects the DNN inference output). We evaluate OneAdapt across five types of configurations, four analytic tasks, and five types of input data. Compared to state-of-the-art adaptation schemes, OneAdapt cuts bandwidth usage and GPU usage by 15-59% while maintaining comparable accuracy, or improves accuracy by 1-5% while using equal or fewer resources.
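
The decomposition of AccGrad into InputGrad and DNNGrad described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the single "quality" knob `k` that blends samples toward their mean, the one-layer sigmoid scorer standing in for a DNN, the accuracy proxy (negative squared distance to the output at the highest-quality setting), and the learning rate `lr`. InputGrad is estimated by finite differences, since a real knob such as resolution is generally not differentiable, while DNNGrad is computed analytically, as backpropagation would do.

```python
import math

def apply_knob(raw, k):
    """Hypothetical quality knob k in [0, 1]: blends each sample toward the mean."""
    mean = sum(raw) / len(raw)
    return [k * r + (1.0 - k) * mean for r in raw]

def dnn_score(x, w, b):
    """Toy stand-in for a DNN: one linear layer followed by a sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def accuracy_proxy(x, w, b, ref):
    """Assumed accuracy proxy: negative squared distance to a reference output."""
    return -(dnn_score(x, w, b) - ref) ** 2

def input_grad(raw, k, eps=1e-4):
    """InputGrad: how the knob changes each DNN input element (finite difference)."""
    lo = apply_knob(raw, k)
    hi = apply_knob(raw, k + eps)
    return [(h - l) / eps for h, l in zip(hi, lo)]

def dnn_grad(x, w, b, ref):
    """DNNGrad: gradient of the accuracy proxy w.r.t. each input element (analytic)."""
    s = dnn_score(x, w, b)
    common = -2.0 * (s - ref) * s * (1.0 - s)
    return [common * wi for wi in w]

def acc_grad(raw, k, w, b, ref):
    """AccGrad: chain rule, i.e. the inner product of DNNGrad and InputGrad."""
    x = apply_knob(raw, k)
    return sum(d * i for d, i in zip(dnn_grad(x, w, b, ref), input_grad(raw, k)))

# Made-up data and weights, for illustration only.
raw = [0.9, 0.1, 0.4, 0.8]
w = [1.5, -2.0, 0.5, 1.0]
b = -0.3
ref = dnn_score(apply_knob(raw, 1.0), w, b)  # output at the highest-quality setting

k = 0.5
g = acc_grad(raw, k, w, b, ref)

# One gradient-ascent step on the knob, clipped to its valid range.
lr = 5.0  # hypothetical step size
k_next = min(1.0, max(0.0, k + lr * g))
print(f"AccGrad at k={k}: {g:.6f}, next k: {k_next:.4f}")
```

In this toy setting the estimate can be checked against a direct finite difference of the accuracy proxy with respect to `k`. The appeal of the decomposition, as the abstract frames it, is that the DNN-side gradient comes from a single backward pass rather than from rerunning inference under many candidate configurations.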

