Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference

Published 3 Apr 2024 in cs.HC, cs.AI, and cs.LG | arXiv:2404.03085v1

Abstract: On-device ML moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models on devices with limited resources presents a major technical challenge: practitioners need to optimize models and balance hardware metrics such as model size, latency, and power. To help practitioners create efficient ML models, we designed and developed Talaria: a model visualization and optimization system. Talaria enables practitioners to compile models to hardware, interactively visualize model statistics, and simulate optimizations to test the impact on inference metrics. Since its internal deployment two years ago, we have evaluated Talaria using three methodologies: (1) a log analysis highlighting its growth of 800+ practitioners submitting 3,600+ models; (2) a usability survey with 26 users assessing the utility of 20 Talaria features; and (3) a qualitative interview with the 7 most active users about their experience using Talaria.

Summary

  • The paper introduces Talaria, an interactive visualization and optimization system designed to help practitioners efficiently optimize machine learning models for inference on resource-constrained devices, focusing on hardware metrics like size, latency, and power.
  • Talaria provides complementary Table and Graph views for exploring low-level statistics and visualizing computational graphs, enabling real-time simulation of optimization strategies like quantization and pruning to identify and address performance bottlenecks.
  • Empirical evaluation showed substantial adoption (800+ practitioners, 3,600+ models submitted) and strong usability-survey results, with users highlighting its utility in unveiling unexpected bottlenecks and improving the efficiency of collaborative optimization.

Essay on "Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference"

Talaria is an advanced interactive visualization and optimization system developed to support practitioners in creating efficient on-device ML models. As ML models are increasingly deployed on personal devices for improved privacy and user experience, the need to optimize these models for resource-constrained environments has become paramount. This paper discusses the design and development of Talaria, which is specifically geared towards optimizing these models by focusing on critical hardware metrics such as model size, latency, and power consumption.

Key Features and Methodologies

Talaria offers two principal views that are instrumental in model optimization: the Table View and the Graph View. The Table View supports rapid analytical examination of low-level model statistics, letting users explore, sort, and filter information on hardware tasks. Complementing it, the Graph View renders the hardware operations as a computational graph, enabling users to identify structural bottlenecks that affect performance. Together, these views support identifying and optimizing computationally expensive operations within a model, aligning with tasks (T1) and (T2) outlined in the system's design.
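
To make this data model concrete, the sketch below shows how a table of per-operation hardware statistics might be represented, sorted, and filtered. The field names and numbers are illustrative assumptions, not Talaria's actual schema; real values would come from compiling the model to hardware.

```python
from dataclasses import dataclass

@dataclass
class HardwareOp:
    """One row of a hypothetical Table View: a compiled hardware task."""
    op_name: str       # e.g. "conv2d_17"
    op_type: str       # e.g. "Convolution"
    latency_ms: float  # estimated latency on the target hardware
    memory_kb: float   # estimated memory footprint

# Illustrative rows; a real table would hold one row per compiled op.
ops = [
    HardwareOp("conv2d_1", "Convolution", 0.42, 1180.0),
    HardwareOp("relu_1", "Activation", 0.03, 0.0),
    HardwareOp("conv2d_2", "Convolution", 0.95, 2210.0),
]

# Sort descending by latency to surface the most expensive operations (T1),
# then filter to one op type, mirroring the Table View's sort/filter flow.
bottlenecks = sorted(ops, key=lambda o: o.latency_ms, reverse=True)
for op in (o for o in bottlenecks if o.op_type == "Convolution"):
    print(f"{op.op_name}: {op.latency_ms:.2f} ms, {op.memory_kb:.0f} KB")
```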

Central to Talaria's innovation is its ability to simulate and visualize model optimizations in real time. By precomputing a range of compression strategies, Talaria lets practitioners estimate how optimizations such as quantization and pruning would affect inference metrics before committing to a retraining run. This capacity for both targeted and model-wide optimization directly addresses bottlenecks (C2) and gives practitioners a crucial lever for iterative experimentation (T3).
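
As a rough illustration of the kind of estimate such a simulation surfaces, post-training quantization can be modeled to first order as shrinking each operation's weight payload by the bit-width ratio. The scaling rule and numbers below are assumptions for the sketch, not Talaria's implementation, which derives its estimates from compiling to the target hardware.

```python
# Per-op weight footprints in KB (illustrative numbers, not from the paper).
op_memory_kb = {"conv2d_1": 1180.0, "conv2d_2": 2210.0, "fc_1": 4096.0}

def quantized_size_kb(size_kb: float, from_bits: int = 32, to_bits: int = 8) -> float:
    """First-order estimate: weight payload shrinks with the bit-width ratio.
    A closed-form rule like this only sketches the idea; real per-op numbers
    depend on the compiler and hardware."""
    return size_kb * (to_bits / from_bits)

total_before = sum(op_memory_kb.values())
total_after = sum(quantized_size_kb(kb) for kb in op_memory_kb.values())
print(f"Estimated model size: {total_before:.0f} KB -> {total_after:.0f} KB")
```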

The paper details the formulation and implementation of Talaria, which followed formative research highlighting key pain points experienced by ML experts. This research informed the interactive features Talaria now offers, such as real-time statistical updates and the ability to map low-level operations back to their source code, addressing (T4) and (T5).
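
The sketch below illustrates one way such op-to-source traceability could work: record the call site when each operation is created during model export, so a profiler can later link hardware statistics back to code. This mechanism is an assumption for illustration, not Talaria's actual implementation.

```python
import inspect

# op name -> "file:line" where the op was created (hypothetical scheme).
source_map: dict[str, str] = {}

def conv2d(name: str) -> None:
    """Stand-in for layer creation that also records the caller's location.
    A real exporter would construct the layer here as well."""
    caller = inspect.stack()[1]
    source_map[name] = f"{caller.filename}:{caller.lineno}"

conv2d("conv2d_17")
print(source_map["conv2d_17"])  # e.g. "model.py:42"
```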

Empirical Evaluation

Talaria's deployment has produced strong adoption metrics: more than 800 practitioners have submitted over 3,600 models. This substantial engagement underlines the system's value proposition and practical utility. Results from a usability survey of 26 participants highlight a robust interface, with most users finding the bidirectional linking between the Table and Graph views particularly useful. Furthermore, interviews with the seven most active users emphasized the system's role in unveiling unexpected bottlenecks and the efficiency gains it brings to collaborative optimization efforts.

Implications and Future Prospects

The implications of Talaria's development are significant. Practically, it offers a versatile solution to a prevalent problem in ML deployment on constrained devices: how to maintain robust model performance while adhering to tight resource limits. Theoretically, Talaria serves as a blueprint for integrating interactive optimization tools within the ML lifecycle, pointing towards more inclusive practices that accommodate efficient deployment.

As the field progresses, further integration of behavioral metrics alongside the existing hardware focus could provide a more holistic optimization framework. Additionally, enhanced collaborative features are anticipated to enrich team-based model development, fostering a more iterative and seamless development environment.

Conclusion

Talaria is a notable advancement in the toolset available to ML practitioners concerned with on-device model efficiency. By dynamically linking analytical insights with real-time visualization and optimization, it addresses both the complexity and the necessity of model optimization in edge-computing contexts. Future work is expected to build on this foundation, bridging existing gaps and opening avenues for even more nuanced applications across the broader domain of intelligent, ML-powered experiences.
