Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference
Abstract: On-device ML moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models on devices with limited resources presents a major technical challenge: practitioners need to optimize models and balance hardware metrics such as model size, latency, and power. To help practitioners create efficient ML models, we designed and developed Talaria: a model visualization and optimization system. Talaria enables practitioners to compile models to hardware, interactively visualize model statistics, and simulate optimizations to test the impact on inference metrics. Since its internal deployment two years ago, we have evaluated Talaria using three methodologies: (1) a log analysis highlighting its growth to 800+ practitioners who have submitted 3,600+ models; (2) a usability survey with 26 users assessing the utility of 20 Talaria features; and (3) qualitative interviews with the 7 most active users about their experience with Talaria.
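The optimizations the abstract refers to (for example, post-training weight quantization) trade numerical precision for smaller, faster models. As a rough illustration of the kind of trade-off being simulated, the minimal PyTorch sketch below quantizes a hypothetical model's linear layers to int8 and compares serialized sizes; this is not Talaria's own tooling, and the model architecture and names are illustrative assumptions only.

```python
import io

import torch
import torch.nn as nn

# Hypothetical stand-in for an on-device model (layer sizes are illustrative).
model = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Post-training dynamic quantization: Linear weights are stored as int8,
# typically shrinking those layers' storage by roughly 4x versus fp32.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def serialized_size_bytes(m: nn.Module) -> int:
    """Serialize a module's state_dict to memory and report its size in bytes."""
    buffer = io.BytesIO()
    torch.save(m.state_dict(), buffer)
    return buffer.getbuffer().nbytes

print(f"fp32 model: {serialized_size_bytes(model):,} bytes")
print(f"int8 model: {serialized_size_bytes(quantized):,} bytes")
```

Size is only one of the hardware metrics the paper names; latency and power depend on how the compiled model maps onto the target hardware, which is the compiler-in-the-loop analysis Talaria is described as providing.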