Productively Deploying Emerging Models on Emerging Platforms: A Top-Down Approach for Testing and Debugging

Published 14 Apr 2024 in cs.SE and cs.LG | (2404.09151v3)

Abstract: While existing ML frameworks focus on established platforms, like running CUDA on server-grade GPUs, there have been growing demands to enable emerging AI applications in a broader set of scenarios, such as running LLMs within browsers and mobile phones. However, deploying emerging models on new platforms (such as Metal and WebGPU) presents significant software engineering challenges due to rapid model evolution and limited tooling and practices for these platforms. Previous practice for ML model deployment often follows a bottom-up fashion, where engineers first implement individual required operators and then put them together. However, this traditional development approach fails to meet the productivity requirements when deploying emerging ML applications, with the testing and debugging part as a bottleneck. To this end, we introduce \textsc{TapML}, a top-down approach designed to streamline model deployment on diverse platforms. While the traditional bottom-up approach requires crafting manual tests, \textsc{TapML} automatically creates high-quality, realistic test data through operator-wise test carving. Furthermore, \textsc{TapML} uses a migration-based strategy to gradually offload model implementation from the mature source platform to the target platform, minimizing the debugging scope of compound errors. \textsc{TapML} has been used as the default development method in the MLC-LLM project to deploy emerging ML models. Within 2 years, \textsc{TapML} has accelerated the deployment of 105 emerging models in 27 model architectures across 5 emerging platforms. We show that \textsc{TapML} effectively boosts developer productivity while ensuring the quality of deployed models. Furthermore, we summarize comprehensive case studies from our real-world development, offering best practices for developing emerging ML systems.

Abstract PDF HTML Upgrade to Chat

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Generate Now

Continue Learning

We haven't generated follow-up questions for this paper yet.

Generate Now

Productively Deploying Emerging Models on Emerging Platforms: A Top-Down Approach for Testing and Debugging

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Authors (7)

Collections

Tweets

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research

Productively Deploying Emerging Models on Emerging Platforms: A Top-Down Approach for Testing and Debugging

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Related Papers

Authors (7)

Collections

Tweets

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research