
Compilation and Optimizations for Efficient Machine Learning on Embedded Systems (2206.03326v2)

Published 6 Jun 2022 in cs.LG and cs.AR

Abstract: Deep Neural Networks (DNNs) have achieved great success in a variety of ML applications, delivering high-quality inference solutions in computer vision, natural language processing, virtual reality, and other domains. However, DNN-based ML applications also bring substantially increased computational and storage requirements, which are particularly challenging for embedded systems with limited compute/storage resources, tight power budgets, and small form factors. Challenges also come from diverse application-specific requirements, including real-time responses, high-throughput performance, and reliable inference accuracy. To address these challenges, we introduce a series of effective design methodologies, including efficient ML model designs, customized hardware accelerator designs, and hardware/software co-design strategies, to enable efficient ML applications on embedded systems.
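One concrete instance of the "efficient ML model designs" the abstract mentions is post-training quantization, which shrinks model weights from 32-bit floats to 8-bit integers to fit embedded memory budgets. The sketch below is illustrative only and is not taken from the paper; it shows a minimal symmetric int8 quantizer in pure Python, with the function names (`quantize_int8`, `dequantize`) chosen here for demonstration.

```python
def quantize_int8(weights):
    """Symmetric post-training quantization: map floats to ints in
    [-127, 127] using a single per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0.0:  # all-zero tensor: nothing to quantize
        return [0] * len(weights), 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

# Example: a toy weight vector round-trips with error <= scale / 2.
weights = [0.5, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Storing `q` instead of `weights` cuts memory 4x; the reconstruction error of each weight is bounded by half the scale step, which is why quantization typically costs little inference accuracy.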

Authors (6)
  1. Xiaofan Zhang
  2. Yao Chen
  3. Cong Hao
  4. Sitao Huang
  5. Yuhong Li
  6. Deming Chen
Citations (1)
