Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units (2506.11446v1)

Published 13 Jun 2025 in cs.AR and cs.DC

Abstract: With the rapid development of AI applications, an emerging class of AI accelerators, termed Inter-core Connected Neural Processing Units (NPU), has been adopted in both cloud and edge computing environments, like Graphcore IPU, Tenstorrent, etc. Despite their innovative design, these NPUs often demand substantial hardware resources, leading to suboptimal resource utilization due to the imbalance of hardware requirements across various tasks. To address this issue, prior research has explored virtualization techniques for monolithic NPUs, but has neglected inter-core connected NPUs with the hardware topology. This paper introduces vNPU, the first comprehensive virtualization design for inter-core connected NPUs, integrating three novel techniques: (1) NPU route virtualization, which redirects instruction and data flow from virtual NPU cores to physical ones, creating a virtual topology; (2) NPU memory virtualization, designed to minimize translation stalls for SRAM-centric and NoC-equipped NPU cores, thereby maximizing the memory bandwidth; and (3) Best-effort topology mapping, which determines the optimal mapping from all candidate virtual topologies, balancing resource utilization with end-to-end performance. We have developed a prototype of vNPU on both an FPGA platform (Chipyard+FireSim) and a simulator (DCRA). Evaluation results indicate that, compared to other virtualization approaches such as unified virtual memory and MIG, vNPU achieves up to a 2x performance improvement across various ML models, with only 2% hardware cost.

Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units

The paper "Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units" introduces vNPU, a novel framework designed to address the challenges of virtualization for inter-core connected Neural Processing Units (NPUs). This research tackles the unique problems posed by the hardware topology of these NPUs, which distinguishes them from traditional GPU and monolithic NPU architectures.

Key Contributions

vNPU introduces three principal techniques to enhance resource utilization and performance in NPUs:

  1. NPU Route Virtualization: This method enables effective instruction and data routing from virtual NPU cores to physical ones, creating a virtual topology that respects the inter-core connections fundamental to these NPUs. This virtualization is vital for optimizing data flow in AI tasks that benefit from spatial distribution of computational workload.
  2. NPU Memory Virtualization: The paper proposes a translation mechanism tailored for the SRAM-centric design typical of NPUs, which lacks the cache coherence of CPUs and GPUs. This approach minimizes translation stalls and leverages the burst nature of memory access patterns in NPUs, thereby maximizing memory bandwidth and efficiency.
  3. Best-Effort Topology Mapping: This technique improves resource utilization by allowing flexible mapping of virtual NPU topologies to physical hardware. It balances the need for strong isolation with the practical requirement of optimizing performance across different virtual NPU configurations.
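The route-virtualization idea in item 1 can be pictured as a per-vNPU indirection table that rewrites virtual core IDs in instruction and data packets to the physical cores they are placed on. The sketch below is illustrative only; the class and packet format are hypothetical and do not reproduce the paper's hardware mechanism.

```python
# Illustrative sketch of NPU route virtualization: a per-vNPU routing
# table rewrites a packet's virtual destination core ID to the physical
# core it is mapped onto before the packet enters the NoC.
# All names (RouteTable, packet fields) are hypothetical.

class RouteTable:
    """Maps the virtual core IDs of one vNPU to physical core IDs."""

    def __init__(self, mapping):
        self.v2p = dict(mapping)  # virtual core ID -> physical core ID

    def redirect(self, packet):
        """Return a copy of the packet with its destination rewritten."""
        vdst = packet["dst"]
        if vdst not in self.v2p:
            raise ValueError(f"virtual core {vdst} not in this vNPU")
        return {**packet, "dst": self.v2p[vdst]}

# A 2x2 virtual mesh placed onto four non-contiguous physical cores.
table = RouteTable({0: 5, 1: 6, 2: 9, 3: 10})
pkt = {"op": "send", "dst": 3, "payload": b"tile"}
print(table.redirect(pkt)["dst"])  # -> 10
```

The guest workload only ever sees virtual core IDs 0-3, so the same compiled program can run regardless of which physical cores the hypervisor happens to allocate.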

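The best-effort topology mapping of item 3 can be sketched as a placement search: given the set of free physical cores, choose the subset for a vNPU that keeps communicating cores close together. The heuristic below (minimizing the bounding-box span over an exhaustive search of small candidate sets) is a toy stand-in for the paper's mapping algorithm; all function names are hypothetical.

```python
# Hedged sketch of best-effort topology mapping: among all ways to place
# a vNPU of n cores onto the free cores of a 2D mesh, pick the subset
# with the tightest bounding box (a proxy for NoC communication cost).
# Exhaustive search is only feasible for toy sizes; the paper's actual
# algorithm is not reproduced here.

from itertools import combinations

def span(cores):
    """Bounding-box half-perimeter of a set of (x, y) core coordinates."""
    xs, ys = zip(*cores)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def best_effort_map(free, n):
    """Pick n free cores whose bounding box is tightest."""
    best = min(combinations(free, n), key=span)
    return list(best)

# Four of the five free cores form a compact 2x2 block; the mapper
# avoids the distant core at (3, 3).
free = [(0, 0), (0, 1), (3, 3), (1, 0), (1, 1)]
print(best_effort_map(free, 4))  # -> [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In the paper's setting this placement choice is what trades off resource utilization (filling fragmented free cores) against end-to-end performance (keeping a vNPU's cores adjacent on the NoC).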
Numerical Results and Evaluation

The evaluation, conducted on both an FPGA platform (Chipyard+FireSim) and a simulator (DCRA), demonstrates that the vNPU framework achieves substantial performance gains over traditional virtualization techniques. Specifically, the paper reports improvements of up to 1.92x for Transformer models and 1.28x for ResNet models compared to existing MIG-based virtualization methods. Furthermore, the hardware overhead of implementing the virtualization is minimal, costing less than 1% of end-to-end performance.

Implications for AI and Computing

The implications of this research are significant for the development and deployment of AI applications, particularly in cloud and edge computing environments where resource optimization is critical. The ability to dynamically allocate and manage NPUs using virtual topologies could lead to more efficient utilization of AI hardware, enabling faster and more responsive computational capabilities. Moreover, the framework sets a precedent for how future NPUs could be developed and virtualized, enhancing their scalability and flexibility.

Speculations for Future Developments

The introduction of topology-aware virtualization opens up several avenues for further research and development in AI and computing:

  • Enhanced AI Model Support: Future developments could focus on integrating vNPU with emerging AI models that demand distinct computational characteristics, providing tailored virtualization strategies for novel architectures like neuromorphic chips.
  • Cross-Platform Virtualization: Extending the principles of vNPU virtualization to other hardware platforms, such as GPUs or custom accelerators, might offer broader applications in diverse computing scenarios.
  • Refinements in Security and Isolation: As virtualization technology evolves, enhancing the security features of virtual NPUs and ensuring robust isolation between multiple tenants will be essential for widespread cloud deployment.

The vNPU framework exemplifies the innovative adaptability needed in AI hardware design, emphasizing a shift towards more intelligent and resource-aware virtualization strategies in the burgeoning field of neural processing.

Authors (7)
  1. Dahu Feng (2 papers)
  2. Erhu Feng (4 papers)
  3. Dong Du (19 papers)
  4. Pinjie Xu (2 papers)
  5. Yubin Xia (14 papers)
  6. Haibo Chen (93 papers)
  7. Rong Zhao (43 papers)