Offloading Fine-Grained Execution Units (Threads) Across CPUs and DPUs

Develop a general mechanism to offload fine-grained execution units, specifically threads, across CPUs and DPUs within the HeteroPod architecture and its OS Overlay. The mechanism should enable execution across processing units (PUs) at thread-level granularity while preserving existing cloud-native application semantics and requiring no application modifications.

Background

The paper introduces HeteroPod, which dynamically splits cloud-native applications across CPUs and DPUs to reduce infrastructure overheads. Its foundational OS Overlay provides split network namespaces (hetero-netns) and an efficient cross-PU networking stack (hetero-socket), enabling container-granular offloading without modifying applications.

While the prototype demonstrates container-level offloading for infrastructure and application containers, the authors note that moving to finer-grained units, such as threads, poses additional challenges that remain unresolved. Achieving thread-level offloading would require new mechanisms to maintain correctness, isolation, and performance across heterogeneous processing units within the unified Pod abstraction.
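
To make the granularity distinction concrete, the following is a minimal toy model, not HeteroPod's implementation. The names `PU`, `Container`, `place_container`, `place_thread`, and `spans_pus` are hypothetical and exist only for illustration. The sketch assumes that container-granular offloading pins every thread of a container to one processing unit, whereas thread-granular offloading would let a single container span both.

```python
from dataclasses import dataclass, field
from enum import Enum


class PU(Enum):
    """Processing units considered in this sketch (hypothetical names)."""
    CPU = "cpu"
    DPU = "dpu"


@dataclass
class Container:
    name: str
    threads: list[str]
    # Maps thread name -> PU it runs on (illustrative placement map).
    placement: dict[str, PU] = field(default_factory=dict)

    def place_container(self, pu: PU) -> None:
        """Container-granular offloading: every thread lands on the same PU."""
        self.placement = {t: pu for t in self.threads}

    def place_thread(self, thread: str, pu: PU) -> None:
        """Thread-granular offloading: move one thread independently."""
        if thread not in self.threads:
            raise KeyError(thread)
        self.placement[thread] = pu

    def spans_pus(self) -> bool:
        """True when threads of this container run on more than one PU."""
        return len(set(self.placement.values())) > 1


proxy = Container("sidecar-proxy", ["accept", "worker-0", "worker-1"])

# Container-level offloading keeps the whole container on the DPU.
proxy.place_container(PU.DPU)
container_level_split = proxy.spans_pus()

# Thread-level offloading moves a single worker back to the CPU.
proxy.place_thread("worker-1", PU.CPU)
thread_level_split = proxy.spans_pus()
```

In this model, `spans_pus()` returns `False` after container-level placement and `True` once a single thread is moved. That second state, one container whose threads sit on different PUs, is the case for which new mechanisms would be needed to maintain correctness, isolation, and performance.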

References

Offloading more fine-grained granularity, e.g., threads, is still an open challenge for future work.

HeteroPod: XPU-Accelerated Infrastructure Offloading for Commodity Cloud-Native Applications (2503.23952 - Yang et al., 31 Mar 2025) in Subsection "Compatibility", Limitations paragraph