Nexus: Transparent I/O Offloading for High-Density Serverless Computing

Published 8 Apr 2026 in cs.DC and cs.OS | (2604.06682v1)

Abstract: Serverless computing relies on extreme multi-tenancy to remain economically viable, driving providers to rely on virtual machines (VMs) that ensure strong isolation and seamless ecosystem compatibility with the FaaS programming model. However, current architectures tightly couple application processing logic with I/O processing, forcing every VM to duplicate a heavy communication fabric (cloud SDK, RPC, and TCP/IP). Our analysis reveals this duplication consumes over 25% of a function's memory footprint, and may double the CPU cycles in VMs compared to bare-metal execution. While prior systems attempt to solve this using WebAssembly or library OSes, they naively sacrifice ecosystem compatibility, forcing developers to migrate code and dependencies to new languages. We introduce Nexus, a serverless-native KVM-based hypervisor that transparently decouples compute from I/O. Nexus shifts the execution model by intercepting communication fabric at the API boundary and offloading it to an always-on host shared backend via zero-copy shared memory. This removes the heavyweight communication fabric from the guest VM, while preserving the conventional serverless programming model. By structurally separating these domains, Nexus unlocks asynchronous I/O optimizations: overlapping input payload prefetching with VM restoration from a snapshot and writing output payloads back to storage off the critical path. Compared to the production baseline, Nexus reduces overall node-level CPU and memory consumption by up to 44% and 31%, respectively, thus increasing deployment density by 37%. Also, Nexus reduces warm- and cold-start latency by 39% and 10%, respectively, bringing the response time within 20% of that of a WASM-based, ecosystem-incompatible hypervisor.


Authors (7)

Summary

The paper shows that decoupling I/O from compute cuts per-invocation CPU cycles by 37% on average and the per-instance memory footprint by up to 20%.
It employs a serverless-native KVM-based hypervisor with asynchronous I/O optimizations, increasing deployment density by 37% and reducing cold-start latency by 10%.
The architecture maintains compatibility with existing cloud SDKs while strengthening isolation and security via hardware-assisted protections like Intel MPK and Arm CHERI.


Motivation and Background

The economic viability of serverless platforms depends on their ability to maximize deployment density while ensuring strong isolation and ecosystem compatibility. The widespread adoption of KVM-based microVMs enables robust security and compatibility with existing codebases reliant on high-level runtimes and cloud SDKs. However, this architectural choice introduces substantial overheads: every function instance redundantly loads a full communication fabric—including networking stack, RPC framework, and cloud provider SDKs—resulting in excessive CPU consumption and memory duplication at scale.

Profiling of contemporary deployments shows that the communication fabric and the virtualization stack dominate both CPU cycle consumption and memory footprint. On a worker node, guest user space (application logic plus communication fabric) accounts for 74% of CPU cycles. Host and guest kernel overheads accumulate on top of that because of repeated virtualization boundary crossings, a pattern exacerbated by the inefficiency of the interpreted runtimes (Python, NodeJS) commonly used for rapid application development.

Figure 1: CPU cycles distribution on a worker node illustrating guest user space as the predominant consumer.

Memory analysis reveals that the communication fabric constitutes over 25% of a function’s memory footprint. This duplication compounds in scenarios where hundreds of VMs are colocated on a node, restricting deployment density and raising operational costs.

Figure 2: Breakdown of memory footprint for each component during function execution, averaged across vSwarm workloads.

Nexus Architecture and Execution Model

Nexus introduces a serverless-native KVM hypervisor that transparently decouples compute from I/O at the API boundary. Inside each VM, function instances call a thin frontend library that stands in for the provider SDK and remotes all I/O operations to a shared, trusted backend on the host via zero-copy shared memory. Because the remoting boundary sits at the high-level API of cloud SDKs and function-invocation RPCs, the design preserves full compatibility with standard programming models and POSIX.
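The remoting idea can be illustrated with a minimal sketch, assuming one shared region per VM and a length-prefixed payload layout. The function names (`frontend_put`, `backend_digest`, `demo`), the header format, and the use of Python's `multiprocessing.shared_memory` are illustrative assumptions and not Nexus's implementation; the point is only that the backend reads the payload in place through a `memoryview` instead of copying it.

```python
import hashlib
import struct
from multiprocessing import shared_memory

# Hypothetical layout: a 4-byte little-endian length, then the payload bytes.
HEADER = struct.Struct("<I")


def frontend_put(region_name: str, payload: bytes) -> None:
    """Guest-side stub: place the request payload in the shared region."""
    shm = shared_memory.SharedMemory(name=region_name)
    try:
        if HEADER.size + len(payload) > shm.size:
            raise ValueError("payload does not fit in the shared region")
        HEADER.pack_into(shm.buf, 0, len(payload))
        shm.buf[HEADER.size:HEADER.size + len(payload)] = payload
    finally:
        shm.close()


def backend_digest(region_name: str) -> str:
    """Host-side handler: consume the payload in place, without copying it."""
    shm = shared_memory.SharedMemory(name=region_name)
    try:
        (length,) = HEADER.unpack_from(shm.buf, 0)
        view = shm.buf[HEADER.size:HEADER.size + length]  # a view, not a copy
        try:
            return hashlib.sha256(view).hexdigest()
        finally:
            view.release()  # views must be released before the region closes
    finally:
        shm.close()


def demo(payload: bytes) -> str:
    """Create a region, run the frontend and backend against it, clean up."""
    region = shared_memory.SharedMemory(create=True, size=4096)
    try:
        frontend_put(region.name, payload)
        return backend_digest(region.name)
    finally:
        region.close()
        region.unlink()
```

Here hashing stands in for whatever work the backend would perform on the payload, such as handing it to a storage client. A real system would also need signalling between the two sides, which this sketch leaves out.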

Figure 3: Nexus serverless architecture overview showing the separation of frontend (in VM) and backend (host).

This decoupled model enables asynchronous I/O optimizations:

Input payload prefetching is triggered by ingress routing hints, overlapping remote data fetch with VM restoration.
Output writes are safely deferred off the VM's critical path; the backend completes the transmission after function logic concludes, freeing VM resources early.

The function execution lifecycle in Nexus changes from a strictly serialized path (restore → fetch → compute → write) to a pipelined, overlapping sequence that accelerates both cold and warm invocations.
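The overlap described above can be sketched with `asyncio`, under the assumption that restoration, prefetching, and the output write are independent asynchronous steps. All coroutine names, the sleep durations, and the `payload.upper()` stand-in for function logic are hypothetical; the sketch only shows the ordering, not how Nexus implements it.

```python
import asyncio


async def restore_vm(log):
    log.append("restore:start")
    await asyncio.sleep(0.02)  # stand-in for snapshot restoration
    log.append("restore:done")


async def prefetch_input(log):
    log.append("fetch:start")
    await asyncio.sleep(0.01)  # stand-in for the remote data fetch
    log.append("fetch:done")
    return b"input-payload"


async def write_output(log, data):
    log.append("write:start")
    await asyncio.sleep(0.01)  # stand-in for the storage write
    log.append("write:done")


async def invoke(log):
    # Restore and prefetch run concurrently instead of back to back.
    _, payload = await asyncio.gather(restore_vm(log), prefetch_input(log))
    log.append("compute")
    result = payload.upper()  # placeholder for the function's own logic
    # The write is handed off and finishes after the VM is released.
    pending_write = asyncio.create_task(write_output(log, result))
    log.append("vm-released")
    return result, pending_write


async def main():
    log = []
    result, pending_write = await invoke(log)
    await pending_write  # the backend side still completes the write
    return result, log


def run_demo():
    return asyncio.run(main())
```

In the resulting log, the fetch starts before restoration finishes and the VM is released before the write completes, which are the two overlaps the pipelined lifecycle relies on.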

Figure 4: Function execution lifecycle with RPC management and cloud storage access offloading.

Resource Efficiency and Deployment Density

Nexus significantly reduces CPU and memory overhead by extracting the communication stack from the guest environment. In the end-to-end cluster evaluation, Nexus serves up to 440 function instances under strict p99 latency SLOs, a 37% deployment-density improvement over the AWS Firecracker baseline with RDMA; with TCP, the gain is 18%.

Figure 5: End-to-End Latency comparison across Baseline, Nexus-TCP, Nexus-Async, and Nexus.

Warm latency is reduced by up to 39%, with the greatest impact on workloads dominated by I/O (e.g., linear regression serving, stack training reducer), while compute-intensive workloads observe modest improvements.

Figure 6: Warm latency across vSwarm workloads normalized to Baseline; Nexus sharply reduces guest-side I/O.

The CPU cycle breakdown shows a 37% average reduction in per-invocation cycles, with guest user cycles dropping by 28% and host kernel cycles by 54%, the latter owing to RDMA kernel bypass. KVM activity also falls significantly: Nexus decreases KVM exits by 53% and vCPU wakeups by 70%.

Figure 7: CPU cycles breakdown for each workload under the three studied systems, normalized per invocation across execution domains.

Figure 8: kvm exit and kvm vcpu wakeup event rates across vSwarm workloads normalized per invocation.

Normalized to the baseline, the per-instance memory footprint falls by up to 20%. At scale, worker-node memory consumption decreases consistently by 10–21% as Nexus amortizes the cost of the shared backend across tenants.
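The amortization effect can be illustrated with a simple back-of-the-envelope model. This is a sketch under stated assumptions, not the paper's methodology: every parameter value below (application, fabric, frontend, and backend sizes) is made up for illustration, and the function names are hypothetical.

```python
def baseline_node_mb(n, app_mb, fabric_mb):
    """Every VM carries its own copy of the communication fabric."""
    return n * (app_mb + fabric_mb)


def offloaded_node_mb(n, app_mb, frontend_mb, backend_mb):
    """Each VM keeps only a thin frontend; one backend is shared per node."""
    return n * (app_mb + frontend_mb) + backend_mb


def savings_fraction(n, app_mb, fabric_mb, frontend_mb, backend_mb):
    """Fraction of node memory saved relative to the baseline."""
    base = baseline_node_mb(n, app_mb, fabric_mb)
    return (base - offloaded_node_mb(n, app_mb, frontend_mb, backend_mb)) / base


# Illustrative numbers only (not taken from the paper):
# 75 MB app, 25 MB fabric, 5 MB frontend, 200 MB shared backend.
PARAMS = dict(app_mb=75, fabric_mb=25, frontend_mb=5, backend_mb=200)

for n in (10, 100, 400):
    print(n, round(savings_fraction(n, **PARAMS), 3))
```

With these made-up numbers the fixed backend cost cancels the savings at 10 instances, while at 100 and 400 instances the saved fraction approaches the per-instance reduction. That is the qualitative behaviour amortization implies: the shared backend matters less as density grows.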

Figure 9: Per-function instance memory footprint across vSwarm workloads, normalized to Baseline.

Figure 10: Worker node memory footprint breakdown across deployment densities, showing Nexus’s backend amortization.

Cold-Start Latency and Snapshot Optimization

Nexus’s decoupling reduces snapshot bloat: VM restoration retrieves 31% fewer memory pages, accelerating instance startup. Input prefetching and asynchronous output completion break the serialization bottleneck, yielding a net 10% reduction in cold-start latency.

Figure 11: Normalized cold-start latency breakdown across the vSwarm suite.

Figure 12: Snapshot working set size in pages during restoration, demonstrating Nexus’s reduction in mandatory reads.

Comparative Analysis with Ecosystem-Incompatible Sandboxes

The WASM-based Faasm hypervisor, which forgoes compatibility with Python and standard SDKs, retains a moderate efficiency lead over Nexus (~20–25% lower CPU cycles, 3.5× lower memory). However, the loss of ecosystem compatibility and developer productivity remains a barrier to adopting WebAssembly and similar lightweight sandboxes.

Figure 13: AES encryption workload: execution time, per-invocation CPU cycles, and memory footprint under Baseline, Nexus, and Faasm.

Security, Isolation, and Practical Implications

Centralizing I/O in the backend expands the fault domain. Nexus mitigates this with a memory-safe backend implementation, isolated shared memory regions (one-to-one tenant mapping), and hardware-assisted protection (Intel MPK, Arm CHERI). Least-privilege credential management keeps provider secrets out of guest environments, which strengthens the platform's security posture.
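The one-to-one tenant mapping can be sketched as a small backend-side table. The `RegionRegistry` class and its methods are hypothetical names introduced for illustration; the source does not describe how Nexus tracks regions, and the hardware-assisted protection (MPK, CHERI) is outside the scope of this sketch.

```python
class RegionRegistry:
    """Hypothetical table enforcing one region per VM and one VM per region."""

    def __init__(self):
        self._region_of_vm = {}
        self._vm_of_region = {}

    def assign(self, vm_id: str, region_id: str) -> None:
        # Reject any assignment that would break the one-to-one mapping.
        if vm_id in self._region_of_vm:
            raise ValueError(f"{vm_id} already has a region")
        if region_id in self._vm_of_region:
            raise ValueError(f"{region_id} already belongs to another VM")
        self._region_of_vm[vm_id] = region_id
        self._vm_of_region[region_id] = vm_id

    def may_access(self, vm_id: str, region_id: str) -> bool:
        # A VM may touch only the region assigned to it.
        return vm_id in self._region_of_vm and self._region_of_vm[vm_id] == region_id

    def release(self, vm_id: str) -> None:
        # Free the region when the VM goes away so it can be reassigned.
        region_id = self._region_of_vm.pop(vm_id)
        del self._vm_of_region[region_id]
```

A backend built this way would check `may_access` before serving any request, so a compromised guest could not name another tenant's region.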

Nexus achieves transparency: application code, deployment models, and high-level language runtimes are preserved, and extending it to other environments (NodeJS, Java) requires only minimal changes (a thin SDK interception layer).

Implications and Future Directions

Nexus’s structural separation of compute and I/O enables high deployment density while retaining compatibility with the mature FaaS ecosystem. Practical implications include reduced resource pressure, improved response times, and lower operational costs for cloud providers. The architecture is orthogonal to, and compatible with, ongoing innovations in serverless caching, storage tiering, and hardware acceleration. Future work may extend asynchronous I/O semantics, improve backend reliability, and explore further hardware acceleration mechanisms.

Conclusion

Nexus demonstrates that transparent decoupling of I/O from compute in KVM-based serverless platforms eliminates intrinsic overheads of virtualized communication fabric duplication, unlocking substantial resource efficiency gains and increased deployment density. The approach preserves ecosystem compatibility and enables asynchronous optimizations without user code changes. These results signal a clear path toward scalable, efficient, and compatible serverless architectures (2604.06682).
