ElasticVR: Adaptive VR Resource Management
- ElasticVR is an end-to-end framework that integrates scalable 360° video tiling, elastic offloading, and multi-agent reinforcement learning to adapt VR streaming under hard latency constraints.
- It employs both centralized (CPPG) and decentralized (IPPG/CTDE) DRL approaches to jointly optimize quality, response time, and energy usage across heterogeneous networks.
- Performance benchmarks indicate up to a 43.21% PSNR gain and a 56.83% reduction in energy consumption relative to baselines, showcasing its potential for immersive, resource-constrained VR applications.
ElasticVR is an end-to-end framework for adaptive, elastic computation and resource management in multi-user, multi-connectivity wireless virtual reality (VR) systems. The framework integrates scalable 360° video tiling, elastic computational task offloading, and multi-agent deep reinforcement learning to optimize user-perceived quality of experience (QoE) and system efficiency under hard latency constraints. It enables the joint adaptation of both task fidelity and computation offloading strategy across heterogeneous edge-client architectures and volatile wireless networks (Badnava et al., 13 Dec 2025).
1. System Architecture and Scalable Tiling
ElasticVR is designed for scenarios where multiple VR users, each equipped with a head-mounted display (HMD) with a lightweight CPU and battery, interact with edge servers through several heterogeneous wireless channels (including 4G, 5G, and WiGig). The architecture supports two execution modes: local computation on the HMD, or offloading to a nearby multi-access edge computing (MEC) server whose CPU cycle budget is constrained and shared across users. The throughput of each channel available to a user is time-varying and is averaged over a task's transmission interval.
Elastic 360° video streaming is achieved by dividing each group-of-pictures (GoP) into an $N \times M$ grid of spatial tiles. For each tile $i$, a base-layer size $B_i$ and enhancement-layer sizes $E_{i,\ell}$ are precomputed. The current field of view (FOV), determined by head pose, yields a binary mask $m_i$ (1 for viewport tiles). The elasticity index $e$ controls task fidelity, yielding an input size
$$S(e) \;=\; \sum_{i} \Big( B_i + m_i \sum_{\ell=1}^{e} E_{i,\ell} \Big),$$
with the computational intensity (cycles/bit) scaling linearly with $e$.
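To make the tiling-based elasticity concrete, the following minimal Python sketch computes the per-GoP input size from precomputed layer sizes and a viewport mask; the function and variable names (`elastic_input_size`, `base_bits`, `enh_bits`, `fov_mask`) are illustrative, not the paper's notation.

```python
def elastic_input_size(base_bits, enh_bits, fov_mask, e):
    """Bits to process for one GoP at elasticity index e.

    base_bits[i]   : base-layer size of tile i (bits)
    enh_bits[i][l] : size of enhancement layer l of tile i (bits)
    fov_mask[i]    : 1 if tile i lies inside the current viewport, else 0
    e              : number of enhancement layers included for viewport tiles
    """
    size = 0
    for i, b in enumerate(base_bits):
        size += b                          # base layer for every tile
        if fov_mask[i]:
            size += sum(enh_bits[i][:e])   # enhancement layers only in the FOV
    return size
```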
Task execution time and energy depend on whether the task is run locally or offloaded. Local computation time is the task workload (input bits times cycles per bit) divided by the HMD CPU frequency; offloaded completion time sums uplink transmission, MEC processing, and result retrieval. Local energy follows the HMD CPU power model, while offload energy reflects radio transmission and reception.
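A correspondingly simple cost model illustrates how latency and HMD-side energy are accounted for in the two execution modes; the parameter names below (`f_hmd`, `f_mec`, `p_cpu`, `p_tx`, `p_rx`) are assumed for the sketch and are not the paper's symbols.

```python
def local_cost(bits, cycles_per_bit, f_hmd, p_cpu):
    """Local execution on the HMD: latency = workload / CPU frequency."""
    t = bits * cycles_per_bit / f_hmd      # seconds
    return t, p_cpu * t                    # (latency, HMD energy in joules)

def offload_cost(bits, result_bits, cycles_per_bit, rate_up, rate_down,
                 f_mec, p_tx, p_rx):
    """Offloaded execution: uplink transfer + MEC processing + result retrieval."""
    t_up = bits / rate_up
    t_mec = bits * cycles_per_bit / f_mec
    t_down = result_bits / rate_down
    energy = p_tx * t_up + p_rx * t_down   # HMD spends energy only on the radio
    return t_up + t_mec + t_down, energy
```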
2. Joint Optimization of Quality, Latency, and Energy
ElasticVR formalizes the global system objective as a constrained, multi-metric optimization problem, balancing:
- QoE: average viewport PSNR of the delivered tiles,
- Response Time: per-task completion latency (transmission plus computation),
- Energy Consumption: per-task HMD energy spent on computation and radio,
where the decision variables are each user's task elasticity index $e$ and offloading link (including local execution). The scalarized objective maximizes
$$\omega_Q \cdot \overline{\mathrm{PSNR}} \;-\; \omega_T \cdot \overline{\mathrm{RT}} \;-\; \omega_E \cdot \overline{E},$$
subject to per-task deadlines and the MEC CPU budget.
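A per-GoP reward consistent with this scalarization can be sketched as follows; the weight names and the additive deadline-violation penalty are assumptions about form rather than the exact reward used in the paper.

```python
def scalarized_reward(psnr_db, response_time_s, energy_j, deadline_s,
                      w_q=1.0, w_t=1.0, w_e=1.0, violation_penalty=10.0):
    """Weighted QoE/latency/energy trade-off with a hard-deadline penalty."""
    reward = w_q * psnr_db - w_t * response_time_s - w_e * energy_j
    if response_time_s > deadline_s:       # deadline violations are penalized
        reward -= violation_penalty
    return reward
```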
3. Multi-Agent Reinforcement Learning Approaches
ElasticVR frames this problem as a multi-agent Markov decision process. At each GoP, every user selects an action (an elasticity index and an offloading target) based on local observations. The framework introduces two solution paradigms:
- Centralized Phasic Policy Gradient (CPPG): Implements centralized actor-critic DRL over the full global state and the joint action of all users. The reward incorporates QoE, response time, energy, and deadline-violation penalties. The actor and critic losses follow proximal policy optimization (PPO) with phasic updates. CPPG exploits all cross-user couplings (e.g., the shared MEC CPU), achieving near-optimal coordination, but the joint state/action space grows with the user count, and the resulting training and inference overhead limits scalability for large user groups.
- Independent Phasic Policy Gradient (IPPG): Adopts a centralized-training, decentralized-execution (CTDE) paradigm. Each agent (user) observes only its local state and acts independently at execution, but shares parameters and pooled experience during training. This keeps per-user execution overhead low and yields high sample efficiency via parameter sharing. The local PPO objective retains the global reward coupling, but state observability is partial at runtime. Empirical results indicate that IPPG retains near-optimal performance as the number of users increases, whereas CPPG degrades beyond small user groups.
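The parameter-sharing idea behind IPPG can be illustrated with a single shared policy network that every HMD evaluates on its own local observation. The architecture below (layer sizes, observation dimension, and two categorical heads for elasticity and offloading link) is a hedged sketch, not the paper's network.

```python
import torch
import torch.nn as nn

class SharedPolicy(nn.Module):
    """One set of weights, trained on pooled experience, executed per user."""
    def __init__(self, obs_dim, n_elasticity, n_links):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                      nn.Linear(128, 128), nn.ReLU())
        self.elasticity_head = nn.Linear(128, n_elasticity)  # choose e
        self.offload_head = nn.Linear(128, n_links)          # local / 4G / 5G / WiGig

    def forward(self, obs):
        h = self.backbone(obs)
        return (torch.distributions.Categorical(logits=self.elasticity_head(h)),
                torch.distributions.Categorical(logits=self.offload_head(h)))

# Decentralized execution: each user samples from the same policy on local state.
policy = SharedPolicy(obs_dim=16, n_elasticity=4, n_links=4)
local_obs = torch.randn(1, 16)             # one user's observation (illustrative)
e_dist, link_dist = policy(local_obs)
action = (e_dist.sample().item(), link_dist.sample().item())
```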
4. Training and Execution Paradigms
Two execution models correspond to the solution variants:
- CTCE (CPPG): Both training and inference are conducted at the MEC. Each HMD uploads its local state at every GoP; MEC computes the global joint action. This configuration guarantees maximal coordination, but introduces significant uplink overhead, single-point failure risk, and limited scalability.
- CTDE (IPPG): Centralized training occurs at the MEC with all users' experience pooled for periodic parameter updates. Execution is fully decentralized: each HMD runs its local policy using current parameters and only occasionally communicates with the MEC for retraining. This decentralized design dramatically lowers communication cost and removes inference bottlenecks, though it forgoes explicit cross-user coordination at runtime.
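The deployment loop below sketches the CTDE pattern: each HMD acts on local observations every GoP, ships experience to the MEC, and only occasionally refreshes its policy weights. The interfaces (`env`, `mec_client`, `policy`) and the sync interval are hypothetical stand-ins for the components described above.

```python
SYNC_EVERY = 100   # GoPs between weight refreshes (illustrative value)

def run_hmd(policy, env, mec_client, n_gops):
    """Decentralized execution with periodic parameter sync to the MEC."""
    for gop in range(n_gops):
        obs = env.observe()                          # local state only
        action = policy.act(obs)                     # elasticity + offload choice
        transition = env.step(action)
        mec_client.buffer(transition)                # experience pooled for training
        if (gop + 1) % SYNC_EVERY == 0:
            policy.load_weights(mec_client.latest_weights())
```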
5. Performance Benchmarking and Comparative Evaluation
ElasticVR’s efficacy is validated on public 360° video datasets and real 4G/5G/WiGig traces. Comparative evaluation includes:
- Pareto-optimal brute-force reference
- Neural epsilon-greedy joint learning
- Elasticity-agnostic PPO (EA-Offloader) baseline (fixed elasticity index, optimizing only the offloading decision)
Key results (for a fixed user count and a representative weighting of PSNR, response time, and energy):
| Method | PSNR (dB) | RT (s) | Energy (mJ) | Deadline Viol. |
|---|---|---|---|---|
| Optimal | 49.24±1.03 | 0.57±0.07 | 1.80±0.98 | 0% |
| CPPG | 48.35±0.80 | 0.49±0.13 | 1.01±0.74 | 0% |
| IPPG | 48.14±0.57 | 0.53±0.03 | 0.61±0.04 | 0% |
| EA-Offloader | 38.63±16.44 | 0.85±0.35 | 2.34±0.92 | 32.4% |
| ε-Greedy | 33.81±15.68 | 0.85±0.73 | 2.06±2.39 | 40.7% |
ElasticVR (CPPG/IPPG) delivers higher PSNR, lower response time, and lower energy consumption relative to the elasticity-agnostic baseline (up to a 43.21% PSNR gain and a 56.83% energy reduction across baselines). PSNR–RT–Energy trade-off trajectories (Pareto front) confirm that adaptive elasticity and learned offloading achieve performance within 5% of the global optimum. Fixed elasticity values result in either poor QoE–energy trade-offs or massive deadline violations; only adaptive elasticity consistently achieves sub-12% deadline violations while balancing all metrics.
6. Adaptation, Scalability, and Limitations
ElasticVR’s adaptive tiling and multi-agent DRL enable robust performance across a range of network and computational conditions, including dynamic channel variation and variable user counts. The decentralized (IPPG/CTDE) variant ensures practical scalability and resilience against single points of failure. A plausible implication is that, for small user groups, centralized CPPG may yield marginally superior coordination, but CTDE/parameter-sharing techniques become essential as network scale grows and environments become more heterogeneous.
A known limitation is the partial observability under decentralized execution, which may yield modest suboptimality in deployments with few users. The framework assumes the elasticity index can be adjusted per GoP and that per-tile enhancement layers are precomputed and available.
7. Significance and Application Contexts
ElasticVR addresses the critical challenge of real-time, immersive 360° VR streaming in multi-user, resource-constrained, and bandwidth-variable environments. The integration of elastic 360° tiling with DRL-based offloading mechanisms offers a generalizable approach to maximizing application-level QoE while conforming to stringent latency and energy budgets. This approach is directly applicable to multi-user VR arenas, collaborative VR platforms, and future wireless edge architectures demanding scalable, adaptive resource provisioning and high-fidelity, user-centric experiences. Methods and results have influenced contemporary directions in elastic VR task computing and adaptive streaming research, unifying resource allocation with perceptual optimization (Badnava et al., 13 Dec 2025).