NimbusGuard: A Novel Framework for Proactive Kubernetes Autoscaling Using Deep Q-Networks

Published 13 Apr 2026 in cs.DC and cs.AI | (2604.11017v1)

Abstract: Cloud native architecture is about building and running scalable microservice applications to take full advantage of cloud environments. Managed Kubernetes is the powerhouse orchestrating cloud native applications with elastic scaling. However, traditional Kubernetes autoscalers are reactive, meaning the scaling controllers adjust resources only after they detect demand within the cluster and do not incorporate any predictive measures. This can lead to either over-provisioning and increased costs or under-provisioning and performance degradation. We propose NimbusGuard, an open-source, Kubernetes-based autoscaling system that leverages a deep reinforcement learning agent to provide proactive autoscaling. The agent's perception is augmented by a Long Short-Term Memory model that forecasts future workload patterns. The evaluations were conducted by comparing NimbusGuard against the built-in scaling controllers, such as Horizontal Pod Autoscaler, and the event-driven autoscaler KEDA. The experimental results demonstrate how NimbusGuard's proactive framework translates into superior performance and cost efficiency compared to existing reactive methods.


Authors (2)

Summary

The paper introduces NimbusGuard, a proactive autoscaler that integrates LSTM forecasting with a deep Q-network to anticipate Kubernetes workload changes and reduce SLA violations.
It employs a hybrid methodology that merges real-time telemetry with predicted workload trends to optimize resource allocation and enhance cost efficiency.
Experimental results show NimbusGuard outperforms traditional autoscalers like HPA and KEDA by reducing response latency and lowering overall resource costs.

NimbusGuard: Proactive Kubernetes Autoscaling via Deep Q-Networks

Introduction

NimbusGuard addresses the inherent limitations of traditional Kubernetes autoscalers by introducing a proactive framework for autoscaling in cloud-native environments, built on deep reinforcement learning (DRL). Unlike the prevalent reactive scaling mechanisms, which adjust resources only after observing demand changes, NimbusGuard uses a deep Q-network (DQN) agent whose input is enriched with workload forecasts from a Long Short-Term Memory (LSTM) model, enabling more responsive and cost-efficient autoscaling.

Limitations of Reactive Autoscaling and Motivation

Reactive autoscalers such as the Kubernetes Horizontal Pod Autoscaler (HPA) and KEDA lag behind demand because they adjust resources based on current or past metrics. Under bursty workloads this typically leads either to resource over-provisioning or to service degradation. The lack of predictive capability limits their usefulness, especially for latency-sensitive or highly dynamic microservices.
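To make the reactive behaviour concrete, the following minimal sketch reproduces the proportional rule that the Kubernetes documentation gives for HPA: desired replicas = ceil(current replicas × current metric / target metric), with no change while the ratio stays within a tolerance band (0.1 by default). The function name and the example numbers are illustrative assumptions. The real controller also applies stabilization windows and min/max replica bounds, which are omitted here.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Reactive, HPA-style rule: scale in proportion to the metric observed right now."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Otherwise scale proportionally, never dropping below one replica.
    return max(1, math.ceil(current_replicas * ratio))


# The rule only sees the current metric, so it reacts after load has already risen.
steady = hpa_desired_replicas(4, current_metric=52, target_metric=50)     # within tolerance
overload = hpa_desired_replicas(4, current_metric=100, target_metric=50)  # already overloaded
idle = hpa_desired_replicas(4, current_metric=20, target_metric=50)       # already idle
```

Because the decision depends only on the metric measured at the current control interval, new pods are requested only once the overload is visible, and they still need time to start.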

NimbusGuard Framework

NimbusGuard integrates three principal modules:

Workload Forecasting: A univariate LSTM model processes historical workload traces, generating forecasts of near-future resource demand.
Deep Reinforcement Learning Agent: A DQN, ingesting both real-time cluster telemetry and LSTM forecasts, generates scaling policies that anticipate workload fluctuations and preemptively allocate resources.
Kubernetes Integration Layer: Scaling recommendations are mapped to pod manipulation operations through the Kubernetes API.

What distinguishes this proactive design is that the LSTM forecast becomes part of the DRL agent's observation. The agent can therefore learn a policy that conditions on predicted near-future demand as well as on the current cluster state.
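One way to picture this fusion is a minimal sketch in which the forecast is appended to the telemetry vector and a Q-function scores a small set of scaling actions. Everything named here is an assumption for illustration and is not taken from the paper: the feature set, the three-action space, the per-replica capacity of 100 requests/s, and `toy_q_function`, which stands in for a trained Q-network.

```python
import random

# Hypothetical discrete action set: change in replica count per decision.
ACTIONS = (-1, 0, 1)


def build_state(cpu_util, request_rate, replicas, forecast_rate):
    """Fuse current telemetry with the forecast into one observation vector."""
    return (cpu_util, request_rate, float(replicas), forecast_rate)


def toy_q_function(state, per_replica_capacity=100.0):
    """Stand-in for a trained Q-network (NOT the paper's model).

    Scores each action by how closely the resulting capacity matches forecast demand.
    """
    _, _, replicas, forecast_rate = state
    return [
        -abs((replicas + delta) * per_replica_capacity - forecast_rate)
        for delta in ACTIONS
    ]


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy selection: explore with probability epsilon, else take argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Current load (280 req/s) fits on 3 replicas, but the forecast (420 req/s) does not.
state = build_state(cpu_util=0.62, request_rate=280.0, replicas=3, forecast_rate=420.0)
action_index = select_action(toy_q_function(state), epsilon=0.0)
replica_delta = ACTIONS[action_index]
```

In this toy setting the greedy choice is to add a replica before the load arrives, whereas a rule that looks only at `request_rate` would see no reason to scale yet.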

Experimental Design and Results

NimbusGuard was benchmarked against HPA and KEDA under synthetic and real-world workload traces. The evaluation framework focused on three key performance indicators:

SLA Violation Rate: NimbusGuard consistently demonstrated a reduction in SLA violations relative to both comparison baselines.
Scaling Response Latency: The framework exhibited faster and more stable response times to both workload surges and declines, curtailing periods of under-provisioning.
Resource Cost Efficiency: By preemptively matching supply to anticipated demand, NimbusGuard achieved lower average resource costs over extended observation windows.
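
As a rough illustration of how two of these indicators might be computed from a monitoring trace, the sketch below measures the SLA violation rate as the fraction of sampling intervals whose latency exceeds a threshold, and resource cost as replica-hours. The `Sample` fields, the 300 ms threshold, and the 15-minute interval are hypothetical; the paper's exact metric definitions are not given in this summary.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    latency_ms: float  # observed request latency during the interval
    replicas: int      # pods running during the interval


def sla_violation_rate(samples, latency_slo_ms):
    """Fraction of intervals whose latency exceeded the SLO threshold."""
    if not samples:
        raise ValueError("at least one sample is required")
    violations = sum(1 for s in samples if s.latency_ms > latency_slo_ms)
    return violations / len(samples)


def replica_hours(samples, interval_seconds):
    """Resource cost proxy: total replica count integrated over time, in hours."""
    return sum(s.replicas for s in samples) * interval_seconds / 3600.0


# Made-up trace, for illustration only.
trace = [Sample(120, 2), Sample(480, 2), Sample(210, 4), Sample(150, 4)]
violation_rate = sla_violation_rate(trace, latency_slo_ms=300)
cost = replica_hours(trace, interval_seconds=900)
```

Comparing autoscalers on the same workload trace then amounts to computing these values once per autoscaler and setting them side by side.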

The paper reports quantitative evidence of these improvements, with NimbusGuard consistently outperforming both HPA and the event-driven KEDA in service continuity and economic efficiency.

Prior work has explored ML-based autoscaling, including Bi-LSTM-based approaches for scaling prediction [cham2] and Q-learning-based methods for workflow autoscaling [cham4]. However, these often lack close integration with Kubernetes or fail to operationalize accurate workload forecasts in conjunction with DRL policy learning. Recent DRL-enhanced Kubernetes schedulers exist [cham5], but NimbusGuard’s explicit fusion of LSTM forecasts into the agent’s perception yields improved anticipation and control granularity.

The hybridization of LSTM prediction with DQN-based policy learning distinguishes NimbusGuard from purely model-free or stateless methods, leading to more robust generalization under non-stationary workload distributions.

Implications and Future Work

NimbusGuard’s results substantiate the practical value of proactive RL-based autoscaling for complex microservice deployments. Its open-source implementation and modular architecture facilitate further adaptation to heterogeneous cloud settings, including integration of multi-variate predictors, support for multi-objective optimization, and extension to GPU- or memory-aware autoscaling [cham7].

Future research may explore multi-agent RL coordination for autoscaling across hierarchical service dependencies, meta-learning approaches for rapid adaptation to workload regime shifts, and formal verification of learned policies.

Conclusion

NimbusGuard demonstrates that LSTM-augmented deep RL can substantially advance the state of autoscaling in Kubernetes, delivering statistically significant improvements in both service quality and cost control compared to established autoscalers. The framework’s design generalizes to evolving cloud-native workloads and motivates broader adoption of forecast-augmented RL in resource orchestration domains.
