- The paper introduces Deep W-Networks (DWN), a multi-objective reinforcement learning framework that extends DQN to optimize multiple objectives in autonomous systems.
- It achieves the lowest average response time among the compared methods (0.2362, versus 0.2480 for DQN and 0.2523 for ϵ-greedy) by optimizing response time and configuration cost with separate deep Q-networks.
- The results demonstrate superior adaptability in dynamic system reconfiguration compared to traditional single-objective RL methods.
Multi-Objective Deep Reinforcement Learning Optimization in Autonomous Systems
This paper explores the application of Multi-Objective Reinforcement Learning (MORL), specifically Deep W-Networks (DWN), within Autonomous Systems (AS). The paper focuses on the Emergent Web Server (EWS), which transitions between configurations at runtime to optimize performance metrics.
Introduction and Background
Autonomous Systems (AS) are engineered to operate optimally by continuously adapting to environmental changes. Self-Adaptive Systems (SAS), a subclass of AS, monitor their environment and autonomously adjust their behavior. Reinforcement Learning (RL) algorithms are often employed in such systems due to their ability to learn and adapt at runtime without requiring predefined actions or detailed environmental models.
While traditional RL approaches such as Q-learning are effective, they are typically constrained to single-objective optimization. Handling several objectives then requires combining them into a single aggregated reward function, which fixes the trade-off in advance and restricts flexibility and adaptability. This paper addresses that shortcoming with Deep W-Networks (DWN), a MORL technique that extends tabular W-learning with neural networks.
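To make the limitation concrete, the sketch below (illustrative only; the weights and function names are not from the paper) contrasts the usual scalarized reward with the per-objective rewards that a MORL method can keep separate:

```python
# Single-objective workaround: collapse both metrics into one reward
# with fixed weights, hard-coding the trade-off up front.
def scalarized_reward(response_time, config_cost, w_time=0.5, w_cost=0.5):
    # weights are hypothetical; changing the trade-off requires retraining
    return -(w_time * response_time + w_cost * config_cost)

# Multi-objective view: one reward signal per objective, so each
# objective can be learned and arbitrated independently at runtime.
def vector_reward(response_time, config_cost):
    return (-response_time, -config_cost)
```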
Methodology
The paper provides a detailed breakdown of the implemented DWN approach. Deep Q-learning (DQN) forms the single-objective baseline, which DWN extends to multiple objectives: each objective is optimized separately by its own DQN, and the objectives are arbitrated through W-values, which estimate how much an objective stands to lose when its preferred action is not executed.
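The following Python sketch illustrates this arbitration scheme, assuming two objectives and PyTorch networks; the state dimension, layer sizes, and names are assumptions for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 42   # 42 EWS configurations; state size assumed

def mlp(out_dim):
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))

q_nets = [mlp(N_ACTIONS) for _ in range(2)]  # one Q-network per objective
w_nets = [mlp(1) for _ in range(2)]          # one W-network per objective

def select_action(state):
    """Each objective nominates its greedy action; the objective with the
    highest W-value (most to lose if ignored) wins and acts."""
    with torch.no_grad():
        nominations = [q(state).argmax().item() for q in q_nets]
        w_values = [w(state).item() for w in w_nets]
    winner = max(range(len(w_nets)), key=w_values.__getitem__)
    return nominations[winner], winner

action, winner = select_action(torch.randn(STATE_DIM))
```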
Key Components:
- Deep Q-learning: Utilizes a neural network to approximate the Q-value of each state-action pair, optimizing long-term reward (a minimal update sketch follows this list).
- Deep W-Networks: Adapts DQN to multi-objective optimization by maintaining a separate Q-network per objective and employing W-learning to choose which objective's preferred action is executed.
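As referenced above, here is a minimal DQN temporal-difference update for a single objective, the building block that DWN instantiates once per objective; hyperparameters and layer sizes are illustrative:

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 4, 42, 0.99
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                           nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())  # periodically synced copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def td_update(s, a, r, s_next):
    """One gradient step toward the target r + GAMMA * max_a' Q_target(s', a')."""
    with torch.no_grad():
        target = r + GAMMA * target_net(s_next).max()
    loss = nn.functional.mse_loss(q_net(s)[a], target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

td_update(torch.randn(STATE_DIM), 3, -0.25, torch.randn(STATE_DIM))
```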
Experimental Setting
The Emergent Web Server (EWS) serves as the experimental platform. EWS can transition at runtime between 42 configurations built from components such as request handlers, HTTP processing modules, compression algorithms, and cache strategies. The main objective is to minimize two performance metrics simultaneously: average response time (T) and configuration cost (C).
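A hypothetical gym-style wrapper conveys the interaction loop; the class name, method signatures, and placeholder measurements below are assumptions for illustration, not the actual EWS API:

```python
import random

class EWSEnv:
    """Each action selects one of the 42 server configurations; the
    environment returns both metrics that are to be minimized."""
    N_CONFIGS = 42

    def __init__(self):
        self.current = 0

    def step(self, config_id):
        self.current = config_id
        # Placeholder measurements; a real deployment would time live
        # HTTP requests and look up the configuration's cost.
        response_time = random.uniform(0.2, 0.3)
        config_cost = random.uniform(0.0, 1.0)
        state = [self.current]
        return state, (response_time, config_cost)
```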
Three algorithms are compared in this paper:
- ϵ-Greedy Algorithm: The single-objective baseline, which exploits the best-known configuration with probability 1 − ϵ and explores a random one otherwise (see the sketch after this list).
- Deep Q-Network (DQN): Single-objective Q-learning with a neural-network function approximator.
- Deep W-Networks (DWN): The proposed MORL approach, which optimizes T and C independently with separate DQNs and arbitrates between them using W-learning.
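As noted in the first item, a minimal ϵ-greedy baseline over the 42 configurations might look like this, assuming a scalar reward such as negative response time (the value of ϵ and the estimates are illustrative):

```python
import random

N_CONFIGS, EPSILON = 42, 0.1
values = [0.0] * N_CONFIGS   # running mean reward per configuration
counts = [0] * N_CONFIGS

def choose_config():
    if random.random() < EPSILON:                         # explore
        return random.randrange(N_CONFIGS)
    return max(range(N_CONFIGS), key=values.__getitem__)  # exploit

def update(config_id, reward):
    counts[config_id] += 1
    # incremental running-mean update
    values[config_id] += (reward - values[config_id]) / counts[config_id]
```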
Results
The performance results underline DWN's ability to handle multiple objectives effectively, and the paper reports comparative figures and metrics. Notably, DWN outperforms both DQN and the ϵ-greedy baseline in minimizing average response time, achieving 0.2362 compared to DQN's 0.2480 and ϵ-greedy's 0.2523.
Key Findings:
- Response Time: DWN achieves the lowest average response time of the three approaches while retaining the flexibility to adjust the balance between objectives.
- Cost: DWN incurs a slightly higher configuration cost, but its ability to balance conflicting objectives simultaneously makes it a strong candidate for real-world AS applications.
Implications and Future Work
The paper's successful implementation of DWN in EWS is a significant contribution to the field of MORL in SAS. Optimizing multiple objectives without manually aggregating them into a single weighted reward enhances the adaptability of such systems in real-world scenarios.
Future Research Directions:
- Multi-metric Optimization: Extending the analysis to include additional metrics like resource consumption.
- Configuration Mutations: Implementing combinations and mutations of EWS configurations to better optimize the objectives.
- Framework Integration: Integrating more complex optimization frameworks such as ComInA, which has been applied to bus routing systems.
Conclusion
By demonstrating the practical applicability of DWN in a real-world autonomous system, this paper advances the MORL field. The methodology and results underline the feasibility and benefits of using DWN for handling multi-objective optimization within dynamic, self-adaptive environments. The insights provided in this paper lay a strong foundation for further research and development in multi-objective optimization techniques within autonomous systems.
Acknowledgment
The authors acknowledge support from Science Foundation Ireland under Grant No. 18/CRT/6223. For Open Access purposes, a CC BY public copyright license has been applied to the Author Accepted Manuscript version arising from this submission.