Deep Reinforcement Learning for Multi-objective Optimization
The paper "Deep Reinforcement Learning for Multi-objective Optimization" by Kaiwen Li, Tao Zhang, and Rui Wang introduces a novel application of Deep Reinforcement Learning (DRL) to tackle Multi-objective Optimization Problems (MOPs). The authors develop an end-to-end framework, termed DRL-MOA, which leverages DRL techniques to solve MOPs efficiently and effectively.
Summary and Analysis
The authors start by addressing the challenges inherent in MOPs, where multiple competing objectives need to be simultaneously optimized. Traditionally, approaches like Multi-objective Evolutionary Algorithms (MOEAs) have been employed, providing approximate solutions through iterative population-based methods. While effective, these methods often require extensive computational resources and exhibit scalability issues, particularly as the dimensionality of the problem increases.
The DRL-MOA framework draws inspiration from recent advances in applying DRL to combinatorial optimization, notably the traveling salesman problem (TSP). The core idea is to decompose the MOP, via the weighted-sum method, into a collection of scalar optimization subproblems, each solved by its own neural network. Rather than training every subproblem from scratch, a neighborhood-based parameter-transfer strategy initializes each subproblem's network with the parameters of its already-trained neighbor, so that learned behavior carries over between similar subproblems, thereby enhancing training efficiency and adaptability.
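The decomposition itself is simple to sketch. Below is a minimal Python illustration of weighted-sum scalarization and neighborhood-based parameter transfer for a bi-objective problem; `model_init` and `train_subproblem` are hypothetical placeholders for building and fine-tuning a subproblem's network, so this is a sketch of the idea rather than the authors' implementation.

```python
# Sketch: weighted-sum decomposition with neighborhood-based parameter transfer.
# `model_init` and `train_subproblem` are illustrative placeholders.
import copy
import numpy as np

def make_weight_vectors(n_subproblems: int) -> np.ndarray:
    """Evenly spaced weight vectors (w, 1 - w) for a bi-objective problem."""
    w = np.linspace(0.0, 1.0, n_subproblems)
    return np.stack([w, 1.0 - w], axis=1)

def scalarize(objectives: np.ndarray, weights: np.ndarray) -> float:
    """Weighted-sum scalarization: a vector of objective values becomes one cost."""
    return float(np.dot(weights, objectives))

def solve_by_decomposition(model_init, train_subproblem, n_subproblems=100):
    """Train one model per subproblem, warm-starting each from its neighbor."""
    weights = make_weight_vectors(n_subproblems)
    models = []
    model = model_init()                       # only the first subproblem starts from scratch
    for w in weights:
        model = train_subproblem(model, w)     # fine-tune on the scalarized objective
        models.append(copy.deepcopy(model))    # keep this subproblem's trained solver
        # the next iteration reuses `model`, i.e. the neighbor's parameters
    return weights, models
```

The neighborhood transfer is what keeps the total training cost manageable: each network only needs to adapt its neighbor's policy to a slightly shifted weight vector instead of learning the task anew.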
The authors demonstrate their methodology on the multi-objective traveling salesman problem (MOTSP). The neural network model is a modified Pointer Network, which is well suited to routing and ordering problems thanks to its sequence-to-sequence prediction capability. The networks are trained with an Actor-Critic DRL algorithm, so the model learns to construct good city tours from the reward signal alone, without supervision from precomputed optimal tours.
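To make the training signal concrete, the sketch below shows one Actor-Critic update on a batch of bi-objective Euclidean MOTSP instances. The `actor` and `critic` modules are hypothetical stand-ins for the paper's modified Pointer Network and its baseline network, and the cost function assumes each city carries two 2-D coordinate sets, one per objective; it is a simplified illustration rather than the authors' exact training loop.

```python
# Sketch: one Actor-Critic update for a scalarized bi-objective MOTSP subproblem.
import torch

def scalarized_tour_cost(coords, tours, w):
    """Weighted sum of the two Euclidean tour lengths of a bi-objective instance.

    coords: (batch, n_cities, 4) tensor, two 2-D coordinate sets per city.
    tours:  (batch, n_cities) long tensor of city permutations.
    w:      length-2 weight vector for the current subproblem.
    """
    idx = tours.unsqueeze(-1).expand(-1, -1, coords.size(-1))
    ordered = coords.gather(1, idx)                 # cities in visiting order
    edges = ordered - ordered.roll(-1, dims=1)      # edge vectors of the closed tour
    len1 = edges[..., :2].norm(dim=-1).sum(dim=1)   # tour length under objective 1
    len2 = edges[..., 2:].norm(dim=-1).sum(dim=1)   # tour length under objective 2
    return w[0] * len1 + w[1] * len2

def actor_critic_step(actor, critic, actor_opt, critic_opt, coords, w):
    """One policy-gradient update with a learned baseline (critic)."""
    tours, log_probs = actor(coords)                # sampled tours and their total log-probability
    costs = scalarized_tour_cost(coords, tours, w)  # scalarized cost to be minimized
    baseline = critic(coords).squeeze(-1)           # critic's estimate of the expected cost

    advantage = (costs - baseline).detach()         # how much worse than expected the tour is
    actor_loss = (advantage * log_probs).mean()     # REINFORCE-style loss with baseline
    critic_loss = torch.nn.functional.mse_loss(baseline, costs)

    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    return costs.mean().item()
```

The critic's baseline reduces the variance of the policy gradient, which is what makes it practical to learn tour construction purely from tour-cost feedback.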
Experimental Results
The efficacy of DRL-MOA is demonstrated through comprehensive experiments on both Euclidean and Mixed-type MOTSP instances ranging from 40 to 200 cities. The results indicate that DRL-MOA achieves higher solution quality and lower computation times than classical MOEAs such as NSGA-II and MOEA/D. The framework's performance remains robust on larger problem instances, showcasing its generalization ability, a key advantage over traditional evolutionary methods, which typically must be re-run from scratch for every new problem instance.
Moreover, the authors explore the impact of applying basic local search techniques, revealing that even simple augmentations can further improve the quality of the solutions. This insight opens avenues for integrating DRL-MOA with more sophisticated heuristic methods to enhance solution refinement.
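As an illustration of such an augmentation, the sketch below applies a plain 2-opt local search to a network-generated tour. The single-objective Euclidean cost here is an assumption made for brevity; in the DRL-MOA setting it would be replaced by the scalarized (weighted-sum) cost of the corresponding subproblem.

```python
# Sketch: naive 2-opt local search used to polish a tour produced by the network.
import numpy as np

def tour_length(coords: np.ndarray, tour: list) -> float:
    """Total length of a closed tour over 2-D city coordinates."""
    ordered = coords[tour]
    return float(np.linalg.norm(ordered - np.roll(ordered, -1, axis=0), axis=1).sum())

def two_opt(coords: np.ndarray, tour: list) -> list:
    """Improve a tour by reversing segments until no improving move remains."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j][::-1] + best[j:]
                if tour_length(coords, candidate) < tour_length(coords, best):
                    best, improved = candidate, True
    return best
```

Even this naive first-improvement variant illustrates the point made above: a cheap post-processing pass on the learned tours can recover additional solution quality without retraining the networks.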
Implications and Future Work
The DRL-MOA framework represents a substantial step forward in leveraging DRL for multi-objective optimization. Its speed and adaptability address longstanding challenges in the field, particularly the efficient scaling of algorithms to complex, large-scale problems. The framework's strong generalization ability indicates significant potential for application to MOP types beyond the MOTSP.
Future research could explore the integration of advanced network architectures, such as Transformers, to improve the expressiveness and learning capacity of the models. Further investigation of finer-grained decomposition methods and emerging learning paradigms could also refine solution quality and speed. Adapting DRL-MOA to continuous and dynamic optimization problems likewise represents a fertile area for investigation, expanding its utility in real-time decision-making scenarios.
In conclusion, DRL-MOA harnesses deep reinforcement learning to offer a compelling alternative for solving multi-objective optimization problems, setting the stage for future research to build on this versatile and efficient approach.