Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning (2403.00282v2)
Abstract: In many real-world applications, a reinforcement learning (RL) agent should consider multiple objectives and adhere to safety guidelines. To address these considerations, we propose a constrained multi-objective RL algorithm named Constrained Multi-Objective Gradient Aggregator (CoMOGA). In multi-objective optimization, managing conflicts between the gradients of the objectives is crucial to prevent policies from converging to local optima, and safety constraints must be handled efficiently for stable training and constraint satisfaction. We address both challenges directly by treating the maximization of the multiple objectives as a constrained optimization problem (COP), where the constraints are defined so that each original objective improves. Existing safety constraints are then integrated into the COP, and the policy is updated using a linear approximation, which avoids gradient conflicts. Despite its simplicity, CoMOGA guarantees convergence to an optimal policy in tabular settings. Through various experiments, we confirm that preventing gradient conflicts is critical and that the proposed method satisfies the safety constraints in all tasks.
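To make the described update concrete, below is a minimal sketch, not the paper's exact formulation, of a conflict-averse aggregation step in this spirit: the linearized COP seeks an update direction that improves every objective while keeping the linearized safety costs within their margins. The function name `comoga_direction`, the improvement threshold `eps`, the cost margins, and the trust-region radius are illustrative assumptions introduced here, not quantities defined in the paper.

```python
# Sketch of a conflict-averse aggregation step: find a minimum-norm update
# direction d such that every objective improves under a linear approximation
# (g_k^T d >= eps) and linearized safety costs stay within margins (c_j^T d <= b_j).
# All names and hyperparameters below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize


def comoga_direction(obj_grads, cost_grads, cost_margins, eps=1e-3, radius=1.0):
    """Solve: min 0.5*||d||^2  s.t.  G d >= eps,  C d <= margins,  ||d|| <= radius."""
    G = np.asarray(obj_grads)      # (K, dim) objective gradients
    C = np.asarray(cost_grads)     # (M, dim) safety-cost gradients
    b = np.asarray(cost_margins)   # (M,) allowed cost increase (<= 0 forces a decrease)

    constraints = [
        {"type": "ineq", "fun": lambda d: G @ d - eps},        # every objective improves
        {"type": "ineq", "fun": lambda d: b - C @ d},          # safety constraints hold
        {"type": "ineq", "fun": lambda d: radius**2 - d @ d},  # bounded (trust-region) step
    ]
    res = minimize(lambda d: 0.5 * (d @ d), x0=G.mean(axis=0),
                   constraints=constraints, method="SLSQP")
    return res.x


# Toy usage: two conflicting objective gradients and one safety-cost gradient.
g = np.array([[1.0, 0.0], [-0.5, 1.0]])
c = np.array([[0.0, 1.0]])
d = comoga_direction(g, c, cost_margins=np.array([0.5]))
print(d, g @ d, c @ d)  # d improves both objectives while limiting cost growth
```

Because the improvement requirements enter as constraints rather than being summed into a single scalarized objective, the resulting direction cannot decrease any objective under the linear model, which is the sense in which gradient conflicts are avoided.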