Adjustable Robust Reinforcement Learning for Online 3D Bin Packing (2310.04323v1)
Abstract: Designing effective policies for the online 3D bin packing problem (3D-BPP) has been a long-standing challenge, primarily due to the unpredictable nature of incoming box sequences and stringent physical constraints. While current deep reinforcement learning (DRL) methods for online 3D-BPP have shown promising results in optimizing average performance over an underlying box sequence distribution, they often fail in real-world settings where some worst-case scenarios can materialize. Standard robust DRL algorithms tend to overly prioritize optimizing the worst-case performance at the expense of performance under the nominal problem instance distribution. To address these issues, we first introduce a permutation-based attacker to investigate the practical robustness of both DRL-based and heuristic methods proposed for solving online 3D-BPP. Then, we propose an adjustable robust reinforcement learning (AR2L) framework that allows efficient adjustment of robustness weights to achieve the desired balance of the policy's performance in average and worst-case environments. Specifically, we formulate the objective function as a weighted sum of expected and worst-case returns, and derive a lower bound on performance by relating it to the return under a mixture dynamics. To realize this lower bound, we adopt an iterative procedure that searches for the associated mixture dynamics and improves the corresponding policy. We integrate this procedure into two popular robust adversarial algorithms to develop the exact and approximate AR2L algorithms. Experiments demonstrate that AR2L is versatile in the sense that it improves policy robustness while maintaining an acceptable level of performance for the nominal case.
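The weighted-sum objective and the permutation-based attacker described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the 1D `toy_policy_return` stands in for a real 3D packing policy, the attacker is a brute-force search over all reorderings rather than a learned one, and the convex-combination form `(1 - alpha) * nominal + alpha * worst` is just one way to write a weighted sum (the paper's exact weighting may differ).

```python
from itertools import permutations
from statistics import mean


def toy_policy_return(sizes, capacity=10):
    """Hypothetical 1D stand-in for a packing policy.

    Items are placed in arrival order; the episode ends at the first
    item that no longer fits. The return is the space utilization.
    """
    used = 0
    for size in sizes:
        if used + size > capacity:
            break
        used += size
    return used / capacity


def worst_case_return(sizes, capacity=10):
    """Brute-force permutation attacker: try every reordering of the
    sequence and keep the lowest return the policy can be forced into."""
    return min(toy_policy_return(p, capacity) for p in permutations(sizes))


def weighted_objective(nominal_returns, worst_return, alpha):
    """Weighted sum of the average nominal return and the worst-case
    return; alpha in [0, 1] is an assumed robustness weight."""
    return (1 - alpha) * mean(nominal_returns) + alpha * worst_return


sequence = [6, 3, 5, 1]
nominal = toy_policy_return(sequence)   # return on the sequence as given
worst = worst_case_return(sequence)     # return under the worst reordering
objective = weighted_objective([nominal], worst, alpha=0.5)
```

With `alpha = 0` this sketch reduces to the average-case objective, and with `alpha = 1` it becomes a purely worst-case objective; intermediate values trade the two off, which is the balance the AR2L framework is designed to adjust.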