Not All Tasks Are Equally Difficult: Multi-Task Deep Reinforcement Learning with Dynamic Depth Routing (2312.14472v2)
Abstract: Multi-task reinforcement learning endeavors to accomplish a set of different tasks with a single policy. To enhance data efficiency by sharing parameters across multiple tasks, a common practice segments the network into distinct modules and trains a routing network to recombine these modules into task-specific policies. However, existing routing approaches employ a fixed number of modules for all tasks, neglecting that tasks with varying difficulties commonly require varying amounts of knowledge. This work presents a Dynamic Depth Routing (D2R) framework, which learns strategic skipping of certain intermediate modules, thereby flexibly choosing different numbers of modules for each task. Under this framework, we further introduce a ResRouting method to address the issue of disparate routing paths between behavior and target policies during off-policy training. In addition, we design an automatic route-balancing mechanism to encourage continued routing exploration for unmastered tasks without disturbing the routing of mastered ones. We conduct extensive experiments on various robotics manipulation tasks in the Meta-World benchmark, where D2R achieves state-of-the-art performance with significantly improved learning efficiency.
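The core idea of the abstract can be illustrated with a minimal sketch of depth routing: a per-task binary mask selects which intermediate modules execute, while skipped modules act as identity pass-throughs, so easier tasks can take shallower paths. This is an illustrative sketch only; the module and mask names are assumptions, not the paper's implementation (which learns the routing and handles off-policy training via ResRouting).

```python
# Minimal sketch of dynamic depth routing (illustrative; names are assumed,
# not taken from the paper). Each "module" stands in for a learned network
# block; a per-task routing mask decides which modules run. A skipped module
# is an identity pass-through, so easy tasks can use shallower paths.

def make_module(scale):
    """A stand-in for a learned network module."""
    return lambda xs: [scale * v + 1.0 for v in xs]

def route(xs, modules, mask):
    """Apply only the modules whose mask entry is True; skip the rest."""
    for module, keep in zip(modules, mask):
        if keep:
            xs = module(xs)  # module executes on the current features
        # else: identity skip -- the input flows through unchanged
    return xs

modules = [make_module(s) for s in (2.0, 0.5, 3.0)]

easy_mask = [True, False, False]  # an easy task routes through one module
hard_mask = [True, True, True]    # a hard task uses the full depth

shallow = route([1.0], modules, easy_mask)
deep = route([1.0], modules, hard_mask)
```

In the actual D2R framework the mask is not hand-set but produced by a learned routing network conditioned on the task, with a route-balancing mechanism that keeps exploring routes for unmastered tasks.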