A Deep Value-network Based Approach for Multi-Driver Order Dispatching (2106.04493v1)

Published 8 Jun 2021 in cs.LG and cs.AI

Abstract: Recent works on ride-sharing order dispatching have highlighted the importance of taking into account both the spatial and temporal dynamics in the dispatching process for improving the transportation system efficiency. At the same time, deep reinforcement learning has advanced to the point where it achieves superhuman performance in a number of fields. In this work, we propose a deep reinforcement learning based solution for order dispatching and we conduct large scale online A/B tests on DiDi's ride-dispatching platform to show that the proposed method achieves significant improvement on both total driver income and user experience related metrics. In particular, we model the ride dispatching problem as a Semi Markov Decision Process to account for the temporal aspect of the dispatching actions. To improve the stability of the value iteration with nonlinear function approximators like neural networks, we propose Cerebellar Value Networks (CVNet) with a novel distributed state representation layer. We further derive a regularized policy evaluation scheme for CVNet that penalizes large Lipschitz constant of the value network for additional robustness against adversarial perturbation and noises. Finally, we adapt various transfer learning methods to CVNet for increased learning adaptability and efficiency across multiple cities. We conduct extensive offline simulations based on real dispatching data as well as online AB tests through the DiDi's platform. Results show that CVNet consistently outperforms other recently proposed dispatching methods. We finally show that the performance can be further improved through the efficient use of transfer learning.

Summary

The paper introduces a deep reinforcement learning framework using a Semi Markov Decision Process (SMDP) and novel Cerebellar Value Networks (CVNet) to model and solve multi-driver order dispatching.
Online A/B testing demonstrated that the CVNet approach significantly outperformed recent dispatching methods, improving key operational metrics like driver income and user experience.
This framework offers a robust and scalable method for applying deep reinforcement learning to complex, high-dimensional decision-making in large-scale transportation systems.

A Deep Value-network Based Approach for Multi-Driver Order Dispatching

This paper presents a deep reinforcement learning framework for multi-driver order dispatching, a ride-sharing problem in which both spatial and temporal dynamics must be taken into account to improve system efficiency. The authors model dispatching as a Semi Markov Decision Process (SMDP) and introduce Cerebellar Value Networks (CVNet), a value-network architecture designed to make learning more stable, more robust, and easier to adapt across cities.

Methodology

A key modeling choice in this work is to formulate order dispatching as an SMDP, in which a dispatching action (for example, serving a trip) can last a variable number of time steps, unlike a standard Markov Decision Process, where every action takes exactly one step. Each transition therefore carries both a reward and a duration, so the value of an assignment reflects how much a trip earns as well as how long it occupies the driver.
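
To make the role of transition durations concrete, the following is a minimal sketch of a one-step SMDP value target. It assumes, purely for illustration, that a trip's total reward is spread evenly over its duration and discounted step by step; the function names, the discount factor, and the numbers are hypothetical and are not taken from the paper.

```python
def discounted_trip_reward(total_reward, duration, gamma):
    """Discounted value of a reward spread evenly over `duration` time steps.

    Assumption for illustration: the trip pays total_reward / duration at
    each of its time steps, and step t is discounted by gamma ** t.
    """
    if duration <= 0:
        raise ValueError("duration must be a positive number of time steps")
    per_step = total_reward / duration
    return sum(per_step * gamma ** t for t in range(duration))


def smdp_td_target(total_reward, duration, next_value, gamma):
    """One-step SMDP target: discounted trip reward plus gamma**duration * V(s')."""
    reward_part = discounted_trip_reward(total_reward, duration, gamma)
    return reward_part + gamma ** duration * next_value


if __name__ == "__main__":
    gamma = 0.9
    # Same fare and same destination value, but different trip lengths.
    for duration in (1, 2, 4):
        print(duration, smdp_td_target(10.0, duration, 5.0, gamma))
```

With a discount factor below one, a longer trip with the same fare yields a smaller target, because both the fare and the value of the destination state are discounted more heavily. A one-step MDP formulation would not capture this effect.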

CVNet, the proposed value-network architecture, uses a distributed state representation layer inspired by cerebellar models. Its purpose is to stabilize value iteration with nonlinear function approximators such as neural networks. The layer covers the state space with multiple overlapping tilings, which yields a sparse, coarse-coded representation: nearby states activate many of the same tiles, so what is learned for one state generalizes to its neighbors.
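
The idea of overlapping tilings can be sketched with classic tile coding. This is a simplified illustration and not the CVNet layer itself: it assumes square grid cells over a 2-D location, one scalar weight per tile instead of a learned embedding, and a plain tabular update. The class names `TileCoder` and `TiledValue` are hypothetical.

```python
import math


class TileCoder:
    """Maps a 2-D point to one active tile per tiling (square cells, shifted grids)."""

    def __init__(self, num_tilings, cell_size):
        self.num_tilings = num_tilings
        self.cell_size = cell_size

    def active_tiles(self, x, y):
        tiles = []
        for i in range(self.num_tilings):
            # Each tiling is the same grid shifted by a fraction of a cell.
            offset = i * self.cell_size / self.num_tilings
            col = math.floor((x + offset) / self.cell_size)
            row = math.floor((y + offset) / self.cell_size)
            tiles.append((i, col, row))
        return tiles


class TiledValue:
    """Value estimate as the sum of the weights of the active tiles."""

    def __init__(self, coder, step_size=0.1):
        self.coder = coder
        self.step_size = step_size
        self.weights = {}  # sparse storage: only visited tiles get an entry

    def value(self, x, y):
        return sum(self.weights.get(t, 0.0) for t in self.coder.active_tiles(x, y))

    def update(self, x, y, target):
        tiles = self.coder.active_tiles(x, y)
        error = target - self.value(x, y)
        for t in tiles:
            # Split the correction evenly across the active tiles.
            self.weights[t] = self.weights.get(t, 0.0) + self.step_size * error / len(tiles)


if __name__ == "__main__":
    coder = TileCoder(num_tilings=4, cell_size=1.0)
    v = TiledValue(coder)
    v.update(0.1, 0.1, target=1.0)
    print(v.value(0.1, 0.1), v.value(0.3, 0.3), v.value(5.5, 5.5))
```

After a single update at one location, a nearby point that shares most of its tiles picks up most of the change, while a distant point that shares none is unaffected. That is the locality and generalization behavior coarse coding is meant to provide.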

Another key feature of CVNet is a Lipschitz regularization scheme: the policy evaluation objective penalizes a large Lipschitz constant of the value network. A small Lipschitz constant limits how much the network's output can change for a given change in its input, which adds robustness against adversarial perturbations and noise and helps keep value estimates stable across diverse spatial domains.
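
One common way to penalize the Lipschitz constant of a feed-forward network is sketched below. It relies on a standard fact: for linear layers with 1-Lipschitz activations such as ReLU, the product of the layers' operator norms is an upper bound on the network's Lipschitz constant. The choice of the infinity norm, the bias-free layers, the function names, and the penalty weight are assumptions made for illustration; the paper's exact regularizer may differ.

```python
def inf_norm(matrix):
    """Operator infinity-norm of a matrix: the largest absolute row sum."""
    return max(sum(abs(w) for w in row) for row in matrix)


def lipschitz_upper_bound(layers):
    """Product of layer norms; bounds the Lipschitz constant of a ReLU network."""
    bound = 1.0
    for weights in layers:
        bound *= inf_norm(weights)
    return bound


def regularized_loss(td_loss, layers, penalty):
    """Policy evaluation loss plus a penalty on the Lipschitz upper bound."""
    return td_loss + penalty * lipschitz_upper_bound(layers)


def forward(layers, x):
    """Bias-free ReLU network; the last layer is linear."""
    h = list(x)
    for i, weights in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) for row in weights]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


if __name__ == "__main__":
    layers = [[[1.0, -2.0], [0.5, 0.5]], [[1.0, 1.0]]]
    x, y = [1.0, 0.0], [0.0, 1.0]
    output_gap = abs(forward(layers, x)[0] - forward(layers, y)[0])
    input_gap = max(abs(a - b) for a, b in zip(x, y))
    # The output gap never exceeds the bound times the input gap.
    print(output_gap, lipschitz_upper_bound(layers) * input_gap)
    print(regularized_loss(2.0, layers, penalty=0.1))
```

Because the bound is a product over layers, shrinking the norm of any single layer tightens it. The penalty weight trades off fitting the value targets against smoothness of the learned value function.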

Results

The authors ran extensive offline simulations based on real dispatching data as well as large-scale online A/B tests on DiDi's ride-dispatching platform. They report significant improvements in total driver income and in user-experience metrics, with CVNet consistently outperforming other recently proposed dispatching methods.

Moreover, the authors adapt several transfer learning methods to CVNet and show that they further improve performance. With its hierarchical representation, CVNet supports scalable learning across multiple cities, a significant practical advantage given the geographical diversity of ride-sharing operations.

Implications and Future Directions

This paper's approach has meaningful implications for the application of deep reinforcement learning in large-scale transportation systems. By combining an SMDP formulation with a value-network architecture designed for stability, the method provides an operational framework that responds to real-time variations in supply and demand. The cerebellar embedding used in CVNet is a promising way to handle the high-dimensional state spaces that are common in order dispatching.

Future research may explore extending these methodologies to fleet management and other logistical decision-making arenas, where temporal dynamics play a critical role. Integrating end-to-end learning processes, where planning and evaluation are seamlessly interconnected, could further refine operational efficiencies.

In conclusion, this paper marks a promising trajectory for integrating advanced deep reinforcement learning frameworks into real-world transportation systems, paving the way for intelligent, adaptable, and efficient ride-sharing solutions.