Cooperative Multi-Vehicle DPDP with Stochastic Requests

Updated 28 November 2025

The paper presents a comprehensive MDP formulation that jointly optimizes vehicle routing, scheduling, and assignment in environments with stochastic requests.
It introduces the DRACE+CFA heuristic, achieving up to 78% cost reduction and 84% lateness reduction over myopic baselines.
The approach incorporates heterogeneous fleet dynamics and time-dependent travel, making it applicable to urban parcel, shared-mobility, and on-demand delivery scenarios.

The Cooperative Multi-Vehicle Dynamic Pickup and Delivery Problem with Stochastic Requests (MVDPDPSR) extends classical vehicle routing models to settings where a heterogeneous fleet must serve dynamically arriving pickup and delivery requests, under uncertainty in both demand and system resources (e.g., vehicle availability, travel times). The core objective in MVDPDPSR is to assign, schedule, and route vehicles—potentially of diverse types and with time-varying availability—in order to optimize a cost or service-based objective, while adapting to complex, uncertain spatiotemporal logistics environments. MVDPDPSR arises in a broad array of operational contexts, including urban parcel logistics, shared-mobility, meal delivery, paratransit, and crowdshipping platforms. Recent research elaborates rigorous Markov Decision Process (MDP) models, scalable online and approximation algorithms, and comprehensive computational studies to address the system-level and algorithmic challenges posed by MVDPDPSR (Stoia et al., 14 Aug 2024).

1. Formal Problem Definition and MDP Modeling

MVDPDPSR is formulated as a discrete-time MDP over a time horizon $T$. At each minute $k$, the system observes the state

$s_k = (t_k, R_k^p, R_k^o, U_k, M_k)$

where $R_k^p$ and $R_k^o$ are the sets of in-process and outstanding requests, $U_k$ captures newly arrived requests, and $M_k$ parameterizes the active vehicles (including known dedicated vehicles and stochastically appearing crowdshippers). Each request $r$ is defined by a pickup location $o_r$, a delivery location $d_r$, a ready time $e_r$, and a soft deadline $l_r$. For every available vehicle $m$, the state tracks its route $\theta_m$, end-of-shift time $b_m$, current forward time $f_m$, and planned next departure $w_m$.
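The state tuple can be made concrete with a few plain containers. The following is a minimal Python sketch; the class and field names (`Request`, `Vehicle`, `State`, `is_crowd`, `available_vehicles`) are hypothetical illustrations chosen for this example, not the data model of Stoia et al., and times are assumed to be in minutes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    """Request r with pickup o_r, delivery d_r, ready time e_r, soft deadline l_r."""
    rid: int
    pickup: int       # o_r (node id)
    delivery: int     # d_r (node id)
    ready: float      # e_r, minutes
    deadline: float   # l_r, minutes (soft: may be exceeded at a penalty)


@dataclass
class Vehicle:
    """Vehicle m with route theta_m, shift end b_m, forward time f_m, next departure w_m."""
    vid: int
    route: list[int]        # theta_m: planned node sequence
    shift_end: float        # b_m
    forward_time: float     # f_m
    next_departure: float   # w_m
    is_crowd: bool = False  # hypothetical flag: True for crowdshippers


@dataclass
class State:
    """s_k = (t_k, R_k^p, R_k^o, U_k, M_k)."""
    t: float
    in_process: dict[int, Request] = field(default_factory=dict)   # R_k^p
    outstanding: dict[int, Request] = field(default_factory=dict)  # R_k^o
    new_arrivals: list[Request] = field(default_factory=list)      # U_k
    vehicles: dict[int, Vehicle] = field(default_factory=dict)     # M_k

    def available_vehicles(self) -> list[int]:
        """Ids of vehicles whose shift has not yet ended at time t_k."""
        return [v.vid for v in self.vehicles.values() if self.t < v.shift_end]
```

For example, a state at $t_k = 10$ with a dedicated vehicle on a 480-minute shift and a crowdshipper whose presence ended at minute 5 reports only the dedicated vehicle as available.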

The action space $X(s_k)$ at epoch $k$ comprises all feasible nonanticipative updates to route plans and vehicle departure times, subject to each vehicle $m$ completing its route by $b_m$. Each action incurs a composite cost $c(s_k, x) = \text{travel cost} + \text{crowdshipper fees} + \text{lateness penalties}$, where travel cost and lateness are evaluated with time-dependent travel times $\tau(i,j,t)$.
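The composite cost can be illustrated with a small function. The per-minute travel cost, the fee $\rho$, and the lateness penalty weight below are hypothetical defaults, not values from the paper; travel minutes are assumed to have already been computed with $\tau(i,j,t)$, and lateness is the sum of excess lateness $\max\{0, \text{arrival}_r - l_r\}$ over the affected requests.

```python
def composite_cost(travel_minutes, deliveries_by_crowd, arrivals, deadlines,
                   cost_per_minute=1.0, rho=5.0, late_penalty_per_minute=2.0):
    """c(s_k, x) = travel cost + crowdshipper fees + lateness penalties.

    travel_minutes: total driving time implied by action x (assumed precomputed via tau(i,j,t))
    deliveries_by_crowd: number of requests assigned to crowdshippers (fee rho each)
    arrivals / deadlines: dicts rid -> planned delivery time / soft deadline l_r
    All weights are illustrative defaults.
    """
    travel = cost_per_minute * travel_minutes
    fees = rho * deliveries_by_crowd
    # Sum of excess lateness L_r^+ = max{0, arrival_r - l_r}
    lateness = sum(max(0.0, arrivals[r] - deadlines[r]) for r in arrivals)
    return travel + fees + late_penalty_per_minute * lateness
```

With 30 minutes of driving, two crowd deliveries, and one request delivered 10 minutes past its deadline, the default weights give $30 + 2 \cdot 5 + 2 \cdot 10 = 60$.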

State transitions have two components: deterministic updates post-action $(s_k \to s_k^x)$ and stochastic transitions $(s_k^x \to s_{k+1})$ induced by random new requests and crowdshipper appearances.
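The two-stage transition can be sketched as follows: the action produces a post-decision state deterministically, and the step to $s_{k+1}$ then samples exogenous information. In this hypothetical example the only randomness is a Poisson number of new requests per minute (drawn with Knuth's multiplication method), with uniformly random distinct pickup and delivery nodes and a fixed 60-minute deadline offset; all of these are simplifying assumptions, and crowdshipper appearances are omitted.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson(rate) count with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def transition(post_state, rate, n_nodes, rng, deadline_offset=60.0):
    """Stochastic step s_k^x -> s_{k+1}: advance one minute and sample new requests.

    post_state is the post-decision state (a dict with keys 't' and 'outstanding')
    produced deterministically by the action.
    """
    t_next = post_state["t"] + 1
    arrivals = []
    for _ in range(sample_poisson(rate, rng)):
        o, d = rng.sample(range(n_nodes), 2)  # distinct pickup and delivery nodes
        arrivals.append({"pickup": o, "delivery": d, "ready": t_next,
                         "deadline": t_next + deadline_offset})
    return {"t": t_next,
            "outstanding": post_state["outstanding"] + arrivals,
            "new": arrivals}
```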

The objective is to solve the Bellman recursion $V(s_k) = \min_{x\in X(s_k)} \left\{ c(s_k,x) + \mathbb{E}[V(s_{k+1}) \mid s_k^x] \right\}$. This MDP captures, both algebraically and algorithmically, the joint optimization of assignment, routing, and waiting under dynamic and uncertain inputs (Stoia et al., 14 Aug 2024).
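For intuition, the Bellman recursion can be solved exactly by backward induction when the state space is tiny. The toy instance below (state = number of waiting requests, capped at 2; a dispatch fee of 5; a lateness cost of 3 per waiting request; one arrival with probability 0.5; a terminal penalty of 4 per unserved request) is entirely hypothetical and only shows the structure of the recursion: a deterministic post-decision step followed by an expectation over the stochastic arrival. Realistic MVDPDPSR states are far too large for exact recursion, which is why approximations such as a CFA are used.

```python
def backward_induction(states, actions, cost, trans, horizon, terminal):
    """Finite-horizon recursion V_k(s) = min_x {c(s,x) + E[V_{k+1}(s') | s, x]}."""
    V = {s: terminal(s) for s in states}
    policy = []
    for _ in range(horizon):
        V_new, pi = {}, {}
        for s in states:
            q_best, x_best = min(
                ((cost(s, x) + sum(p * V[s2] for s2, p in trans(s, x).items()), x)
                 for x in actions(s)),
                key=lambda pair: pair[0],
            )
            V_new[s], pi[s] = q_best, x_best
        V = V_new
        policy.insert(0, pi)  # policy[k] is the decision rule at epoch k
    return V, policy


# Hypothetical toy instance: state = number of waiting requests, capped at 2.
STATES = [0, 1, 2]


def actions(s):
    return ["wait", "dispatch"] if s > 0 else ["wait"]


def cost(s, x):
    return 5.0 if x == "dispatch" else 3.0 * s  # dispatch fee vs. per-request lateness


def trans(s, x):
    post = 0 if x == "dispatch" else s  # deterministic post-decision state s^x
    out = {}
    for nxt, p in ((post, 0.5), (min(post + 1, 2), 0.5)):  # one arrival w.p. 0.5
        out[nxt] = out.get(nxt, 0.0) + p
    return out


V, policy = backward_induction(STATES, actions, cost, trans, horizon=1,
                               terminal=lambda s: 4.0 * s)
```

With a one-epoch horizon, dispatching is optimal as soon as any request is waiting, while an empty system simply waits.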

2. System Features: Vehicle Heterogeneity and Time-Dependent Travel

MVDPDPSR models vehicle heterogeneity across two main classes:

Dedicated vehicles: Time-scheduled, full-shift availability, depot-origin/destination, fixed capacity, and complete horizon commitment.
Crowdshippers ("gig" vehicles): Random appearance times (often modeled as Poisson processes), stochastic origin-destination and deadline, time-limited system presence, and per-delivery compensation ($\rho$).
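Crowdshipper arrivals modeled as a Poisson process can be simulated by summing exponential interarrival times. The sketch below assumes a constant arrival rate and a uniformly distributed stay length between hypothetical bounds `min_stay` and `max_stay`; it does not reproduce the paper's actual presence-time or origin-destination distributions.

```python
import random


def sample_crowdshippers(rate_per_min, horizon, rng, min_stay=30.0, max_stay=90.0):
    """Simulate crowdshipper appearances on [0, horizon) as a homogeneous Poisson process.

    Each crowdshipper stays for a Uniform(min_stay, max_stay) time (an assumption),
    truncated at the end of the horizon.
    """
    shippers = []
    t = rng.expovariate(rate_per_min)  # first interarrival time
    while t < horizon:
        stay = rng.uniform(min_stay, max_stay)
        shippers.append({"appear": t, "leave": min(t + stay, horizon)})
        t += rng.expovariate(rate_per_min)  # next interarrival time
    return shippers
```

At a rate of 0.1 per minute over a 600-minute horizon, the expected number of crowdshippers is 60.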

Time-dependent travel times are represented as $\tau(i,j,t)$, a function of origin, destination, and departure time, parameterized by an empirical or scenario-specific region-by-timeband speed tensor $v_{r,w}$. Trips that span several regions or time periods are resolved with weighted or piecewise integration schemes over the varying speed regimes.
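One common way to realize $\tau(i,j,t)$ within a single region (one row of $v_{r,w}$) is to integrate distance over piecewise-constant speeds: the vehicle covers as much distance as it can at the current band's speed, then continues at the next band's speed. The sketch below assumes equal-length time bands and that the last band's speed persists beyond the profile; it illustrates piecewise integration in general, not the paper's exact weighting scheme. A useful property of this construction is FIFO: departing later never means arriving earlier.

```python
def travel_time(distance_km, depart_min, band_len_min, speeds_kmh):
    """Travel time (minutes) over piecewise-constant speeds per time band.

    speeds_kmh[w] is the speed in band w = [w*band_len, (w+1)*band_len);
    the last band's speed is assumed to persist indefinitely.
    """
    if any(v <= 0 for v in speeds_kmh):
        raise ValueError("speeds must be positive")
    t, remaining = depart_min, distance_km
    while remaining > 1e-12:
        band = min(int(t // band_len_min), len(speeds_kmh) - 1)
        km_per_min = speeds_kmh[band] / 60.0
        last = band == len(speeds_kmh) - 1
        band_end = float("inf") if last else (band + 1) * band_len_min
        reachable = km_per_min * (band_end - t)  # distance coverable before the band ends
        if reachable >= remaining:
            t += remaining / km_per_min
            remaining = 0.0
        else:
            remaining -= reachable
            t = band_end
    return t - depart_min
```

For example, with 60-minute bands at 30 km/h and then 60 km/h, a 20 km trip departing at minute 30 covers 15 km in the first band and the remaining 5 km in 5 minutes, for a travel time of 35 minutes.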

Crucially, modeling both vehicle heterogeneity and realistic spatiotemporal travel is essential for operational validity and policy performance: using the full time-dependent $\tau(i,j,t)$ rather than an averaged travel speed reduces cost by ≈21% and lateness by ≈63% (Stoia et al., 14 Aug 2024).

3. Solution Algorithms: Model-Based Heuristics and Online Approximations

The DRACE (Destroy-and-Repair Accounting for Capacity Expiration) framework provides a practical solution strategy, coupled with a cost function approximation (CFA). At each epoch, DRACE revises partial routes by removing soon-to-expire opportunities, then greedily reinserts requests into vehicle routes using the score $c_{rm} = \mu_1 \Delta_{rm}^{travel} + \mu_2 \Delta_{rm}^{late} + \rho \cdot I_{m \in G} + \lambda(b_m - t_k)$, where the $\Delta$ terms quantify the incremental travel and lateness of inserting request $r$ into the route of vehicle $m$, $\mu_1$ and $\mu_2$ weight them, $G$ is the set of crowdshippers (so the fee $\rho$ applies only to crowd assignments), and $\lambda$ (tuned offline) penalizes route-end slack $b_m - t_k$.
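The scoring rule can be sketched directly. In this hypothetical Python example the $\Delta$ terms are supplied precomputed for each request-vehicle pair (a missing pair means the insertion is infeasible), and the default weights other than $\lambda \approx 0.05$ are illustrative assumptions. Unlike a full repair step, the sketch does not recompute $\Delta$ after each insertion.

```python
def insertion_score(d_travel, d_late, is_crowd, shift_end, t_k, mu1, mu2, rho, lam):
    """c_rm = mu1*dTravel + mu2*dLate + rho*1[m in G] + lam*(b_m - t_k)."""
    crowd_fee = rho if is_crowd else 0.0
    return mu1 * d_travel + mu2 * d_late + crowd_fee + lam * (shift_end - t_k)


def greedy_reinsert(requests, vehicles, deltas, t_k, mu1=1.0, mu2=1.0, rho=5.0, lam=0.05):
    """Assign each request to its lowest-score feasible vehicle (None if none is feasible).

    vehicles: vid -> {"crowd": bool, "shift_end": b_m}
    deltas:   (rid, vid) -> (dTravel, dLate); absent pairs are infeasible.
    """
    assignment = {}
    for r in requests:
        best = None
        for m, info in vehicles.items():
            if (r, m) not in deltas:
                continue
            d_travel, d_late = deltas[(r, m)]
            score = insertion_score(d_travel, d_late, info["crowd"], info["shift_end"],
                                    t_k, mu1, mu2, rho, lam)
            if best is None or score < best[0]:
                best = (score, m)
        assignment[r] = best[1] if best is not None else None
    return assignment
```

At $t_k = 0$, a crowdshipper with 60 minutes left scores $12 + 5 + 0.05 \cdot 60 = 20$ for a request that costs a dedicated vehicle with 480 minutes left $10 + 0.05 \cdot 480 = 34$, so the slack term can tip the choice toward the crowdshipper despite its fee.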

DRACE dynamically reoptimizes insertions over a rolling time window $\Gamma$, applies strategic waiting with a parameter $\eta$ that controls how long vehicles idle before departing, and supports both myopic and lookahead policies; in the lookahead case, the CFA serves as an implicit approximation of the Bellman value function.
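The exact waiting rule is not spelled out here, so the following is only one plausible, hypothetical reading of $\eta$: compute the latest departure time that still meets every deadline on the planned route, then idle for a fraction $\eta$ of the remaining slack. The function names are illustrative, and leg times are assumed static (ignoring $\tau(i,j,t)$) to keep the sketch short. Setting $\eta = 0$ recovers immediate (myopic) departure.

```python
def latest_departure(leg_times, deadlines):
    """Latest start time that still reaches every stop by its deadline (no waiting en route)."""
    elapsed, latest = 0.0, float("inf")
    for leg, deadline in zip(leg_times, deadlines):
        elapsed += leg
        latest = min(latest, deadline - elapsed)
    return latest


def planned_departure(t_k, leg_times, deadlines, eta=0.20):
    """Hypothetical waiting rule: set w_m = t_k + eta * slack, with slack >= 0."""
    slack = max(0.0, latest_departure(leg_times, deadlines) - t_k)
    return t_k + eta * slack
```

For a route with legs of 10 and 20 minutes and deadlines at minutes 150 and 160, the latest feasible departure is minute 130; at $t_k = 100$ with $\eta = 0.20$, the vehicle would wait 6 minutes.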

Compared to myopic ALNS baselines, DRACE achieves up to 78% cost reduction and 84% lateness reduction in high demand settings. Strategic waiting with $\eta=0.20$ yields up to 20% additional cost savings (Stoia et al., 14 Aug 2024).

The DRACE+CFA paradigm is extensible to richer stochastic models, crowdshipper types, and alternative lookahead surrogate approximations.

4. Key Computational Findings and System Behavior

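Empirical evaluation shows that DRACE exhibits the following properties (reductions are measured relative to the myopic baseline):

| Metric | DRACE | Myopic baseline |
|---|---|---|
| Median cost reduction (high demand) | 78% | reference |
| Median lateness reduction (high demand) | 84% | reference |
| Crowd share (low demand) | 94% | 40% |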

Bundling requests yields additional cost savings (up to 5.6% in low-demand settings), though the effect attenuates as vehicles approach capacity limits. Modeling full time-dependent travel reduces lateness sharply without a substantial increase in routing costs.

Strategic waiting improves performance in medium- and high-demand settings. Where bundling is feasible (many-to-one or one-to-many), integrating it systematically into dispatching can slightly reduce aggregate costs, though the improvement depends on demand level and constraints.

5. Related Models, Stability, and Analytical Results

MVDPDPSR generalizes and subsumes several established dynamic pickup and delivery models:

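Stacker Crane/Gated Policies: Embeds the dynamic stacker-crane model, in which request arrivals are Poisson, vehicles have unit capacity, and stability is characterized by the load factor

$\rho = \lambda\,[\,\mathbb{E}\|Y-X\| + W(f_P,f_D)\,]/m < 1$

where, in this model only, $\lambda$ is the request arrival rate and $\rho$ the load factor (not the slack penalty and crowdshipper fee used above), $m$ is the number of vehicles (traveling at unit speed), $\mathbb{E}\|Y-X\|$ is the expected pickup-to-delivery distance, and $W$ is the Wasserstein distance between the pickup and delivery distributions $f_P$ and $f_D$. Stability thresholds precisely quantify required fleet sizing and the effect of spatial imbalance (Treleaven et al., 2012).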

Dynamic Vehicle Routing with Stochastic Requests: MVDPDPSR models can also capture real-time acceptance/routing decisions for spatial-temporal, time-windowed requests, and applicability extends to microtransit, paratransit, and delivery-on-demand systems (Wilbur et al., 2022, Zhang et al., 2022, Sawadsitang et al., 2019).

6. Algorithmic Extensions and Adaptations

The MVDPDPSR formulation supports algorithmic augmentation for operational realism and scalability:

Reinforcement Learning and Deep Graph Methods: Multiagent Deep RL (e.g., ST-DDGN) and Transformer-based architectures (e.g., MAPT) can encode value functions or policies for large-scale, real-time decision support. Graph relational representations and spatial-temporal predictors enable scalable, adaptive, and cooperative fleet behavior, with empirical reductions in total cost and vehicle usage relative to production heuristics (Li et al., 2021, Zou et al., 21 Nov 2025).
Recomputation and Reoptimization: Periodic re-routing (Re-route PDPSD) further improves cost and service rates under real-time arrivals, with each re-optimization formulated as a mixed-integer linear program conditioned on current vehicle locations and served customer sets (Sawadsitang et al., 2019).

These algorithmic frameworks are extensible to a range of modifications including general crowdshipper models, stochastic ready times, time-dependent or stochastic traffic, and richer cost/pricing formulations.

7. Ancillary Formulas and Practical Recommendations

Key operational formulas derived and applied in recent research include:

Insertion cost $\Delta(r, \theta)$: Estimated via cheapest insertion or full route recomputation.
Excess lateness: $L_r^+ = \max\{0, \text{arrival}_r - l_r\}$.
Bellman surrogate for lookahead:

$\bar c_k(s_k,x|\lambda) = \sum_{m \in M_k} \left[ \mu_1 \Delta_m^{travel,x} + \mu_2 \Delta_m^{late,x} + \rho q_m^x I_{m\in G} + \lambda(b_m - t_k) q_m^x \right]$

Parameter selection: Empirically robust choices are $\lambda \approx 0.05$ and $\eta \approx 0.20$, with $\Gamma$ set so that each minute-long epoch completes in under 1 min of CPU time.
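A minimal sketch of cheapest insertion for a pickup-delivery pair, together with the excess-lateness formula, follows. It assumes Euclidean coordinates, treats `route[0]` as the vehicle's fixed current position, requires the pickup to precede the delivery, and ignores capacity and time-dependent travel; these simplifications belong to the example, not to the cited work, and the function names are hypothetical.

```python
import math


def route_length(route, coords):
    """Total Euclidean length of a node sequence."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(route, route[1:]))


def cheapest_insertion(route, pickup, delivery, coords):
    """Return (delta_travel, new_route) for the cheapest pickup-before-delivery insertion."""
    base = route_length(route, coords)
    best = None
    for i in range(1, len(route) + 1):                # pickup position (after current location)
        with_pickup = route[:i] + [pickup] + route[i:]
        for j in range(i + 1, len(with_pickup) + 1):  # delivery strictly after pickup
            candidate = with_pickup[:j] + [delivery] + with_pickup[j:]
            delta = route_length(candidate, coords) - base
            if best is None or delta < best[0]:
                best = (delta, candidate)
    return best


def excess_lateness(arrival, deadline):
    """L_r^+ = max{0, arrival_r - l_r}."""
    return max(0.0, arrival - deadline)
```

For a vehicle at $(0,0)$ heading to a stop at $(10,0)$, a pickup at $(2,0)$ with delivery at $(4,0)$ lies on the way, so the cheapest insertion adds zero travel.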

The centralized MVDPDPSR model and the corresponding DRACE+CFA solution paradigm are extensible to advanced stochastic vehicle routing applications, supporting both theoretical analysis and industry-scale deployment (Stoia et al., 14 Aug 2024).

References:

Dynamic Pickup-and-Delivery for Collaborative Platforms with Time-Dependent Travel and Crowdshipping (Stoia et al., 14 Aug 2024)
Asymptotically Optimal Algorithms for Pickup and Delivery Problems with Application to Large-Scale Transportation Systems (Treleaven et al., 2012)
An Online Approach to Solve the Dynamic Vehicle Routing Problem with Stochastic Trip Requests for Paratransit Services (Wilbur et al., 2022)
Solving Large-Scale Dynamic Vehicle Routing Problems with Stochastic Requests (Zhang et al., 2022)
Learning to Optimize Industry-Scale Dynamic Pickup and Delivery Problems (Li et al., 2021)
Re-route Package Pickup and Delivery Planning with Random Demands (Sawadsitang et al., 2019)
Multi-Agent Pointer Transformer: Seq-to-Seq Reinforcement Learning for Multi-Vehicle Dynamic Pickup-Delivery Problems (Zou et al., 21 Nov 2025)
