
Multi-Commodity 1–1 PDSTSP: Deep Learning & Metaheuristics

Updated 19 December 2025
  • The paper introduces a hybrid method combining Transformer neural network policies with multi-start LNS for revenue maximization in the m1-PDSTSP.
  • It details a rigorous mathematical formulation and benchmarks showing sub-second inference and reduced optimality gaps in dynamic freight routing.
  • The approach generalizes selective TSP and PDP variants, enabling efficient and adaptable routing in high-frequency online freight exchange systems.

The multi-commodity one-to-one pickup-and-delivery selective traveling salesperson problem (m1-PDSTSP) is a combinatorial optimization problem central to online freight exchange systems, where the aim is real-time, revenue-maximizing bundling of multi-commodity transportation requests. The problem requires determining a route for a single vehicle subject to resource and precedence constraints, selectively pairing a subset of pickup and delivery nodes to maximize total revenue. The m1-PDSTSP generalizes a range of selective TSP and pickup-and-delivery problem (PDP) variants and represents a challenging instance of constrained vehicle routing under stringent computational latency requirements (Zhang et al., 12 Dec 2025).

1. Mathematical Formulation

The m1-PDSTSP is defined on a complete undirected graph $G = (V, E)$, with the following elements:

  • Pickup nodes $P = \{1, \dots, n\}$ and delivery nodes $D = \{n+1, \dots, 2n\}$; each delivery node $i + n$ is paired with pickup $i$.
  • Depots: start ($0$) and end ($2n+1$).
  • Requests $h \in \{1, \dots, n\}$ with demand $q_h \geq 0$ and revenue $r_h \geq 0$.
  • Vehicle capacity $Q$ and maximum route-length $T$.
  • Travel costs $c_{ij}$ obey the triangle inequality.

Decision variables:

  • $X_{ij} \in \{0, 1\}$: 1 if arc $(i, j)$ is used.
  • $T_i \geq 0$: cumulative distance upon arrival at node $i$.
  • $L_i \geq 0$: load departing node $i$.

Objective: Maximize total revenue from served requests: $\max \sum_{h=1}^{n} r_h \left(\sum_{j \in V} X_{i_h, j}\right)$, where $i_h$ denotes the pickup node of request $h$ (here $i_h = h$).

Constraints:

  1. No self-loops: $\forall i,\ X_{ii} = 0$.
  2. Single departure/arrival: $\sum_{j \in P} X_{0j} = 1$ and $\sum_{i \in D} X_{i, 2n+1} = 1$.
  3. Flow conservation: $\forall i \in P \cup D,\ \sum_j X_{ij} = \sum_j X_{ji}$.
  4. At most one visit per node: $\forall i \neq 2n+1,\ \sum_j X_{ij} \leq 1$; $\forall j \neq 0,\ \sum_i X_{ij} \leq 1$.
  5. Pairing (selectivity): $\forall i \in P,\ \sum_j X_{ij} = \sum_j X_{j, i+n}$.
  6. Precedence: for $i \in P$, if $\sum_j X_{ij} = 1$, then $T_{i+n} \geq T_i$.
  7. Route length: $T_0 = 0$, $T_{2n+1} \leq T$, and $T_j \geq T_i + c_{ij}$ whenever $X_{ij} = 1$.
  8. Capacity: $L_0 = 0$, $0 \leq L_i \leq Q$, and whenever $X_{ij} = 1$,

$$L_j = \begin{cases} L_i + q_j & j \in P, \\ L_i - q_{j-n} & j \in D. \end{cases}$$

This formulation enforces feasible vehicle tours that select and pair pickup–delivery requests subject to stringent resource, route, and precedence constraints for maximal realized revenue.
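
To make the formulation concrete, the sketch below evaluates a candidate route against the pairing, precedence, capacity, and route-length constraints and returns its revenue. This is a minimal illustration in Python; the function and its conventions (1-indexed requests, cost matrix `c`) are hypothetical and not taken from the paper.

```python
# Minimal sketch (hypothetical helper, not the paper's code): evaluate a route
# for the m1-PDSTSP. Nodes: 0 = start depot, 1..n pickups, n+1..2n deliveries,
# 2n+1 = end depot. `route` lists the visited nodes in order, depots included.

def evaluate_route(route, n, q, r, c, Q, T):
    """Return (feasible, revenue) for a candidate route.

    q[h], r[h]  -- demand and revenue of request h (1-indexed, h = 1..n)
    c[i][j]     -- travel cost between nodes i and j
    Q, T        -- vehicle capacity and maximum route length
    """
    if route[0] != 0 or route[-1] != 2 * n + 1:
        return False, 0.0

    load, length, revenue = 0.0, 0.0, 0.0
    position = {node: idx for idx, node in enumerate(route)}

    for prev, node in zip(route, route[1:]):
        length += c[prev][node]                 # cumulative distance T_i
        if 1 <= node <= n:                      # pickup of request h = node
            load += q[node]
            revenue += r[node]
            if node + n not in position:        # pairing: delivery must also be served
                return False, 0.0
        elif n < node <= 2 * n:                 # delivery of request node - n
            h = node - n
            if h not in position or position[h] > position[node]:
                return False, 0.0               # precedence: pickup before delivery
            load -= q[h]
        if load < 0 or load > Q:                # capacity constraint
            return False, 0.0

    if length > T:                              # route-length constraint
        return False, 0.0
    return True, revenue
```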

2. Hybrid Algorithmic Pipeline: Deep Learning and Metaheuristics

To address the computational and combinatorial complexity, a hybrid pipeline couples a Transformer Neural Network-based constructive policy with a Multi-Start Large Neighborhood Search (MSLNS) metaheuristic executed within a rolling-horizon framework. This pipeline is architected for sub-second inference on market snapshots, a common requirement in online freight exchanges (Zhang et al., 12 Dec 2025).

A. Transformer-Based Constructive Policy

  • Encoder: Node-wise inputs (geospatial coordinates $y_i$, normalized demand $\pm \hat{q}_i$, normalized revenue $\hat{r}_i$, and depot-type features) are linearly projected and processed by $L$ layers of multi-head self-attention with batch normalization.
  • Decoder (Auto-regressive): The decoder conditions on the current partial route, the vehicle state (remaining capacity and route length), and a context vector, and applies masked attention to produce logits over valid next-node selections.
  • Feasibility Masking: Dynamically excludes visited nodes, deliveries whose pickups have not yet been served, and pickups that would violate the capacity or route-length bounds (see the masking sketch after this list).
  • Training: Uses POMO (Policy Optimization with Multiple Optima), a multi-start policy-gradient REINFORCE scheme with $M = n/2$ distinct starting pickups and a shared baseline, Adam optimization, and a penalty for exceeding the route length. No teacher labels are required.
  • Inference: A single greedy rollout has $O(n^2 L)$ complexity, generating feasible solutions in milliseconds for $n$ up to 122.
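
The following sketch illustrates how such feasibility masking could be computed at one decoding step, assuming a simple array-based state; the function and its arguments are hypothetical and not the authors' implementation.

```python
# Minimal masking sketch (hypothetical, not the authors' code): given the
# partial route state, mark which nodes the decoder may visit next.
import numpy as np

def feasibility_mask(visited, load, length, cur, n, q, c, Q, T, end_depot):
    """Boolean mask of length 2n+2; True = node may be selected next."""
    mask = np.zeros(2 * n + 2, dtype=bool)
    for node in range(1, 2 * n + 1):
        if visited[node]:
            continue                                   # already on the route
        if 1 <= node <= n:                             # pickup
            if load + q[node] > Q:
                continue                               # would exceed capacity
        else:                                          # delivery
            if not visited[node - n]:
                continue                               # pickup not yet served
        # route-length check: reach the node and still return to the end depot
        if length + c[cur][node] + c[node][end_depot] > T:
            continue
        mask[node] = True
    # the end depot is allowed once every onboard request has been delivered
    mask[end_depot] = (load == 0)
    return mask
```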

B. Multi-Start Large Neighborhood Search (MSLNS)

  • Initialization: $M$ diverse seed routes are generated from Transformer-based POMO rollouts (a minimal loop sketch follows this list).
  • Destroy Operator: Tracks request frequencies within a beam of size $\beta$ and performs softmax-biased sampling of $k = 2, 3, 4$ requests (progressively growing) for removal, avoiding repeated selections via a memory $\mathcal{U}$.
  • Repair Operator: Greedy insertion reconstructs solutions, optimally reintegrating pickup–delivery pairs while maintaining capacity, route-length, and precedence feasibility.
  • Local Improvement: Applies 2-Opt for further refinement.
  • Beam Update: Combines prior and new candidates, deduplicates by served-set, and retains the top β\beta solutions by revenue.
  • Termination: Iterates until the time budget $t_{\max}$ is exhausted, returning the route with maximal revenue.
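
A minimal sketch of this loop is given below. The revenue, served-set, destroy, repair, and 2-Opt helpers are passed in as placeholders (hypothetical names, not the authors' code), so the sketch only shows the beam bookkeeping and the time-budgeted iteration.

```python
# Minimal MSLNS loop sketch (all helpers are hypothetical placeholders for the
# operators described above, not the authors' code).
import itertools
import random
import time

def mslns(seed_routes, revenue, served_set, destroy, repair, two_opt,
          beam_size, t_max):
    """Keep a beam of the best routes and improve it until the time budget ends.

    revenue(route)     -> total revenue of a route
    served_set(route)  -> frozenset of served request ids (for deduplication)
    destroy(route, k)  -> partial route with k requests removed
    repair(route)      -> feasible route after greedy pickup-delivery reinsertion
    two_opt(route)     -> locally improved route
    """
    beam = sorted(seed_routes, key=revenue, reverse=True)[:beam_size]
    seen = {served_set(route) for route in beam}
    destroy_sizes = itertools.cycle([2, 3, 4])      # progressively growing k
    deadline = time.time() + t_max

    while time.time() < deadline:
        k = next(destroy_sizes)
        base = random.choice(beam)                  # pick a seed from the beam
        candidate = two_opt(repair(destroy(base, k)))
        key = served_set(candidate)
        if key not in seen:                         # deduplicate by served-set
            seen.add(key)
            beam = sorted(beam + [candidate], key=revenue, reverse=True)[:beam_size]

    return beam[0]                                  # highest-revenue route found
```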

This hybrid approach exploits high-quality, learning-derived seeds to start the search close to attraction basins in the solution space, reducing the neighborhood size the LNS needs to reach near-optimal solutions.

3. Empirical Performance and Benchmarking

Empirical evaluation spans benchmark problem sizes $n \in \{22, 42, 82, 122\}$; each instance features a randomized vehicle capacity $Q \sim \mathrm{Unif}[8, 20]$ and a route-length limit $T$ scaled to the depot-to-depot baseline. Revenue structures encompass Distance, Ton-Distance, Uniform, and Constant regimes.
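
A minimal sketch of how such instances might be sampled is shown below. The demand range, the route-length scaling factor, and the uniform-revenue range are assumptions made for illustration only; the paper's exact generation procedure may differ.

```python
# Minimal instance-generation sketch following the stated setup; demand range,
# route-length factor, and uniform-revenue range are illustrative assumptions.
import numpy as np

def sample_instance(n, revenue_mode="distance", seed=0):
    rng = np.random.default_rng(seed)
    coords = rng.uniform(0.0, 1.0, size=(2 * n + 2, 2))      # depots + 2n nodes
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    q = rng.uniform(1.0, 5.0, size=n + 1)                    # demands (index 1..n; range assumed)
    Q = rng.uniform(8.0, 20.0)                                # capacity ~ Unif[8, 20]
    T = 5.0 * dist[0, 2 * n + 1]                              # scaled to depot-to-depot (factor assumed)
    if revenue_mode == "distance":
        r = np.array([dist[h, h + n] for h in range(1, n + 1)])
    elif revenue_mode == "ton_distance":
        r = np.array([q[h] * dist[h, h + n] for h in range(1, n + 1)])
    elif revenue_mode == "uniform":
        r = rng.uniform(0.5, 1.5, size=n)                     # range assumed
    else:                                                      # constant regime
        r = np.ones(n)
    return coords, dist, q, Q, T, r
```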

Baselines:

  • Heuristic: Greedy Search (GS), Multi-Start Greedy (MSG), Hill-Climbing (HC), 1-destroy Best Improvement LNS (BI-LNS), Adaptive LNS (ALNS), Simulated Annealing (SA).
  • Neural: Attention Model (AM), POMO (single, multi-start, beam search, SGBS).
  • Hybrid: AM+HC, AM+BI-LNS, POMO+HC, POMO+BI-LNS, POMO+MSLNS.
  • Exact: Gurobi (small n=22n=22 only).

Metrics:

  • Average total revenue (higher is better).
  • Optimality gap: $(\text{best\_known} - \text{method\_rev}) / \text{best\_known} \times 100\%$ (a small computation sketch follows this list).
  • Winning rate: Proportion of instances achieving best-known solution.
  • Runtime: Sub-second for constructor, seconds–minutes for full pipeline.
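
A small sketch of the metric computation, assuming per-instance revenue lists for a method and the best-known solutions (the function and variable names are hypothetical):

```python
# Minimal metric sketch for the evaluation protocol above (names hypothetical).
def summarize(method_rev, best_known):
    """method_rev, best_known: per-instance revenue lists of equal length."""
    n_inst = len(best_known)
    avg_revenue = sum(method_rev) / n_inst
    gaps = [(b - m) / b * 100.0 for m, b in zip(method_rev, best_known)]
    avg_gap = sum(gaps) / n_inst                  # optimality gap in percent
    win_rate = sum(m >= b for m, b in zip(method_rev, best_known)) / n_inst
    return avg_revenue, avg_gap, win_rate
```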

Key Observations:

  • POMO greedy alone outperforms classical heuristics on both solution quality and running time for $n \geq 42$.
  • Augmenting AM/POMO with HC or BI-LNS significantly reduces optimality gaps by 5–10%.
  • POMO+MSLNS $[10, 5]$ achieves a $<2\%$ gap and a $>40\%$ winning rate for all tested $n$ within the time budget.
  • Extended POMO+MSLNS $[10, 10]$ (unlimited runtime) closes gaps to $<0.6\%$ with a $>88\%$ winning rate for $n = 22$, comparable to Gurobi.
  • Under strict sub-second constraints, AM+HC and POMO single-start provide the fastest high-quality solutions.
  • These performance trends persist across diverse revenue settings, with hybrid methods consistently achieving superior results (Zhang et al., 12 Dec 2025).

4. Rolling-Horizon Market Integration

The rolling-horizon framework accommodates dynamic market operation in online freight exchanges:

  • The marketplace state is “frozen” at intervals $\Delta t$ to produce static snapshots (a minimal loop sketch follows this list).
  • Each snapshot triggers m1-PDSTSP resolution within a sub-second computational budget.
  • Bundles are dispatched to carriers immediately post-solution.
  • The POMO+MSLNS pipeline is deployed independently on these rolling snapshots, ensuring low-latency, robust incremental optimization.
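
A minimal sketch of this rolling-horizon wiring is shown below; the `market` interface (`is_open`, `snapshot`, `dispatch`) and the `solver` callable are hypothetical stand-ins for the exchange platform and the POMO+MSLNS pipeline.

```python
# Minimal rolling-horizon sketch (hypothetical wiring, not the authors' system):
# freeze the market every delta_t seconds, solve the snapshot, dispatch bundles.
import time

def rolling_horizon(market, solver, delta_t, budget):
    """market.snapshot() -> static m1-PDSTSP instance
    solver(instance, budget) -> best route found within the time budget
    market.dispatch(route) -> send the bundled requests to a carrier
    """
    while market.is_open():
        snapshot = market.snapshot()              # freeze the marketplace state
        route = solver(snapshot, budget)          # POMO + MSLNS within sub-second budget
        market.dispatch(route)                    # dispatch the bundle immediately
        time.sleep(delta_t)                       # wait for the next interval
```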

A plausible implication is that such a design supports near-continuous reoptimization in high-frequency environments without excessive computational overhead.

5. Generalizations and Broader Applicability

The m1-PDSTSP generalizes a wide range of selective TSP and pickup-and-delivery variants, as summarized in the variant taxonomy of the original paper. Its structural and methodological solutions are extensible:

  • The Transformer-based constructive policy and multi-start LNS schema are adaptable to selective routing problems incorporating capacity, precedence, and even time windows.
  • The key insight is that learned seeds from a deep neural network constructor concentrate search in high-value basins, permitting smaller LNS neighborhoods.
  • This approach demonstrates, for the first time, that deep neural network-generated solutions reliably provide effective starting points for improvement metaheuristics across selective pickup-and-delivery problems.

Potential extensions include explicit modeling of time windows, multi-vehicle routing, dynamic reoptimization, and distributed LNS techniques for scaling to very large market snapshots (Zhang et al., 12 Dec 2025).
