
Multi-Commodity 1–1 PDSTSP: Deep Learning & Metaheuristics

Updated 19 December 2025
  • The paper introduces a hybrid method combining Transformer neural network policies with multi-start LNS for revenue maximization in the m1-PDSTSP.
  • It details a rigorous mathematical formulation and benchmarks showing sub-second inference and reduced optimality gaps in dynamic freight routing.
  • The approach generalizes selective TSP and PDP variants, enabling efficient and adaptable routing in high-frequency online freight exchange systems.

The multi-commodity one-to-one pickup-and-delivery selective traveling salesperson problem (m1-PDSTSP) is a combinatorial optimization problem central to online freight exchange systems, where the aim is real-time, revenue-maximizing bundling of multi-commodity transportation requests. The problem requires determining a route for a single vehicle subject to resource and precedence constraints, selectively pairing a subset of pickup and delivery nodes to maximize total revenue. The m1-PDSTSP generalizes a range of selective TSP and pickup-and-delivery problem (PDP) variants and represents a challenging instance of constrained vehicle routing under stringent computational latency requirements (Zhang et al., 12 Dec 2025).

1. Mathematical Formulation

The m1-PDSTSP is defined on a complete undirected graph $G = (V, E)$, with the following elements:

  • Pickup nodes $P = \{1, \dots, n\}$ and delivery nodes $D = \{n+1, \dots, 2n\}$; each delivery node $i + n$ is paired with pickup $i$.
  • Depots: start ($0$) and end ($2n+1$).
  • Requests $h \in \{1, \dots, n\}$ with demand $q_h \geq 0$ and revenue $r_h \geq 0$.
  • Vehicle capacity $Q$ and maximum route-length $T$.
  • Travel costs $c_{ij}$ obey the triangle inequality.

Decision variables:

  • $X_{ij} \in \{0, 1\}$: 1 if arc $(i, j)$ is used.
  • $T_i \geq 0$: cumulative distance upon arrival at node $i$.
  • $L_i \geq 0$: load departing node $i$.

Objective: Maximize total revenue from served requests: $\max \sum_{h=1}^{n} r_h \left(\sum_{j \in V} X_{i_h, j}\right)$, where $i_h$ denotes the pickup node of request $h$ (here $i_h = h$).

Constraints:

  1. No self-loops: $\forall i,\ X_{ii} = 0$.
  2. Single departure/arrival: $\sum_{j \in P} X_{0j} = 1$ and $\sum_{i \in D} X_{i, 2n+1} = 1$.
  3. Flow conservation: $\forall i \in P \cup D,\ \sum_j X_{ij} = \sum_j X_{ji}$.
  4. At most one visit per node: $\forall i \neq 2n+1,\ \sum_j X_{ij} \leq 1$; $\forall j \neq 0,\ \sum_i X_{ij} \leq 1$.
  5. Pairing (selectivity): $\forall i \in P,\ \sum_j X_{ij} = \sum_j X_{j, i+n}$.
  6. Precedence: for $i \in P$, if $\sum_j X_{ij} = 1$, then $T_{i+n} \geq T_i$.
  7. Route length: $T_0 = 0$, $T_{2n+1} \leq T$, and $T_j \geq T_i + c_{ij}$ whenever $X_{ij} = 1$.
  8. Capacity: $L_0 = 0$, $0 \leq L_i \leq Q$, and whenever $X_{ij} = 1$,

$$L_j = \begin{cases} L_i + q_j & j \in P, \\ L_i - q_{j-n} & j \in D. \end{cases}$$

This formulation enforces feasible vehicle tours that select and pair pickup–delivery requests subject to stringent resource, route, and precedence constraints for maximal realized revenue.
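
To make the formulation concrete, the sketch below evaluates a candidate route against the pairing, precedence, capacity, and route-length constraints and returns its revenue. This is a minimal illustration in Python; the function and its conventions (1-indexed requests, cost matrix `c`) are hypothetical and not taken from the paper.

```python
# Minimal sketch (hypothetical helper, not the paper's code): evaluate a route
# for the m1-PDSTSP. Nodes: 0 = start depot, 1..n pickups, n+1..2n deliveries,
# 2n+1 = end depot. `route` lists the visited nodes in order, depots included.

def evaluate_route(route, n, q, r, c, Q, T):
    """Return (feasible, revenue) for a candidate route.

    q[h], r[h]  -- demand and revenue of request h (1-indexed, h = 1..n)
    c[i][j]     -- travel cost between nodes i and j
    Q, T        -- vehicle capacity and maximum route length
    """
    if route[0] != 0 or route[-1] != 2 * n + 1:
        return False, 0.0

    load, length, revenue = 0.0, 0.0, 0.0
    position = {node: idx for idx, node in enumerate(route)}

    for prev, node in zip(route, route[1:]):
        length += c[prev][node]                 # cumulative distance T_i
        if 1 <= node <= n:                      # pickup of request h = node
            load += q[node]
            revenue += r[node]
            if node + n not in position:        # pairing: delivery must also be served
                return False, 0.0
        elif n < node <= 2 * n:                 # delivery of request node - n
            h = node - n
            if h not in position or position[h] > position[node]:
                return False, 0.0               # precedence: pickup before delivery
            load -= q[h]
        if load < 0 or load > Q:                # capacity constraint
            return False, 0.0

    if length > T:                              # route-length constraint
        return False, 0.0
    return True, revenue
```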

2. Hybrid Algorithmic Pipeline: Deep Learning and Metaheuristics

To address the computational and combinatorial complexity, a hybrid pipeline couples a Transformer Neural Network-based constructive policy with a Multi-Start Large Neighborhood Search (MSLNS) metaheuristic executed within a rolling-horizon framework. This pipeline is architected for sub-second inference on market snapshots, a common requirement in online freight exchanges (Zhang et al., 12 Dec 2025).

A. Transformer-Based Constructive Policy

  • Encoder: Node-wise inputs (geospatial coordinates $y_i$, normalized demand $\pm \hat{q}_i$, normalized revenue $\hat{r}_i$, and depot-type features) are linearly projected and processed by $L$ layers of multi-head self-attention with batch normalization.
  • Decoder (Auto-regressive): The decoder conditions on the current partial route, the vehicle state (remaining capacity and route length), and a context vector, and applies masked attention to produce logits over valid next-node selections.
  • Feasibility Masking: Dynamically excludes visited nodes, deliveries whose pickups have not yet been served, and pickups that would violate the capacity or route-length bounds (see the masking sketch after this list).
  • Training: Uses POMO (Policy Optimization with Multiple Optima), a multi-start policy-gradient REINFORCE scheme with $M = n/2$ distinct starting pickups and a shared baseline, Adam optimization, and a penalty for exceeding the route length. No teacher labels are required.
  • Inference: A single greedy rollout has $O(n^2 L)$ complexity, generating feasible solutions in milliseconds for $n$ up to 122.
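
The following sketch illustrates how such feasibility masking could be computed at one decoding step, assuming a simple array-based state; the function and its arguments are hypothetical and not the authors' implementation.

```python
# Minimal masking sketch (hypothetical, not the authors' code): given the
# partial route state, mark which nodes the decoder may visit next.
import numpy as np

def feasibility_mask(visited, load, length, cur, n, q, c, Q, T, end_depot):
    """Boolean mask of length 2n+2; True = node may be selected next."""
    mask = np.zeros(2 * n + 2, dtype=bool)
    for node in range(1, 2 * n + 1):
        if visited[node]:
            continue                                   # already on the route
        if 1 <= node <= n:                             # pickup
            if load + q[node] > Q:
                continue                               # would exceed capacity
        else:                                          # delivery
            if not visited[node - n]:
                continue                               # pickup not yet served
        # route-length check: reach the node and still return to the end depot
        if length + c[cur][node] + c[node][end_depot] > T:
            continue
        mask[node] = True
    # the end depot is allowed once every onboard request has been delivered
    mask[end_depot] = (load == 0)
    return mask
```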

B. Multi-Start Large Neighborhood Search (MSLNS)

  • Initialization: $M$ diverse seed routes are generated from Transformer-based POMO rollouts (a minimal loop sketch follows this list).
  • Destroy Operator: Tracks request frequencies within a beam of size $\beta$ and performs softmax-biased sampling of $k = 2, 3, 4$ requests (progressively growing) for removal, avoiding repeated selections via a memory $\mathcal{U}$.
  • Repair Operator: Greedy insertion reconstructs solutions, optimally reintegrating pickup–delivery pairs while maintaining capacity, route-length, and precedence feasibility.
  • Local Improvement: Applies 2-Opt for further refinement.
  • Beam Update: Combines prior and new candidates, deduplicates by served-set, and retains the top β\beta solutions by revenue.
  • Termination: Iterates until the time budget $t_{\max}$ is exhausted, returning the route with maximal revenue.
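
A minimal sketch of this loop is given below. The revenue, served-set, destroy, repair, and 2-Opt helpers are passed in as placeholders (hypothetical names, not the authors' code), so the sketch only shows the beam bookkeeping and the time-budgeted iteration.

```python
# Minimal MSLNS loop sketch (all helpers are hypothetical placeholders for the
# operators described above, not the authors' code).
import itertools
import random
import time

def mslns(seed_routes, revenue, served_set, destroy, repair, two_opt,
          beam_size, t_max):
    """Keep a beam of the best routes and improve it until the time budget ends.

    revenue(route)     -> total revenue of a route
    served_set(route)  -> frozenset of served request ids (for deduplication)
    destroy(route, k)  -> partial route with k requests removed
    repair(route)      -> feasible route after greedy pickup-delivery reinsertion
    two_opt(route)     -> locally improved route
    """
    beam = sorted(seed_routes, key=revenue, reverse=True)[:beam_size]
    seen = {served_set(route) for route in beam}
    destroy_sizes = itertools.cycle([2, 3, 4])      # progressively growing k
    deadline = time.time() + t_max

    while time.time() < deadline:
        k = next(destroy_sizes)
        base = random.choice(beam)                  # pick a seed from the beam
        candidate = two_opt(repair(destroy(base, k)))
        key = served_set(candidate)
        if key not in seen:                         # deduplicate by served-set
            seen.add(key)
            beam = sorted(beam + [candidate], key=revenue, reverse=True)[:beam_size]

    return beam[0]                                  # highest-revenue route found
```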

This hybrid approach exploits high-quality, learning-derived seeds to start the search close to attraction basins in the solution space, reducing the neighborhood size the LNS needs to reach near-optimal solutions.

3. Empirical Performance and Benchmarking

Empirical evaluation spans benchmark problem sizes $n \in \{22, 42, 82, 122\}$; each instance features a randomized vehicle capacity $Q \sim \mathrm{Unif}[8, 20]$ and a route-length limit $T$ scaled to the depot-to-depot baseline. Revenue structures encompass Distance, Ton-Distance, Uniform, and Constant regimes.
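
A minimal sketch of how such instances might be sampled is shown below. The demand range, the route-length scaling factor, and the uniform-revenue range are assumptions made for illustration only; the paper's exact generation procedure may differ.

```python
# Minimal instance-generation sketch following the stated setup; demand range,
# route-length factor, and uniform-revenue range are illustrative assumptions.
import numpy as np

def sample_instance(n, revenue_mode="distance", seed=0):
    rng = np.random.default_rng(seed)
    coords = rng.uniform(0.0, 1.0, size=(2 * n + 2, 2))      # depots + 2n nodes
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    q = rng.uniform(1.0, 5.0, size=n + 1)                    # demands (index 1..n; range assumed)
    Q = rng.uniform(8.0, 20.0)                                # capacity ~ Unif[8, 20]
    T = 5.0 * dist[0, 2 * n + 1]                              # scaled to depot-to-depot (factor assumed)
    if revenue_mode == "distance":
        r = np.array([dist[h, h + n] for h in range(1, n + 1)])
    elif revenue_mode == "ton_distance":
        r = np.array([q[h] * dist[h, h + n] for h in range(1, n + 1)])
    elif revenue_mode == "uniform":
        r = rng.uniform(0.5, 1.5, size=n)                     # range assumed
    else:                                                      # constant regime
        r = np.ones(n)
    return coords, dist, q, Q, T, r
```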

Baselines:

  • Heuristic: Greedy Search (GS), Multi-Start Greedy (MSG), Hill-Climbing (HC), 1-destroy Best Improvement LNS (BI-LNS), Adaptive LNS (ALNS), Simulated Annealing (SA).
  • Neural: Attention Model (AM), POMO (single, multi-start, beam search, SGBS).
  • Hybrid: AM+HC, AM+BI-LNS, POMO+HC, POMO+BI-LNS, POMO+MSLNS.
  • Exact: Gurobi (small n=22n=22 only).

Metrics:

  • Average total revenue (higher is better).
  • Optimality gap: $(\text{best\_known} - \text{method\_rev}) / \text{best\_known} \times 100\%$ (a small computation sketch follows this list).
  • Winning rate: Proportion of instances achieving best-known solution.
  • Runtime: Sub-second for constructor, seconds–minutes for full pipeline.
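
A small sketch of the metric computation, assuming per-instance revenue lists for a method and the best-known solutions (the function and variable names are hypothetical):

```python
# Minimal metric sketch for the evaluation protocol above (names hypothetical).
def summarize(method_rev, best_known):
    """method_rev, best_known: per-instance revenue lists of equal length."""
    n_inst = len(best_known)
    avg_revenue = sum(method_rev) / n_inst
    gaps = [(b - m) / b * 100.0 for m, b in zip(method_rev, best_known)]
    avg_gap = sum(gaps) / n_inst                  # optimality gap in percent
    win_rate = sum(m >= b for m, b in zip(method_rev, best_known)) / n_inst
    return avg_revenue, avg_gap, win_rate
```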

Key Observations:

  • POMO greedy alone outperforms classical heuristics on both solution quality and running time for $n \geq 42$.
  • Augmenting AM/POMO with HC or BI-LNS significantly reduces optimality gaps by 5–10%.
  • POMO+MSLNS $[10, 5]$ achieves a $<2\%$ gap and a $>40\%$ winning rate for all tested $n$ within the time budget.
  • Extended POMO+MSLNS $[10, 10]$ (unlimited runtime) closes gaps to $<0.6\%$ with a $>88\%$ winning rate for $n = 22$, comparable to Gurobi.
  • Under strict sub-second constraints, AM+HC and POMO single-start provide the fastest high-quality solutions.
  • These performance trends persist across diverse revenue settings, with hybrid methods consistently achieving superior results (Zhang et al., 12 Dec 2025).

4. Rolling-Horizon Market Integration

The rolling-horizon framework accommodates dynamic market operation in online freight exchanges:

  • The marketplace state is “frozen” at intervals $\Delta t$ to produce static snapshots (a minimal loop sketch follows this list).
  • Each snapshot triggers m1-PDSTSP resolution within a sub-second computational budget.
  • Bundles are dispatched to carriers immediately post-solution.
  • The POMO+MSLNS pipeline is deployed independently on these rolling snapshots, ensuring low-latency, robust incremental optimization.
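
A minimal sketch of this rolling-horizon wiring is shown below; the `market` interface (`is_open`, `snapshot`, `dispatch`) and the `solver` callable are hypothetical stand-ins for the exchange platform and the POMO+MSLNS pipeline.

```python
# Minimal rolling-horizon sketch (hypothetical wiring, not the authors' system):
# freeze the market every delta_t seconds, solve the snapshot, dispatch bundles.
import time

def rolling_horizon(market, solver, delta_t, budget):
    """market.snapshot() -> static m1-PDSTSP instance
    solver(instance, budget) -> best route found within the time budget
    market.dispatch(route) -> send the bundled requests to a carrier
    """
    while market.is_open():
        snapshot = market.snapshot()              # freeze the marketplace state
        route = solver(snapshot, budget)          # POMO + MSLNS within sub-second budget
        market.dispatch(route)                    # dispatch the bundle immediately
        time.sleep(delta_t)                       # wait for the next interval
```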

A plausible implication is that such a design supports near-continuous reoptimization in high-frequency environments without excessive computational overhead.

5. Generalizations and Broader Applicability

The m1-PDSTSP generalizes a wide range of selective TSP and pickup-and-delivery variants, as summarized in the variant taxonomy of the original paper. Its structural and methodological solutions are extensible:

  • The Transformer-based constructive policy and multi-start LNS schema are adaptable to selective routing problems incorporating capacity, precedence, and even time windows.
  • The key insight is that learned seeds from a deep neural network constructor concentrate search in high-value basins, permitting smaller LNS neighborhoods.
  • This approach demonstrates, for the first time, that deep neural network-generated solutions reliably provide effective starting points for improvement metaheuristics across selective pickup-and-delivery problems.

Potential extensions include explicit modeling of time windows, multi-vehicle routing, dynamic reoptimization, and distributed LNS techniques for scaling to very large market snapshots (Zhang et al., 12 Dec 2025).
