Synthetic Routing Instances

Updated 20 November 2025
  • Synthetic Routing Instances are artificially generated datasets that emulate real-world routing problems, enabling rigorous benchmarking and stress-testing across various domains.
  • They use spatial sampling, stochastic dynamics, and conditional generative modeling to simulate customer locations, demand patterns, and time-dependent constraints.
  • Empirical studies show that these instances improve algorithm generalization, reduce optimality gaps, and expose solver limitations under challenging conditions.

Synthetic routing instances are artificially constructed problem instances designed to emulate or stress-test routing algorithms in a variety of domains. Their construction is central to benchmarking, data augmentation, and the development of machine learning and combinatorial optimization methods. Synthetic instances allow researchers to systematically probe solver limits, generalize learning systems beyond real-world idiosyncrasies, and explore new design methodologies for large, heterogeneous, or highly constrained routing environments. These instances now span logistics (VRP), dialogue systems (skill routing), communication networks, and quantum information, underpinning both classical and neural combinatorial optimization research.

1. Core Concepts and Motivations

Synthetic routing instances are generated datasets that encode the structural and statistical properties of routing problems without necessarily corresponding to direct observations from real-world operations. Their primary motivations include:

  • Scalability Stress-Tests: Enabling controlled experiments at scales (e.g., 10⁵–10⁶ nodes in VRP) or with constraint profiles (e.g., high time-window heterogeneity) not easily available from empirical datasets (Accorsi et al., 2023, Heakl et al., 28 May 2025).
  • Data Augmentation: Expanding coverage of low-frequency scenarios or rare intent classes in environments with skewed data distributions, such as the long-tail of skill-routing requests in conversational AI (Wu et al., 2023).
  • Robustness and Generalization: Building solvers that generalize beyond the peculiarities of historical data and maintain performance under distributional shift or uncertainty (Zhu et al., 13 Nov 2025, Heakl et al., 28 May 2025).
  • Benchmarking and Reproducibility: Providing standardized, transparent construction pipelines to foster fair comparison and reproducible experimentation in the algorithmic community (Accorsi et al., 2023, Heakl et al., 28 May 2025).

2. Generative Methodologies and Formal Recipes

The construction of synthetic routing instances encompasses a broad spectrum, ranging from manual spatial sampling and heuristic demand assignment to advanced generative modeling and evolutionary synthesis. Key methodological paradigms include:

  • Spatial and Attribute Sampling: For capacitated vehicle routing, uniform or geographically realistic sampling (e.g., from OpenAddresses) defines customer locations, with planar projections ensuring consistent distance metric properties. Demand is often independently sampled from uniform or empirical distributions (Accorsi et al., 2023); a minimal sampling sketch follows this list.
  • Stochastic Dynamics and Constraint Enrichment: Stochastic VRP benchmarks introduce log-normal travel delays, time-dependent congestion, Poisson-distributed accident events, and empirically driven delivery time-windows, resulting in rich, temporally nonstationary instance distributions (Heakl et al., 28 May 2025).
  • Conditional Generative Modeling: In dialogue skill-routing, conditional encoder-decoder frameworks enable field-wise perturbation and synthesis. These include conditional variational autoencoders (pcVAE), conditional BERT with masked LM (CV-BERT MLM), and sequence-to-sequence models such as Joint T5—each trained to generate plausible instances conditioned on rare intent or device configurations (Wu et al., 2023).
  • LLM-Guided Evolutionary Synthesis: Recent advances leverage evolutionary algorithms, guided by LLMs, to iteratively mutate and select generator programs that output synthetic instances whose structural attributes match those of real benchmarks. Fitness criteria are defined by Solver-perceived optimality gaps on validation sets (Zhu et al., 13 Nov 2025).
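
As a concrete illustration of the sampling recipe in the first bullet, the following is a minimal sketch assuming uniform locations on the unit square and uniformly drawn integer demands; the function name, demand range, and capacity are illustrative choices, not parameters from the cited papers.

```python
import numpy as np

def sample_cvrp_instance(n_customers: int, capacity: int = 50,
                         demand_range: tuple = (1, 9), seed: int = 0) -> dict:
    """Sample a toy CVRP instance: uniform customer locations on the unit
    square and integer demands drawn uniformly. A production pipeline would
    instead project real addresses (e.g., OpenAddresses) to planar coordinates."""
    rng = np.random.default_rng(seed)
    depot = rng.uniform(0.0, 1.0, size=2)
    customers = rng.uniform(0.0, 1.0, size=(n_customers, 2))
    demands = rng.integers(demand_range[0], demand_range[1] + 1, size=n_customers)
    return {"depot": depot, "customers": customers, "demands": demands,
            "capacity": capacity, "seed": seed}

instance = sample_cvrp_instance(n_customers=1_000, seed=42)
print(instance["customers"].shape, int(instance["demands"].sum()))
```

Recording the seed inside the instance dictionary keeps each sampled dataset reproducible, which becomes important once such generators are used for benchmarking.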

3. Structural Realism and Statistical Characterization

Capturing real-world complexity necessitates a principled assessment of spatial and attribute distributions:

| Metric | Definition/Role | Example Use |
| --- | --- | --- |
| FFT energy | Mean spectral energy in the nonzero Fourier modes of the 2D location histogram; measures global regularity or geometric motifs | Segmenting instance types (Zhu et al., 13 Nov 2025) |
| NN-ratio | Coefficient of variation of nearest-neighbor distances; quantifies local clustering and spacing regularity | Identifying clustered layouts (Zhu et al., 13 Nov 2025) |
| Intrinsic metrics | Perplexity, unique rate, Dist-1, entropy; measures of text-generation quality and diversity in skill routing | Assessing synthetic utterance augmentation (Wu et al., 2023) |

By plotting and thresholding these metrics (e.g., $E_{\mathrm{FFT}} = 35$, $R_{\mathrm{NN}} = 0.5$), one can cluster real instances into structural classes (e.g., repetitive motifs, global regularity, high local heterogeneity), enabling targeted generator evolution (Zhu et al., 13 Nov 2025).
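
The two spatial metrics admit a compact implementation. The sketch below is one plausible NumPy/SciPy reading of them; the histogram resolution, normalization, and any thresholds are assumptions here rather than the settings used by Zhu et al.

```python
import numpy as np
from scipy.spatial import cKDTree

def fft_energy(points: np.ndarray, bins: int = 32) -> float:
    """Mean spectral energy over the non-DC Fourier modes of the 2D
    location histogram (one plausible reading of the FFT-energy metric)."""
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    spectrum = np.abs(np.fft.fft2(hist)) ** 2
    spectrum[0, 0] = 0.0                    # discard the DC (zero-frequency) mode
    return float(spectrum.sum() / (bins * bins - 1))

def nn_ratio(points: np.ndarray) -> float:
    """Coefficient of variation (std / mean) of nearest-neighbor distances."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=2)      # k=2: the nearest hit is the point itself
    nn = dists[:, 1]
    return float(nn.std() / nn.mean())

pts = np.random.default_rng(0).uniform(size=(500, 2))
print(f"E_FFT={fft_energy(pts):.1f}  R_NN={nn_ratio(pts):.2f}")
```

Thresholding such scores, with cutoffs calibrated to the chosen binning, is what allows a generator-evolution loop to target a specific structural class.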

4. Practical Algorithms and Representative Pipelines

Concrete synthesis pipelines are well-documented across domains:

  • Skill-routing Instance Generation: For dialogue systems, the augmentation algorithm processes each tail intent by encoding input context (intent, device type/status, utterance), masking relevant fields, and decoding new utterances and feature values with top-k or nucleus sampling. The resulting synthetic hypotheses are appended to the training set, with retraining substantially boosting tail-intent replication accuracy (with 5× augmentation, 80% of intents with fewer than 10K samples improved) (Wu et al., 2023).
  • Stochastic VRP Benchmarks: SVRPBench generates geographies via k-means-clustered city layouts, samples customer and depot locations, assigns demands and time-windows from bimodal Gaussian or uniform distributions, and computes time-dependent, stochastic travel times via parameterized log-normal models and accident processes. All parameters and seeds are logged for reproducibility (Heakl et al., 28 May 2025); a sketch of such a travel-time model follows this list.
  • XXL VRP Instances: The Italian-regions benchmarks sample up to 1M customers from OpenAddresses, projecting to planar coordinates, uniformly assigning demands and structuring depot placement and vehicle capacities to span a diversity of route-length regimes (Accorsi et al., 2023).
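
The stochastic travel-time component of an SVRPBench-style pipeline can be sketched as follows; the congestion profile, accident rate, and delay magnitude below are illustrative assumptions, not the benchmark's actual parameters.

```python
import numpy as np

def sample_travel_time(base_minutes: float, depart_hour: float,
                       rng: np.random.Generator,
                       sigma: float = 0.25,
                       accident_rate_per_hour: float = 0.05,
                       accident_delay_minutes: float = 20.0) -> float:
    """Illustrative stochastic travel time: log-normal noise around a
    time-of-day congestion profile plus Poisson-distributed accident delays."""
    # Simple congestion profile: heavier traffic around 08:00 and 17:30.
    congestion = (1.0
                  + 0.5 * np.exp(-((depart_hour - 8.0) ** 2) / 2.0)
                  + 0.6 * np.exp(-((depart_hour - 17.5) ** 2) / 2.0))
    noisy = base_minutes * congestion * rng.lognormal(mean=0.0, sigma=sigma)
    n_accidents = rng.poisson(accident_rate_per_hour * base_minutes / 60.0)
    return noisy + n_accidents * accident_delay_minutes

rng = np.random.default_rng(7)
samples = [sample_travel_time(30.0, depart_hour=8.0, rng=rng) for _ in range(5)]
print([round(s, 1) for s in samples])
```

Drawing many such samples per arc is what produces the temporally nonstationary instance distributions described above.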

5. Empirical Impact and Benchmarking Outcomes

Empirical evaluation demonstrates the utility and current limitations of synthetic instance construction:

  • In dialogue systems, conditional generation models (Joint T5 with masked contrastive and frequency-aware losses) achieve intrinsic perplexity 2.307, high diversity (unique rate 0.652), and significant extrinsic gains: up to +40–60% of tail intents crossing accuracy thresholds after 5× augmentation. Oversampling frequent intents without generative variation is markedly less effective (Wu et al., 2023).
  • For VRP, the leap to 1M-customer synthetic instances pushes algorithmic techniques—such as FILO2's pruning and acceleration—beyond prior benchmark sizes by up to two orders of magnitude, stimulating advances in scalability and locality-based methods (Accorsi et al., 2023).
  • LLM-guided generator evolution (EvoReal) closes solver generalization gaps between synthetic training and real benchmarks: On TSPLib, the optimized neural model achieves a 1.05% optimality gap (vanilla: 2.48%, baseline: >12%), and on CVRPLib reduces the gap to 2.71% (baseline: 5.90%), with ablation showing the necessity of two-stage structure-aligned adaptivity (Zhu et al., 13 Nov 2025).
  • In stochastic VRP, RL solvers optimized on static synthetic data degrade by over 20% when deployed on high-stochasticity SVRPBench instances, whereas classically designed methods retain higher robustness—highlighting the value of synthetic instances in diagnosing distributional shifts (Heakl et al., 28 May 2025).

6. Limitations, Failure Modes, and Best Practices

While synthetic routing instances are indispensable experimental tools, several well-documented limitations persist:

  • Modeling Fidelity: Synthetic demand, time, and spatial distributions may not fully capture the correlations and heterogeneity present in real-world systems (e.g., demands drawn uniformly from {1, 2, 3}, ignoring actual parcel sizes) (Accorsi et al., 2023, Heakl et al., 28 May 2025).
  • Semantic Drift: Conditional generative approaches in dialogue systems sometimes yield ungrammatical or semantically incoherent outputs, motivating the inclusion of regularizers or field-specific decoders when fields become highly heterogeneous (Wu et al., 2023).
  • Projection Artefacts: Simple projection schemes (equirectangular, UTM) can introduce distortions, though these are negligible for most routing performance metrics (Accorsi et al., 2023).
  • Overfitting and Underfitting: Direct fine-tuning on real instances without structure-enriched synthetic alignment leads to higher optimality gaps, while naive uniform synthetic training underperforms on complex benchmarks (Zhu et al., 13 Nov 2025).

Established best practices include:

  1. Employing interpretable structural metrics for both generator tuning and downstream evaluation (Zhu et al., 13 Nov 2025).
  2. Modular generator design with parameterized, code-driven pipelines (e.g., Python, Dockerized environments) (Heakl et al., 28 May 2025); a minimal configuration sketch follows this list.
  3. Two-stage curricula for neural solvers: synthetic-enriched alignment followed by benchmark fine-tuning (Zhu et al., 13 Nov 2025).
  4. Transparent documentation of sampling, seeding, and projection details for reproduction and extension (Accorsi et al., 2023, Heakl et al., 28 May 2025).
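
In the spirit of practices 2 and 4, a minimal configuration sketch is shown below: every sampling choice and the random seed are captured in a small dataclass and written next to the generated instances. The field names and file layout are illustrative assumptions.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class GeneratorConfig:
    """Every knob needed to regenerate an instance set exactly."""
    n_customers: int = 1000
    demand_distribution: str = "uniform(1,9)"
    time_window_model: str = "bimodal-gaussian"
    projection: str = "UTM"
    seed: int = 12345

def write_manifest(cfg: GeneratorConfig, path: str = "manifest.json") -> None:
    """Store the configuration alongside the generated instances so that
    sampling, seeding, and projection details are documented for reuse."""
    with open(path, "w") as f:
        json.dump(asdict(cfg), f, indent=2)

write_manifest(GeneratorConfig(seed=2024))
```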

A plausible implication is that the design of next-generation neural and combinatorial routing systems will increasingly depend on both the quality and realism of synthetic instance generators, necessitating continued integration of statistical, algorithmic, and domain-specific knowledge in their construction.
