EvoReal: Evolutionary Realistic Instance Synthesis
- EvoReal is a unifying paradigm that applies evolutionary algorithms to generate realistic and diverse problem instances across domains like optimization and generative modeling.
- It employs reversible encoding, multi-objective fitness functions, and feature-space gap-filling to overcome limitations of fixed procedural instance generation methods.
- EvoReal enhances practical applications by improving neural solver generalization and instance corpus construction, achieving reduced optimality gaps and increased instance diversity.
Evolutionary Realistic Instance Synthesis (EvoReal) is a unifying paradigm for producing realistic, diverse problem instances across domains such as combinatorial optimization and conditional generative modeling. It employs evolutionary algorithms (EAs) that operate either directly over problem structures or in the latent/control space of a generative model. EvoReal systematically explores the instance space, optimizing for properties such as realism, diversity, performance-relevance, or statistical alignment with empirical data, and has demonstrated efficacy in both neural solver generalization and the construction of challenging instance corpora (Liu et al., 2021; Lechien et al., 2020; Zhu et al., 13 Nov 2025).
1. Formal Definition and Core Principles
EvoReal is formally characterized by two principal objectives:
- Realism: Synthesized instances are constrained to lie near or within the empirical feature manifold of real-world data (e.g., structural statistics of real routing problems, facial action units encoding real expressions, or graph-theoretic features observed in operational networks).
- Diversity and Coverage: The instance set collectively spans a broad volume of the underlying feature space, specifically targeting poorly explored or challenging regions.
These objectives are achieved by defining a reversible encoding (bitstring, real-valued vector, programmatic data-generator, or conditional variable vector), a multi-objective or task-dependent fitness function, and an evolutionary process that modulates population diversity, selection, and explicit feature-space coverage (Lechien et al., 2020). Unlike traditional synthetic benchmarks, EvoReal adapts or optimizes instance characteristics by explicit search rather than by fixed procedural templates.
2. Methodological Implementations
2.1 Graph/Combinatorial Domain
For the Hamiltonian completion problem (Lechien et al., 2020), EvoReal encodes undirected graphs as binary chromosomes using the upper-triangular adjacency matrix. The evolutionary loop maintains populations (µ=20, λ=30), applying tournament selection, two-point crossover (probability 1/3), per-bit mutation (probability 0.03), and direct cloning (probability 1/3). Fitness in the extremization phase maximizes or minimizes the difference in solver runtimes between Concorde and MSLS, while the gap-filling phase explicitly targets sparsely populated regions of a 10-dimensional structural feature space, projected by PCA. Subsequent analysis reveals distinct clusters of instance "difficulty" unreachable by textbook generators.
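The encoding and variation operators described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the way the three operator probabilities combine (crossover 1/3, cloning 1/3, mutation otherwise), the (µ+λ) survivor selection, the tournament size, and the `toy_fitness` stand-in (which replaces the Concorde/MSLS runtime difference) are all assumptions made here for the sake of a runnable example.

```python
import random
from itertools import combinations

def encode(n, edges):
    """Graph -> bitstring over the upper-triangular adjacency entries."""
    edge_set = {tuple(sorted(e)) for e in edges}
    return [1 if pair in edge_set else 0 for pair in combinations(range(n), 2)]

def decode(n, bits):
    """Bitstring -> sorted edge list; inverse of encode."""
    return [pair for pair, b in zip(combinations(range(n), 2), bits) if b]

def tournament(pop, fitness, rng, k=2):
    # Assumed tournament size k=2; the best of k random individuals wins.
    return max(rng.sample(pop, k), key=fitness)

def two_point_crossover(a, b, rng):
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]

def mutate(bits, rng, p=0.03):
    # Flip each bit independently with probability p.
    return [1 - b if rng.random() < p else b for b in bits]

def make_child(pop, fitness, rng):
    # Assumed operator mix: 1/3 crossover, 1/3 cloning, otherwise mutation.
    r = rng.random()
    parent = tournament(pop, fitness, rng)
    if r < 1 / 3:
        return two_point_crossover(parent, tournament(pop, fitness, rng), rng)
    if r < 2 / 3:
        return list(parent)
    return mutate(parent, rng)

def evolve(n, fitness, generations=50, mu=20, lam=30, seed=0):
    rng = random.Random(seed)
    length = n * (n - 1) // 2
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(mu)]
    for _ in range(generations):
        children = [make_child(pop, fitness, rng) for _ in range(lam)]
        # Assumed (mu + lambda) survivor selection: keep the mu fittest.
        pop = sorted(pop + children, key=fitness, reverse=True)[:mu]
    return pop

def toy_fitness(bits):
    # Hypothetical stand-in for a solver-runtime difference: prefer 7 edges.
    return -abs(sum(bits) - 7)

best = evolve(6, toy_fitness, generations=20)[0]
print(decode(6, best))
```

Because `encode` and `decode` are inverses, any evolved chromosome can be handed to a graph solver as an edge list, which is what makes a runtime-based fitness possible in the first place.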
2.2 Generative Modeling Domain
EvoGAN (Liu et al., 2021) instantiates EvoReal over a frozen, high-capacity conditional GAN for facial images. Here, the evolutionary encoding is a real-valued vector of Facial Action Unit (AU) intensities, with each individual in the EA corresponding to a target AU configuration for image synthesis. Fitness is computed as a vector distance between the output of a pre-trained facial expression recognizer (FER) applied to the synthetic face and the desired expression vector. The evolutionary process leverages specialized crossover, intensity mutation, and adaptive probabilities, yielding image sets with diverse, accurate, and photorealistic expressions, including compound and rare compositions that gradient-based methods cannot access.
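The search over AU intensity vectors can be illustrated with a small, self-contained sketch. Everything specific here is an assumption for illustration: `recognize` is a hypothetical stand-in for the frozen generator followed by the FER, the L1 distance is one possible choice of norm, and the uniform crossover, Gaussian mutation, population size, and generation count are not EvoGAN's actual operators or settings.

```python
import random

def recognize(au):
    """Hypothetical stand-in for 'frozen GAN + FER': averages adjacent AU pairs."""
    return [(au[i] + au[i + 1]) / 2 for i in range(0, len(au), 2)]

def distance(au, target):
    # Assumed L1 distance between recognizer output and target expression.
    return sum(abs(p - t) for p, t in zip(recognize(au), target))

def clip(x):
    return min(1.0, max(0.0, x))

def evolve_aus(target, n_aus=6, pop_size=16, generations=40, sigma=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_aus)] for _ in range(pop_size)]
    history = []  # best distance seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda au: distance(au, target))
        history.append(distance(pop[0], target))
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover, then Gaussian intensity mutation clipped to [0, 1].
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            children.append([clip(g + rng.gauss(0, sigma)) for g in child])
        pop = elite + children
    best = min(pop, key=lambda au: distance(au, target))
    return best, history

best_au, history = evolve_aus(target=[1.0, 0.0, 0.5])
print(round(history[0], 3), round(distance(best_au, [1.0, 0.0, 0.5]), 3))
```

Since the search only needs forward evaluations of the recognizer, nothing in the loop requires gradients through the generator, which is the property the evolutionary formulation relies on.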
2.3 Neural Combinatorial Optimization & Data-Generator Evolution
In neural combinatorial optimization (NCO) (Zhu et al., 13 Nov 2025), EvoReal employs LLM-guided evolution to search the space of Python data-generator programs, aiming to produce VRP/TSP/CVRP instances whose global (FFT energy) and local (nearest-neighbor ratio) spatial point set features closely match those found in benchmark corpora (TSPLib/CVRPLib). Each generator is treated as an individual; evolutionary reflection and mutation are mediated by LLM prompts and code synthesis. The fitness measure proxies generator quality by the average optimality gap on a validation subset after short fine-tuning of a solver model, directly targeting downstream performance.
3. Fitness Functions and Evaluation Metrics
Across instantiations, EvoReal employs domain-aligned fitness functions:
| Domain | Instance Encoding | Fitness Function(s) |
|---|---|---|
| HCP (Graph) | Bitstring (adjacency) | Solver runtime difference, feature-space novelty, gap minimization |
| EvoGAN (Faces) | AU intensity vector | Distance between FER output and target expression |
| Routing (NCO) | Data-generator function | Validation optimality gap after solver fine-tuning |
Metrics reported include convergence rates, final fitness values, diversity indicators (e.g., standard deviation of AU vectors, LPIPS for images, pairwise feature-space distances), and success rate (fraction of runs achieving target proximity).
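Two of the listed metrics are simple enough to state as code. The sketch below assumes Euclidean distance for the pairwise feature-space diversity and a user-chosen proximity `threshold` for the success rate; neither choice is taken from the cited papers.

```python
from itertools import combinations
from math import dist

def mean_pairwise_distance(vectors):
    """Diversity indicator: average Euclidean distance over all unordered pairs."""
    pairs = list(combinations(vectors, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

def success_rate(final_fitness_per_run, threshold):
    """Fraction of runs whose final distance-to-target is within the threshold."""
    hits = sum(1 for f in final_fitness_per_run if f <= threshold)
    return hits / len(final_fitness_per_run)

print(mean_pairwise_distance([(0, 0), (3, 4), (0, 0)]))
print(success_rate([0.01, 0.2, 0.04, 0.5], threshold=0.05))  # 0.5
```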
4. Progressive Adaptation and Integration with Downstream Solvers
In NCO, EvoReal is positioned as the foundation of a progressive training pipeline. After evolving data-generators that mimic the structure of real instance distributions, NCO models are first aligned to this synthetic yet realistic distribution (Phase I), then fine-tuned on the actual real-world problems (Phase II) (Zhu et al., 13 Nov 2025). This two-stage alignment is critical: omitting either stage notably increases the optimality gap, particularly for large instances (see the POMO and LEHD ablation tables in Zhu et al., 13 Nov 2025). EvoReal’s integration is modular: the EA drives the generator population, while the neural solver is trained on the resultant instance pool, promoting generalization and stability.
5. Feature Space Characterization and Gap-Filling
A core element of EvoReal is instance-space analysis and gap-filling. In Lechien et al. (2020), a logarithmic transform followed by z-score normalization and PCA reveals that canonical benchmarks only explore the extremes ("rims") of the instance landscape, whereas EvoReal systematically fills central and difficult regions. The graph features explicitly measured include density, clustering coefficient, graph energy, degree statistics (max, std, skew, kurtosis), diameter, and degree-frequency fractions.
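The projection and gap-detection steps can be sketched with NumPy as follows. This is an illustration under stated assumptions: `log1p` is used as the logarithmic transform (so features are assumed non-negative), PCA is computed via SVD of the standardized matrix, and "gaps" are taken to be empty cells of a regular grid over the first two components. The grid-based gap definition and the `gap_fitness` function are hypothetical, not the procedure of the cited paper.

```python
import numpy as np

def project(features, n_components=2, eps=1e-9):
    """log1p -> z-score -> PCA (via SVD). `features` is (n_instances, n_features)."""
    x = np.log1p(np.asarray(features, dtype=float))  # assumes non-negative features
    z = (x - x.mean(axis=0)) / (x.std(axis=0) + eps)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T

def empty_cells(points2d, bins=5):
    """Centers of grid cells in the 2D projection that contain no instance."""
    hist, xedges, yedges = np.histogram2d(points2d[:, 0], points2d[:, 1], bins=bins)
    cx = (xedges[:-1] + xedges[1:]) / 2
    cy = (yedges[:-1] + yedges[1:]) / 2
    return [(float(cx[i]), float(cy[j])) for i, j in zip(*np.where(hist == 0))]

def gap_fitness(point, target):
    """Hypothetical gap-filling fitness: closer to the empty cell is better."""
    return -float(np.linalg.norm(np.asarray(point) - np.asarray(target)))

# Four instances on the "rim" of a square leave the center cell unoccupied.
rim = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
print(empty_cells(rim, bins=3))
```

An EA in the gap-filling phase could then pick one empty cell as a target and use `gap_fitness` on each candidate's projected coordinates, reusing the mean, standard deviation, and principal axes fitted on the existing corpus.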
In the VRP domain (Zhu et al., 13 Nov 2025), FFT energy and nearest-neighbor ratio summarize periodicity, clustering, and regularity—features that are then directly targeted during LLM-driven generator evolution.
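The two spatial descriptors can be approximated as below. The exact definitions used by Zhu et al. are not given in this article, so both functions are assumptions: `nearest_neighbor_ratio` is a Clark-Evans-style ratio for points in the unit square, and `fft_energy` is defined here as the share of spectral energy outside the zero-frequency term of a 2D occupancy histogram.

```python
import numpy as np

def nearest_neighbor_ratio(points):
    """Observed mean nearest-neighbor distance divided by its expectation
    under uniform randomness in the unit square (assumed definition)."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore each point's distance to itself
    observed = d.min(axis=1).mean()
    expected = 0.5 / np.sqrt(len(pts))  # density = n points per unit area
    return float(observed / expected)

def fft_energy(points, bins=16):
    """Share of spectral energy outside the DC term of an occupancy grid
    (assumed definition): 0 for a perfectly even layout, near 1 for one cluster."""
    pts = np.asarray(points, dtype=float)
    hist, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=bins,
                                range=[[0, 1], [0, 1]])
    power = np.abs(np.fft.fft2(hist)) ** 2
    return float(1.0 - power[0, 0] / power.sum())

# A regular 8x8 lattice versus a tight cluster of 64 points.
axis = (np.arange(8) + 0.5) / 8
lattice = np.array([(x, y) for x in axis for y in axis])
cluster = 0.53 + 0.01 * np.random.default_rng(0).standard_normal((64, 2))
print(nearest_neighbor_ratio(lattice), nearest_neighbor_ratio(cluster))
print(fft_energy(lattice), fft_energy(cluster))
```

Under these definitions a ratio well above 1 indicates regular spacing and a ratio well below 1 indicates clustering, so a generator could be scored by how closely its instances' feature values match those measured on a benchmark corpus.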
6. Computational Considerations and Scalability
EvoReal’s cost is determined primarily by the number of EA generations, the population size, and the expense of downstream fitness evaluation (e.g., solver inference, discriminator evaluation). In EvoGAN (Liu et al., 2021), the process completes in approximately 10 seconds per expression on GPU, rising to tens of minutes on CPU. In HCP synthesis (Lechien et al., 2020), 160 CPU-hours produced a diverse set of 24,682 challenging instances.
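A back-of-the-envelope calculation makes these cost drivers concrete. The sketch assumes a (µ+λ)-style loop in which the initial population and every offspring are evaluated exactly once; the generation count of 100 is a hypothetical value, and the per-instance figure simply attributes the full 160 CPU-hours to the 24,682 retained instances.

```python
def fitness_evaluations(mu, lam, generations):
    """Evaluations for an assumed (mu + lambda) loop: initial population plus offspring."""
    return mu + lam * generations

def seconds_per_instance(cpu_hours, instances):
    """Amortized CPU-seconds per retained instance."""
    return cpu_hours * 3600 / instances

# mu=20, lambda=30 as in Section 2.1; generations=100 is hypothetical.
print(fitness_evaluations(mu=20, lam=30, generations=100))  # 3020
print(round(seconds_per_instance(160, 24_682), 1))          # 23.3
```

Total cost is then roughly the evaluation count multiplied by the cost of one fitness evaluation, which is why fitness functions that require solver runs or model fine-tuning dominate the budget.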
Large-scale NCO adaptation (Zhu et al., 13 Nov 2025) requires repeated model fine-tuning for proxy fitness evaluation but demonstrates significant downstream improvements, with reduced optimality gaps reported on both TSPLib and CVRPLib.
7. Generalizations and Applicability Across Domains
EvoReal is domain-agnostic in its conceptual structure:
- Combinatorial optimization: Any reversible encoding and performance-driven fitness is supported. Feature-space gap-filling is generally applicable, provided appropriate feature extractors.
- Conditional generative modeling: Any high-capacity, pre-trained generator (e.g., StyleGAN, 3D morphable models) can serve as the backbone, with EAs exploring the appropriate latent or conditioning variables.
- Programmatic data synthesis: Evolution over generator code spaces (potentially LLM-accelerated) supports domains where instance structure is too complex or compositional for fixed encoding.
A plausible implication is that EvoReal establishes a procedural template for constructing robust, diverse, and realistic evaluation suites, and can serve as a key layer in adapting neural models beyond naive synthetic data regimes.
References
- EvoGAN: An Evolutionary Computation Assisted GAN (Liu et al., 2021)
- Evolving test instances of the Hamiltonian completion problem (Lechien et al., 2020)
- Bridging Synthetic and Real Routing Problems via LLM-Guided Instance Generation and Progressive Adaptation (Zhu et al., 13 Nov 2025)