Distributional MIPLIB Library
- Distributional MIPLIB is a public library of MILP problem distributions that standardizes benchmarks for ML-guided optimization research.
- It systematically categorizes 13 domains of MILP instances, both synthetic and real-world, with defined computational hardness levels.
- It provides accessible APIs, CLI tools, and reproducible data splits to support fair evaluation of ML-enhanced MILP algorithms.
Distributional MIPLIB is a publicly available, multi-domain library of problem distributions designed to advance ML-guided methods in Mixed Integer Linear Programming (MILP). The resource comprises curated synthetic and real-world MILP distributions, systematically categorized by problem domain and computational hardness. Distributional MIPLIB standardizes research in ML for combinatorial optimization by offering reproducible, comprehensive benchmarks covering diverse domains and hardness levels, supporting both the evaluation and development of ML-guided MILP algorithms (Huang et al., 2024).
1. Domains and Distribution Structure
Distributional MIPLIB includes 13 MILP domains, partitioned into synthetic and real-world origins. Each domain encompasses multiple distributions, with instances constructed to span a predetermined range of solution hardness, facilitating fair and rigorous benchmarking.
Synthetic Domains (with provided Python generators):
- Combinatorial Auction (CA): Winner determination auctions (Leyton-Brown et al., 2000).
- Set Covering (SC): Subset selection covering random element fractions (Balas & Ho, 1980).
- Maximum Independent Set (MIS): Erdős–Rényi graph instances (Bergman et al., 2016).
- Minimum Vertex Cover (MVC): MIS complement on random graphs (Dinur & Safra, 2005).
- Generalized Independent Set Problem (GISP): Spatial forest-harvesting graphs (Hochbaum & Pathria, 1997; Colombi et al., 2017).
- Capacitated Facility Location Problem (CFLP): Plant location with capacities and fixed charges (Cornuéjols et al., 1991).
- Load Balancing (LB): Job-to-machine assignment under load constraints (Wilson, 1992).
- Item Placement (IP): Item-to-container packing for minimal imbalance (Martello & Toth, 1990).
Real-World Domains:
- Maritime Inventory Routing (MIRP): Ship routing and port inventory (Papageorgiou et al., 2014).
- Neural Network Verification (NNV): MILP for CNN robustness on MNIST (Cheng et al., 2017; Tjeng et al., 2018).
- Optimal Transmission Switching (OTS): Grid topology under wildfire risk (Pollack et al., 2024).
- Middle-Mile Consolidation Network (MMCN): E-commerce network design, binary+integer and binary+continuous variants (Greening et al., 2023).
- Seismic-Resilient Pipe Network (SRPN): Water network for seismic resilience (Huang & Dilkina, 2020).
Each of the synthetic domains above ships with a Python generator for scalable instance synthesis.
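As a concrete example of a synthetic domain's formulation, MIS instances encode the standard independent-set ILP over a sampled graph $G=(V,E)$:

```latex
\max_{x} \; \sum_{v \in V} x_v
\quad \text{s.t.} \quad x_u + x_v \le 1 \;\; \forall (u, v) \in E,
\qquad x_v \in \{0, 1\} \;\; \forall v \in V.
```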
2. Hardness Levels: Computational Classification
Distributional MIPLIB assigns each distribution one of five hardness levels, based on Gurobi v10.0.3 statistics under a one-hour time limit. For a test set of instances, let $t_i$ denote the solve time of instance $i$ (if solved within 1 hour), and let $g_i = |z_P - z_D| / |z_P|$ capture the primal-dual gap on timeout, with $z_P$ the best primal bound and $z_D$ the best dual bound. Writing $\bar{t}$ and $\bar{g}$ for their means over the test set, the levels are:
| Hardness | Solve Criterion | Mean Solve Time / Gap |
|---|---|---|
| Easy | All solved in 1 h | $\bar{t} < 100$ s |
| Medium | All solved in 1 h | $100 \text{ s} \leq \bar{t} < T$ |
| Hard | All solved in 1 h | $\bar{t} \geq T$ |
| Very Hard | None solved in 1 h | $0 < \bar{g} < 1$ |
| Extremely Hard | None solved in 1 h | $\bar{g} \geq 1$ |
Here $T$ denotes the library's fixed upper solve-time boundary separating medium from hard distributions.
This enables systematic grading of MILP difficulty and targeted evaluation by complexity.
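The grading scheme can be expressed as a small classifier over per-instance statistics. A minimal sketch follows; apart from the 100 s boundary stated above, the thresholds (`t_hard=1000.0` and the gap cutoff of 1) are illustrative assumptions, not the library's published values:

```python
def classify_hardness(solve_times, gaps, t_easy=100.0, t_hard=1000.0):
    """Grade a distribution from per-instance solver statistics.
    solve_times: times (s) of instances solved within 1 h;
    gaps: relative primal-dual gaps of instances that timed out.
    t_hard and the gap cutoff of 1 are assumed placeholder values."""
    if solve_times and not gaps:          # every instance solved in time
        mean_t = sum(solve_times) / len(solve_times)
        if mean_t < t_easy:
            return "easy"
        return "medium" if mean_t < t_hard else "hard"
    if gaps and not solve_times:          # no instance solved in time
        mean_g = sum(gaps) / len(gaps)
        return "very_hard" if mean_g < 1.0 else "extremely_hard"
    return "unclassified"                 # mixed outcomes fall between levels
```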
3. Data Content and Formats
Each instance is encoded in standard MPS format (*.mps), accompanied by a JSON metadata file with:
- instance_id
- domain (e.g., “MIS”)
- hardness (easy, medium, etc.)
- split (“train” or “test”)
- n_var_binary, n_var_integer, n_var_continuous
- n_constraints
- solve statistics (solve_time, gap, primal_dual_integral)
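A metadata record with the fields above might look as follows; every concrete value here is illustrative, not drawn from the library:

```python
import json

# Hypothetical metadata record mirroring the fields listed above;
# the values are illustrative, not taken from an actual instance.
record = {
    "instance_id": "MIS_easy_0001",
    "domain": "MIS",
    "hardness": "easy",
    "split": "train",
    "n_var_binary": 500,
    "n_var_integer": 0,
    "n_var_continuous": 0,
    "n_constraints": 1800,
    "solve_time": 12.7,
    "gap": 0.0,
    "primal_dual_integral": 35.2,
}

# Metadata files are plain JSON on disk, so a round-trip is lossless.
loaded = json.loads(json.dumps(record))
```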
Distributions are organized hierarchically by domain, distribution, and data split, with each instance's *.mps file accompanied by its JSON metadata.
Generators for synthetic domains, located under /generators/, allow instantiation via domain-specific parameters. For example, the MIS (Maximum Independent Set) Erdős–Rényi generator takes a node count $n$, an edge probability $p$, and a random seed.
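The library's generators themselves are not reproduced here; the following is a minimal, self-contained sketch of what an Erdős–Rényi MIS generator does, pairing a seeded random graph with the standard one-constraint-per-edge MILP:

```python
import random
from itertools import combinations

def mis_instance(n, p, seed):
    """Sample an Erdos-Renyi graph G(n, p) and build the standard MIS MILP:
    maximize sum(x_v) s.t. x_u + x_v <= 1 for every edge, x binary.
    An illustrative sketch, not the library's actual generator."""
    rng = random.Random(seed)  # seeding makes the instance reproducible
    edges = [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]
    objective = {v: 1 for v in range(n)}                  # maximize sum x_v
    constraints = [((u, v), "<=", 1) for u, v in edges]   # one row per edge
    return objective, constraints

obj, cons = mis_instance(n=20, p=0.25, seed=0)
```

Fixing the seed makes generation deterministic, which is what allows reproducible train/test splits over synthetic distributions.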
4. Access Modalities and API
Distributional MIPLIB is accessible via a pip-installable Python package (distributional_miplib) and a Command-Line Interface (CLI):
Both the Python API and the CLI provide access to the library's instances, metadata, and fixed data splits.
Train/test splits are fixed (900/100 for synthetic; adapted protocols for MIRP, SRPN, and NNV), ensuring consistent evaluation.
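Because the split is recorded per instance, the fixed protocol can be applied mechanically from metadata. A minimal sketch, with illustrative record contents:

```python
def split_instances(records):
    """Partition metadata records into the library's fixed train/test splits
    using each record's 'split' field (900/100 for synthetic domains)."""
    train = [r for r in records if r["split"] == "train"]
    test = [r for r in records if r["split"] == "test"]
    return train, test

# Illustrative records mimicking a synthetic domain's 900/100 split.
records = [{"instance_id": i, "split": "train" if i < 900 else "test"}
           for i in range(1000)]
```

Reading the split from metadata, rather than re-sampling it, is what keeps evaluations comparable across papers.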
5. Empirical Evaluation Paradigms
The library has been empirically validated by re-running Learn2Branch, a GCN-based imitation-learning branching method, using SCIP 6.0.1 with per-experiment time limits of one hour or 800 s. Metrics collected per distribution include:
- # Opt: Number of instances solved to optimality
- Opt Time: Mean solve time for solved instances
- NonOpt Gap: Mean primal-dual gap for unsolved instances
- Integral: Primal-dual integral (Berthold, 2013)
- Node-based metrics: Node count, integral vs. nodes, ML inference percentage
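The primal-dual integral can be approximated from solver bound snapshots. The sketch below assumes a step-function gap between snapshots and a worst-case gap of 1 before any bounds exist; it is an illustration, not the library's implementation:

```python
def primal_dual_integral(events, t_end):
    """Approximate the primal-dual integral (Berthold, 2013): the area under
    the relative primal-dual gap over time. `events` is a time-sorted list of
    (time, best_primal, best_dual) snapshots; the gap is held constant
    between snapshots."""
    def gap(p, d):
        denom = max(abs(p), abs(d), 1e-9)  # guard against division by zero
        return abs(p - d) / denom
    total, prev_t, prev_gap = 0.0, 0.0, 1.0  # gap counts as 1 before any bound
    for t, p, d in events:
        total += prev_gap * (t - prev_t)
        prev_t, prev_gap = t, gap(p, d)
    total += prev_gap * (t_end - prev_t)     # extend the last gap to the limit
    return total
```

Lower values indicate that good primal and dual bounds were reached early, which is why the integral is a more discriminating metric than final solve time alone.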
Experiments demonstrate:
- ML-guided branching performance on distributions not previously used in the literature (e.g., GISP, OTS, SRPN).
- Mixed-distribution policy training (“ML-mix5”) outperforms single-domain training when training data are limited, and the resulting policies generalize to larger instances.
- Transfer experiments confirm generalization of mixed models to harder distributions in MIS and SC.
Experimental details and split protocols ensure replicability; all empirical findings are specific to the solvers and configurations used.
6. Usage Guidelines and Current Limitations
Best Practices:
- Start with “easy” distributions for model debugging and validation, escalate to higher hardness for advanced research.
- Mixed-domain training is recommended in low-data regimes; single-domain is preferable with abundant data.
- Large-instance inference time (InferPct) may dominate total solve time. Lightweight GNNs or batch inference are suggested.
- Standardized test splits facilitate reproducible comparisons.
Known Limitations:
- Some real-world distributions (MIRP, SRPN) offer fewer than 100 test instances due to data scarcity.
- MILP files are only provided in MPS format; no LP or proprietary solver binaries.
- Metadata is limited to basic instance-level descriptors; feature engineering and richer annotations are left to the user.
- No direct API for solver wrappers (e.g., Ecole); users must load MPS files separately.
- Observed empirical results are specific to SCIP 6.0.1; outcomes may differ with alternative solvers or hardware.
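Since only MPS files are shipped and no solver-wrapper API is provided, users need their own loading path. The stdlib-only sketch below extracts a rough summary from a fixed-format MPS file; both `read_mps_summary` and the sample text are illustrative, and real workloads should prefer a solver API such as pyscipopt's `readProblem`:

```python
def read_mps_summary(text):
    """Minimal fixed-format MPS reader (illustrative sketch): collects the
    constraint rows and variable names of an instance."""
    section, rows, cols = None, [], set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("*"):
            continue                          # skip blanks and comments
        if not line[0].isspace():             # section headers start in col 1
            section = line.split()[0]
            continue
        fields = line.split()
        if section == "ROWS" and fields[0] != "N":
            rows.append(fields[1])            # constraint rows (skip objective)
        elif section == "COLUMNS" and fields[0] != "MARKER":
            cols.add(fields[0])               # variable names
    return rows, sorted(cols)

SAMPLE = """NAME          TINY
ROWS
 N  obj
 L  c1
COLUMNS
    x1  obj  1.0  c1  1.0
    x2  obj  1.0  c1  1.0
RHS
    rhs  c1  1.0
ENDATA
"""
```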
7. Significance and Research Applications
Distributional MIPLIB standardizes evaluation of ML-guided MILP methods across diverse tasks and domains, bridging gaps left by domain-specific benchmarks. Researchers benefit from:
- Comprehensive domain coverage—ranging from classical combinatorial to new application-motivated MILPs.
- Explicit, reproducible instance hardness stratification for fairer algorithmic comparison.
- Extensible infrastructure for problem generation and evaluation, underpinning robust, cross-domain benchmarking in ML-augmented combinatorial optimization (Huang et al., 2024).
A plausible implication is accelerated development and fairer comparison of ML-driven heuristics, branching policies, and hybrid solvers within the optimization community.