
Distributional MIPLIB Library

Updated 8 February 2026
  • Distributional MIPLIB is a public library of MILP problem distributions that standardizes benchmarks for ML-guided optimization research.
  • It systematically categorizes 13 domains of MILP instances, both synthetic and real-world, with defined computational hardness levels.
  • It provides accessible APIs, CLI tools, and reproducible data splits to support fair evaluation of ML-enhanced MILP algorithms.

Distributional MIPLIB is a publicly available, multi-domain library of problem distributions designed to advance ML-guided methods in Mixed Integer Linear Programming (MILP). The resource comprises curated synthetic and real-world MILP distributions, systematically categorized by problem domain and computational hardness. Distributional MIPLIB standardizes research in ML for combinatorial optimization by offering reproducible, comprehensive benchmarks covering diverse domains and hardness levels, supporting both the evaluation and development of ML-guided MILP algorithms (Huang et al., 2024).

1. Domains and Distribution Structure

Distributional MIPLIB includes 13 MILP domains, partitioned into synthetic and real-world origins. Each domain encompasses multiple distributions, with instances constructed to span a predetermined range of solution hardness, facilitating fair and rigorous benchmarking.

Synthetic Domains (with provided Python generators):

  • Combinatorial Auction (CA): Winner determination auctions (Leyton-Brown et al., 2000).
  • Set Covering (SC): Subset selection covering random element fractions (Balas & Ho, 1980).
  • Maximum Independent Set (MIS): Erdős–Rényi graph instances (Bergman et al., 2016).
  • Minimum Vertex Cover (MVC): MIS complement on random graphs (Dinur & Safra, 2005).
  • Generalized Independent Set Problem (GISP): Spatial forest-harvesting graphs (Hochbaum & Pathria, 1997; Colombi et al., 2017).
  • Capacitated Facility Location Problem (CFLP): Plant location with capacities and fixed charges (Cornuéjols et al., 1991).
  • Load Balancing (LB): Job-to-machine assignment under load constraints (Wilson, 1992).
  • Item Placement (IP): Item-to-container packing for minimal imbalance (Martello & Toth, 1990).

Real-World Domains:

  • Maritime Inventory Routing (MIRP): Ship routing and port inventory (Papageorgiou et al., 2014).
  • Neural Network Verification (NNV): MILP for CNN robustness on MNIST (Cheng et al., 2017; Tjeng et al., 2018).
  • Optimal Transmission Switching (OTS): Grid topology under wildfire risk (Pollack et al., 2024).
  • Middle-Mile Consolidation Network (MMCN): E-commerce network design, binary+integer and binary+continuous variants (Greening et al., 2023).
  • Seismic-Resilient Pipe Network (SRPN): Water network for seismic resilience (Huang & Dilkina, 2020).

All synthetic domains ship with Python generators, enabling scalable instance synthesis.

2. Hardness Levels: Computational Classification

Distributional MIPLIB assigns one of five hardness levels to each distribution, based on Gurobi v10.0.3 statistics under a one-hour time limit. For a test set $D$ of $|D|$ instances, let $t_i$ denote the solve time of instance $i$ (if solved within one hour) and $\mathrm{gap}_i = |z_{P_i} - z_{D_i}| / |z_{P_i}|$ the primal-dual gap on timeout, where $z_P$ is the best primal bound and $z_D$ the best dual bound. The levels are:

| Hardness | Solve criterion | Mean solve time / gap |
|---|---|---|
| Easy | All solved in 1 h | $\frac{1}{|D|}\sum_i t_i < 100$ s |
| Medium | All solved in 1 h | $100 \text{ s} \le \frac{1}{|D|}\sum_i t_i < 1000$ s |
| Hard | All solved in 1 h | $\frac{1}{|D|}\sum_i t_i \ge 1000$ s |
| Very Hard | None solved in 1 h | smaller mean $\mathrm{gap}_i$ |
| Extremely Hard | None solved in 1 h | larger mean $\mathrm{gap}_i$ |

Distributions in which no instance is solved within one hour are assigned to Very Hard or Extremely Hard according to their mean primal-dual gap at timeout.

This enables systematic grading of MILP difficulty and targeted evaluation by complexity.
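The classification rules above can be sketched as a small function. Note that the gap threshold separating the two unsolved categories is a hypothetical placeholder here, since the library's exact cutoff is not restated in this text:

```python
# Sketch of the Section 2 hardness rules. solve_times[i] is the solve time
# of instance i in seconds, or None if it timed out at 1 h; gaps holds the
# primal-dual gaps of timed-out instances. gap_threshold is a placeholder.

def classify_hardness(solve_times, gaps, gap_threshold=1.0):
    solved = [t for t in solve_times if t is not None]
    if len(solved) == len(solve_times):          # all solved within 1 h
        mean_t = sum(solved) / len(solved)
        if mean_t < 100:
            return "easy"
        if mean_t < 1000:
            return "medium"
        return "hard"
    if not solved:                               # none solved within 1 h
        mean_gap = sum(gaps) / len(gaps)
        return "very hard" if mean_gap < gap_threshold else "extremely hard"
    return "unclassified"                        # mixed outcomes

print(classify_hardness([12.0, 40.0, 80.0], []))    # easy (mean 44 s)
print(classify_hardness([None, None], [0.1, 0.3]))  # very hard
```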

3. Data Content and Formats

Each instance is encoded in standard MPS format (*.mps), accompanied by a JSON metadata file with:

  • instance_id
  • domain (e.g., “MIS”)
  • hardness (easy, medium, etc.)
  • split (“train” or “test”)
  • n_var_binary, n_var_integer, n_var_continuous
  • n_constraints
  • solve statistics (solve_time, gap, primal_dual_integral)
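To illustrate the metadata schema, the following sketch builds a record with the fields listed above and filters instances by domain, split, and hardness. The concrete values and the single-record payload are assumptions for demonstration:

```python
import json

# Illustrative metadata record using the documented field names; the actual
# per-instance JSON files may carry additional solver statistics.
meta_records = [
    {"instance_id": "mis_0001", "domain": "MIS", "hardness": "easy",
     "split": "train", "n_var_binary": 500, "n_var_integer": 0,
     "n_var_continuous": 0, "n_constraints": 1800,
     "solve_time": 42.5, "gap": 0.0, "primal_dual_integral": 310.2},
]
blob = json.dumps(meta_records)  # stand-in for reading a metadata file

# Select easy training instances of a given domain.
records = json.loads(blob)
train_easy = [r["instance_id"] for r in records
              if r["domain"] == "MIS"
              and r["split"] == "train" and r["hardness"] == "easy"]
print(train_easy)  # ['mis_0001']
```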

Distributions are organized on disk in a consistent per-domain layout, with instances grouped by distribution and hardness level.

Generators for synthetic domains, located under /generators/, allow instantiation via domain-specific parameters. For example, the MIS (Maximum Independent Set) Erdős–Rényi generator takes a node count $n$, an edge probability $p$, and a random seed.
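A minimal sketch of such a generator, assuming the standard Erdős–Rényi $G(n, p)$ model and the textbook MIS formulation (maximize $\sum_v x_v$ subject to $x_u + x_v \le 1$ for every edge): the library's actual generator and its MPS output are not reproduced here.

```python
import random

# Sample a G(n, p) graph and build the standard MIS MILP as plain data:
# a {variable: coefficient} objective and a list of edge constraints.
# Writing the model to MPS (e.g., via a solver API) is omitted.

def generate_mis_instance(n, p, seed=0):
    rng = random.Random(seed)                       # seeded for reproducibility
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if rng.random() < p]
    objective = {f"x{v}": 1 for v in range(n)}      # maximize sum of x_v
    constraints = [({f"x{u}": 1, f"x{v}": 1}, "<=", 1)  # x_u + x_v <= 1
                   for u, v in edges]
    return objective, constraints

obj, cons = generate_mis_instance(n=10, p=0.3, seed=42)
print(len(obj))  # 10 binary variables
```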

4. Access Modalities and API

Distributional MIPLIB is accessible via a pip-installable Python package (`distributional_miplib`) and a command-line interface (CLI).

Train/test splits are fixed (900/100 for synthetic; adapted protocols for MIRP, SRPN, and NNV), ensuring consistent evaluation.
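For intuition, a reproducible 900/100 split over 1000 instance ids can be sketched as follows. The library ships precomputed splits, so this only mirrors the idea with a fixed seed:

```python
import random

# Deterministic 900/100 train/test split over 1000 synthetic instance ids;
# a fixed seed makes the split reproducible across runs.
ids = [f"inst_{i:04d}" for i in range(1000)]
rng = random.Random(0)
shuffled = ids[:]
rng.shuffle(shuffled)
train, test = shuffled[:900], shuffled[900:]
print(len(train), len(test))  # 900 100
```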

5. Empirical Evaluation Paradigms

The library has been empirically validated by re-running Learn2Branch (a GCN-based imitation-learning branching method) with SCIP 6.0.1 under a one-hour or 800-second time limit. Metrics collected per distribution include:

  • # Opt: Number of instances solved to optimality
  • Opt Time: Mean solve time for solved instances
  • NonOpt Gap: Mean primal-dual gap for unsolved instances
  • Integral: Primal-dual integral (Berthold, 2013)
  • Node-based metrics: Node count, integral vs. nodes, ML inference percentage
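The primal-dual integral can be computed by integrating the relative primal-dual gap as a step function over solve time. The sketch below assumes bound-update events are available; the event values are illustrative:

```python
# Step-function primal-dual integral (in the spirit of Berthold, 2013):
# between consecutive bound updates the relative gap is constant, so the
# integral is a sum of gap * interval-length terms.

def primal_dual_integral(events, t_end):
    """events: list of (time, primal_bound, dual_bound), sorted by time."""
    integral = 0.0
    for (t0, zp, zd), (t1, _, _) in zip(events,
                                        events[1:] + [(t_end, None, None)]):
        gap = abs(zp - zd) / max(abs(zp), 1e-9)  # relative gap on [t0, t1)
        integral += gap * (t1 - t0)
    return integral

# Gap is 0.5 for the first 10 s, then 0.0 for the remaining 90 s.
events = [(0.0, 100.0, 50.0), (10.0, 100.0, 100.0)]
print(primal_dual_integral(events, t_end=100.0))  # 5.0
```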

Experiments demonstrate:

  • ML-guided branching performance on distributions not previously used in ML-for-MILP research (e.g., GISP, OTS, SRPN).
  • Mixed-distribution policy training (“ML-mix5”) yields superior performance with limited training data, compared to single-domain training, and generalizes to larger instances.
  • Transfer experiments confirm generalization of mixed models to harder distributions in MIS and SC.

Experimental details and split protocols ensure replicability; all empirical findings are specific to the solvers and configurations used.

6. Usage Guidelines and Current Limitations

Best Practices:

  • Start with “easy” distributions for model debugging and validation, escalate to higher hardness for advanced research.
  • Mixed-domain training is recommended in low-data regimes; single-domain is preferable with abundant data.
  • On large instances, ML inference time (InferPct) may dominate total solve time; lightweight GNNs or batched inference are suggested.
  • Standardized test splits facilitate reproducible comparisons.

Known Limitations:

  • Some real-world distributions (MIRP, SRPN) offer fewer than 100 test instances due to data scarcity.
  • MILP files are only provided in MPS format; no LP or proprietary solver binaries.
  • Metadata is limited to basic instance-level descriptors; feature engineering and richer annotations are left to the user.
  • No direct API for solver wrappers (e.g., Ecole); users must load MPS files separately.
  • Observed empirical results are specific to SCIP 6.0.1; outcomes may differ with alternative solvers or hardware.

7. Significance and Research Applications

Distributional MIPLIB standardizes evaluation of ML-guided MILP methods across diverse tasks and domains, bridging gaps left by domain-specific benchmarks. Researchers benefit from:

  • Comprehensive domain coverage, ranging from classical combinatorial problems to new application-motivated MILPs.
  • Explicit, reproducible instance hardness stratification for fairer algorithmic comparison.
  • Extensible infrastructure for problem generation and evaluation, underpinning robust, cross-domain benchmarking in ML-augmented combinatorial optimization (Huang et al., 2024).

A plausible implication is an acceleration of the comparative analysis and development of ML-driven heuristics, branching policies, and hybrid solvers within the optimization community.
