Distributional MIPLIB Library
- Distributional MIPLIB is a public library of MILP problem distributions that standardizes benchmarks for ML-guided optimization research.
- It systematically categorizes 13 domains of MILP instances, both synthetic and real-world, with defined computational hardness levels.
- It provides accessible APIs, CLI tools, and reproducible data splits to support fair evaluation of ML-enhanced MILP algorithms.
Distributional MIPLIB is a publicly available, multi-domain library of problem distributions designed to advance ML-guided methods in Mixed Integer Linear Programming (MILP). The resource comprises curated synthetic and real-world MILP distributions, systematically categorized by problem domain and computational hardness. Distributional MIPLIB standardizes research in ML for combinatorial optimization by offering reproducible, comprehensive benchmarks covering diverse domains and hardness levels, supporting both the evaluation and development of ML-guided MILP algorithms (Huang et al., 2024).
1. Domains and Distribution Structure
Distributional MIPLIB includes 13 MILP domains, partitioned into synthetic and real-world origins. Each domain encompasses multiple distributions, with instances constructed to span a predetermined range of solution hardness, facilitating fair and rigorous benchmarking.
Synthetic Domains (with provided Python generators):
- Combinatorial Auction (CA): Winner determination auctions (Leyton-Brown et al., 2000).
- Set Covering (SC): Subset selection covering random element fractions (Balas & Ho, 1980).
- Maximum Independent Set (MIS): Erdős–Rényi graph instances (Bergman et al., 2016).
- Minimum Vertex Cover (MVC): MIS complement on random graphs (Dinur & Safra, 2005).
- Generalized Independent Set Problem (GISP): Spatial forest-harvesting graphs (Hochbaum & Pathria, 1997; Colombi et al., 2017).
- Capacitated Facility Location Problem (CFLP): Plant location with capacities and fixed charges (Cornuéjols et al., 1991).
- Load Balancing (LB): Job-to-machine assignment under load constraints (Wilson, 1992).
- Item Placement (IP): Item-to-container packing for minimal imbalance (Martello & Toth, 1990).
Real-World Domains:
- Maritime Inventory Routing (MIRP): Ship routing and port inventory (Papageorgiou et al., 2014).
- Neural Network Verification (NNV): MILP for CNN robustness on MNIST (Cheng et al., 2017; Tjeng et al., 2018).
- Optimal Transmission Switching (OTS): Grid topology under wildfire risk (Pollack et al., 2024).
- Middle-Mile Consolidation Network (MMCN): E-commerce network design, binary+integer and binary+continuous variants (Greening et al., 2023).
- Seismic-Resilient Pipe Network (SRPN): Water network for seismic resilience (Huang & Dilkina, 2020).
Each of the synthetic domains above ships with a Python generator for scalable instance synthesis.
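As a concrete example of a synthetic domain's formulation, MIS instances encode the standard independent-set ILP over a sampled graph $G=(V,E)$:

```latex
\max_{x} \; \sum_{v \in V} x_v
\quad \text{s.t.} \quad x_u + x_v \le 1 \;\; \forall (u, v) \in E,
\qquad x_v \in \{0, 1\} \;\; \forall v \in V.
```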
2. Hardness Levels: Computational Classification
Distributional MIPLIB assigns each distribution one of five hardness levels, based on Gurobi v10.0.3 statistics under a one-hour time limit. For a test set of instances, let $t_i$ denote the solve time of instance $i$ (if solved within 1 hour), and let $g_i = |z_P - z_D| / |z_P|$ capture the primal-dual gap on timeout, with $z_P$ the best primal bound and $z_D$ the best dual bound. Writing $\bar{t}$ and $\bar{g}$ for their means over the test set, the levels are:
| Hardness | Solve Criterion | Mean Solve Time / Gap |
|---|---|---|
| Easy | All solved in 1 h | $\bar{t} < 100$ s |
| Medium | All solved in 1 h | $100 \text{ s} \leq \bar{t} < T$ |
| Hard | All solved in 1 h | $\bar{t} \geq T$ |
| Very Hard | None solved in 1 h | $0 < \bar{g} < 1$ |
| Extremely Hard | None solved in 1 h | $\bar{g} \geq 1$ |
Here $T$ denotes the library's fixed upper solve-time boundary separating medium from hard distributions.
This enables systematic grading of MILP difficulty and targeted evaluation by complexity.
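The grading scheme can be expressed as a small classifier over per-instance statistics. A minimal sketch follows; apart from the 100 s boundary stated above, the thresholds (`t_hard=1000.0` and the gap cutoff of 1) are illustrative assumptions, not the library's published values:

```python
def classify_hardness(solve_times, gaps, t_easy=100.0, t_hard=1000.0):
    """Grade a distribution from per-instance solver statistics.
    solve_times: times (s) of instances solved within 1 h;
    gaps: relative primal-dual gaps of instances that timed out.
    t_hard and the gap cutoff of 1 are assumed placeholder values."""
    if solve_times and not gaps:          # every instance solved in time
        mean_t = sum(solve_times) / len(solve_times)
        if mean_t < t_easy:
            return "easy"
        return "medium" if mean_t < t_hard else "hard"
    if gaps and not solve_times:          # no instance solved in time
        mean_g = sum(gaps) / len(gaps)
        return "very_hard" if mean_g < 1.0 else "extremely_hard"
    return "unclassified"                 # mixed outcomes fall between levels
```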
3. Data Content and Formats
Each instance is encoded in standard MPS format (*.mps), accompanied by a JSON metadata file with:
- instance_id
- domain (e.g., “MIS”)
- hardness (easy, medium, etc.)
- split (“train” or “test”)
- n_var_binary, n_var_integer, n_var_continuous
- n_constraints
- solve statistics (solve_time, gap, primal_dual_integral)
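A metadata record with the fields above might look as follows; every concrete value here is illustrative, not drawn from the library:

```python
import json

# Hypothetical metadata record mirroring the fields listed above;
# the values are illustrative, not taken from an actual instance.
record = {
    "instance_id": "MIS_easy_0001",
    "domain": "MIS",
    "hardness": "easy",
    "split": "train",
    "n_var_binary": 500,
    "n_var_integer": 0,
    "n_var_continuous": 0,
    "n_constraints": 1800,
    "solve_time": 12.7,
    "gap": 0.0,
    "primal_dual_integral": 35.2,
}

# Metadata files are plain JSON on disk, so a round-trip is lossless.
loaded = json.loads(json.dumps(record))
```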
Distributions are organized hierarchically by domain, distribution, and data split, with each instance's *.mps file accompanied by its JSON metadata.
Generators for synthetic domains, located under /generators/, allow instantiation via domain-specific parameters. For example, the MIS (Maximum Independent Set) Erdős–Rényi generator takes a node count $n$, an edge probability $p$, and a random seed.
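The library's generators themselves are not reproduced here; the following is a minimal, self-contained sketch of what an Erdős–Rényi MIS generator does, pairing a seeded random graph with the standard one-constraint-per-edge MILP:

```python
import random
from itertools import combinations

def mis_instance(n, p, seed):
    """Sample an Erdos-Renyi graph G(n, p) and build the standard MIS MILP:
    maximize sum(x_v) s.t. x_u + x_v <= 1 for every edge, x binary.
    An illustrative sketch, not the library's actual generator."""
    rng = random.Random(seed)  # seeding makes the instance reproducible
    edges = [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]
    objective = {v: 1 for v in range(n)}                  # maximize sum x_v
    constraints = [((u, v), "<=", 1) for u, v in edges]   # one row per edge
    return objective, constraints

obj, cons = mis_instance(n=20, p=0.25, seed=0)
```

Fixing the seed makes generation deterministic, which is what allows reproducible train/test splits over synthetic distributions.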
4. Access Modalities and API
Distributional MIPLIB is accessible via a pip-installable Python package (distributional_miplib) and a Command-Line Interface (CLI):
Both the Python API and the CLI provide access to the library's instances, metadata, and fixed data splits.
Train/test splits are fixed (900/100 for synthetic; adapted protocols for MIRP, SRPN, and NNV), ensuring consistent evaluation.
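Because the split is recorded per instance, the fixed protocol can be applied mechanically from metadata. A minimal sketch, with illustrative record contents:

```python
def split_instances(records):
    """Partition metadata records into the library's fixed train/test splits
    using each record's 'split' field (900/100 for synthetic domains)."""
    train = [r for r in records if r["split"] == "train"]
    test = [r for r in records if r["split"] == "test"]
    return train, test

# Illustrative records mimicking a synthetic domain's 900/100 split.
records = [{"instance_id": i, "split": "train" if i < 900 else "test"}
           for i in range(1000)]
```

Reading the split from metadata, rather than re-sampling it, is what keeps evaluations comparable across papers.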
5. Empirical Evaluation Paradigms
The library has been empirically validated by re-running Learn2Branch, a GCN-based imitation-learning branching method, using SCIP 6.0.1 with per-experiment time limits of one hour or 800 s. Metrics collected per distribution include:
- # Opt: Number of instances solved to optimality
- Opt Time: Mean solve time for solved instances
- NonOpt Gap: Mean primal-dual gap for unsolved instances
- Integral: Primal-dual integral (Berthold, 2013)
- Node-based metrics: Node count, integral vs. nodes, ML inference percentage
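The primal-dual integral can be approximated from solver bound snapshots. The sketch below assumes a step-function gap between snapshots and a worst-case gap of 1 before any bounds exist; it is an illustration, not the library's implementation:

```python
def primal_dual_integral(events, t_end):
    """Approximate the primal-dual integral (Berthold, 2013): the area under
    the relative primal-dual gap over time. `events` is a time-sorted list of
    (time, best_primal, best_dual) snapshots; the gap is held constant
    between snapshots."""
    def gap(p, d):
        denom = max(abs(p), abs(d), 1e-9)  # guard against division by zero
        return abs(p - d) / denom
    total, prev_t, prev_gap = 0.0, 0.0, 1.0  # gap counts as 1 before any bound
    for t, p, d in events:
        total += prev_gap * (t - prev_t)
        prev_t, prev_gap = t, gap(p, d)
    total += prev_gap * (t_end - prev_t)     # extend the last gap to the limit
    return total
```

Lower values indicate that good primal and dual bounds were reached early, which is why the integral is a more discriminating metric than final solve time alone.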
Experiments demonstrate:
- ML-guided branching performance on distributions not previously used in the literature (e.g., GISP, OTS, SRPN).
- Mixed-distribution policy training (“ML-mix5”) outperforms single-domain training when training data are limited, and the resulting policies generalize to larger instances.
- Transfer experiments confirm generalization of mixed models to harder distributions in MIS and SC.
Experimental details and split protocols ensure replicability; all empirical findings are specific to the solvers and configurations used.
6. Usage Guidelines and Current Limitations
Best Practices:
- Start with “easy” distributions for model debugging and validation, escalate to higher hardness for advanced research.
- Mixed-domain training is recommended in low-data regimes; single-domain is preferable with abundant data.
- Large-instance inference time (InferPct) may dominate total solve time. Lightweight GNNs or batch inference are suggested.
- Standardized test splits facilitate reproducible comparisons.
Known Limitations:
- Some real-world distributions (MIRP, SRPN) offer fewer than 100 test instances due to data scarcity.
- MILP files are only provided in MPS format; no LP or proprietary solver binaries.
- Metadata is limited to basic instance-level descriptors; feature engineering and richer annotations are left to the user.
- No direct API for solver wrappers (e.g., Ecole); users must load MPS files separately.
- Observed empirical results are specific to SCIP 6.0.1; outcomes may differ with alternative solvers or hardware.
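Since only MPS files are shipped and no solver-wrapper API is provided, users need their own loading path. The stdlib-only sketch below extracts a rough summary from a fixed-format MPS file; both `read_mps_summary` and the sample text are illustrative, and real workloads should prefer a solver API such as pyscipopt's `readProblem`:

```python
def read_mps_summary(text):
    """Minimal fixed-format MPS reader (illustrative sketch): collects the
    constraint rows and variable names of an instance."""
    section, rows, cols = None, [], set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("*"):
            continue                          # skip blanks and comments
        if not line[0].isspace():             # section headers start in col 1
            section = line.split()[0]
            continue
        fields = line.split()
        if section == "ROWS" and fields[0] != "N":
            rows.append(fields[1])            # constraint rows (skip objective)
        elif section == "COLUMNS" and fields[0] != "MARKER":
            cols.add(fields[0])               # variable names
    return rows, sorted(cols)

SAMPLE = """NAME          TINY
ROWS
 N  obj
 L  c1
COLUMNS
    x1  obj  1.0  c1  1.0
    x2  obj  1.0  c1  1.0
RHS
    rhs  c1  1.0
ENDATA
"""
```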
7. Significance and Research Applications
Distributional MIPLIB standardizes evaluation of ML-guided MILP methods across diverse tasks and domains, bridging gaps left by domain-specific benchmarks. Researchers benefit from:
- Comprehensive domain coverage—ranging from classical combinatorial to new application-motivated MILPs.
- Explicit, reproducible instance hardness stratification for fairer algorithmic comparison.
- Extensible infrastructure for problem generation and evaluation, underpinning robust, cross-domain benchmarking in ML-augmented combinatorial optimization (Huang et al., 2024).
A plausible implication is accelerated development and fairer comparison of ML-driven heuristics, branching policies, and hybrid solvers within the optimization community.