Strong mixed-integer programming formulations for trained neural networks (1811.08359v2)

Published 20 Nov 2018 in math.OC and cs.LG

Abstract: We present an ideal mixed-integer programming (MIP) formulation for a rectified linear unit (ReLU) appearing in a trained neural network. Our formulation requires a single binary variable and no additional continuous variables beyond the input and output variables of the ReLU. We contrast it with an ideal "extended" formulation with a linear number of additional continuous variables, derived through standard techniques. An apparent drawback of our formulation is that it requires an exponential number of inequality constraints, but we provide a routine to separate the inequalities in linear time. We also prove that these exponentially-many constraints are facet-defining under mild conditions. Finally, we study network verification problems and observe that dynamically separating from the exponential inequalities 1) is much more computationally efficient and scalable than the extended formulation, 2) decreases the solve time of a state-of-the-art MIP solver by a factor of 7 on smaller instances, and 3) nearly matches the dual bounds of a state-of-the-art MIP solver on harder instances, after just a few rounds of separation and in orders of magnitude less time.

Citations (208)

Summary

  • The paper presents an ideal formulation for a single ReLU neuron that uses a single binary variable, together with a linear-time routine that separates its exponentially many inequalities.
  • It demonstrates that the non-extended MIP formulation reduces solve times roughly sevenfold on smaller neural network verification tasks on the MNIST dataset.
  • The study highlights the potential for scaling neural network verification by reducing computational overhead and enhancing dual bounds with novel MIP techniques.

Strong Mixed-Integer Programming Formulations for Trained Neural Networks

The research paper presents novel mixed-integer programming (MIP) formulations specifically designed for rectified linear unit (ReLU) neurons within trained neural networks. The central contribution is an ideal formulation for a single ReLU neuron that is compact in its variables, requiring only a single binary variable and no continuous variables beyond the neuron's inputs and output, and that tames its exponential number of inequality constraints through an efficient, linear-time separation routine.

Main Contributions

The paper contrasts two MIP formulations:

  1. An ideal non-extended formulation, which introduces a single binary variable and no additional continuous variables beyond the input and output of the ReLU. Although it requires an exponential number of inequality constraints, the authors give a routine that separates these constraints in linear time, and they prove that, under mild conditions, the inequalities are facet-defining.
  2. An ideal extended formulation, derived through standard techniques, which adds a linear number of continuous variables and serves as the baseline against which the non-extended formulation is measured. A sketch of the key constraints follows this list.
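
For concreteness, here is a sketch of the two building blocks most relevant to what follows: the standard (non-ideal) big-M model, which serves as a baseline in the experiments below, and the shape of the exponential family in the ideal non-extended formulation. The notation is mine, written for a single neuron y = max(0, w·x + b) with box-bounded inputs L ≤ x ≤ U; see the paper for the precise statement, which may differ in notation.

```latex
% Big-M model (valid but not ideal), with constants chosen so that
% M^- \le w \cdot x + b \le M^+ holds over the input box:
y \ge w \cdot x + b, \qquad
y \le w \cdot x + b - M^-(1-z), \qquad
y \le M^+ z, \qquad
y \ge 0, \qquad z \in \{0,1\}.

% Ideal non-extended model: an exponential family of upper bounds, indexed by
% subsets I of the support of w, where \breve{L}_i = L_i and \breve{U}_i = U_i
% when w_i \ge 0 (and the two are swapped when w_i < 0):
y \le \sum_{i \in I} w_i \bigl( x_i - \breve{L}_i (1-z) \bigr)
    + \Bigl( b + \sum_{i \notin I} w_i \breve{U}_i \Bigr) z
  \qquad \forall\, I \subseteq \operatorname{supp}(w).
```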

The strength of these formulations was examined on network verification tasks for image-classification networks trained on the MNIST dataset; a typical such verification instance is sketched below. On smaller instances, dynamically separating the exponential inequalities reduced the solve time of a state-of-the-art MIP solver by a factor of seven, and on harder instances a few rounds of separation nearly matched the solver's dual bounds in orders of magnitude less time.
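
Stated loosely (the exact objective and perturbation model here are illustrative, not taken from the paper): given a trained network f, a correctly classified input x̄ with label j, and a perturbation budget ε, one asks whether any nearby input can make a wrong class k score higher. Once every ReLU in f is encoded as above, this becomes a MIP:

```latex
\max_{x} \; f(x)_k - f(x)_j
\qquad \text{s.t.} \qquad \|x - \bar{x}\|_\infty \le \varepsilon .
```

If the optimal value is nonpositive for every wrong class k ≠ j, the prediction at x̄ is verified as robust.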

Computational Analysis and Results

The MIP formulations were applied to two network architectures—a smaller and a larger neural network. The computational experiments were conducted using the Gurobi solver, evaluating both the solve time and the optimality gap achieved.
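
To make the setup concrete, here is a minimal gurobipy sketch of the big-M encoding of a single ReLU, the baseline formulation used in these experiments. The helper name, example data, and bound computation are mine, not the authors'.

```python
import gurobipy as gp
from gurobipy import GRB

def add_relu_big_m(model, x, w, b, lb, ub):
    """Encode y = max(0, w.x + b) with the big-M formulation.

    x is a list of gurobipy variables with lb[i] <= x[i] <= ub[i].
    Returns (y, z), where the binary z indicates the active phase.
    """
    n = len(x)
    # Interval bounds on the pre-activation w.x + b over the input box.
    m_minus = b + sum(w[i] * (lb[i] if w[i] >= 0 else ub[i]) for i in range(n))
    m_plus = b + sum(w[i] * (ub[i] if w[i] >= 0 else lb[i]) for i in range(n))

    pre = gp.quicksum(w[i] * x[i] for i in range(n)) + b
    y = model.addVar(lb=0.0, ub=max(m_plus, 0.0), name="y")
    z = model.addVar(vtype=GRB.BINARY, name="z")

    model.addConstr(y >= pre)                      # y >= w.x + b
    model.addConstr(y <= pre - m_minus * (1 - z))  # binds only when z = 1
    model.addConstr(y <= m_plus * z)               # forces y = 0 when z = 0
    return y, z

# Tiny usage example with made-up data.
model = gp.Model("relu_demo")
xs = [model.addVar(lb=-1.0, ub=1.0, name=f"x{i}") for i in range(2)]
y, z = add_relu_big_m(model, xs, w=[1.0, -2.0], b=0.5,
                      lb=[-1.0, -1.0], ub=[1.0, 1.0])
model.setObjective(y, GRB.MAXIMIZE)
model.optimize()
```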

For the smaller ReLU network:

  • The combination of the big-M formulation and the paper's novel exponential-family cuts proved highly effective, yielding the sevenfold performance improvement noted above; a sketch of the separation step follows this list.
  • When augmented with these additional cuts, the big-M formulation substantially outperformed Gurobi's default cutting-plane strategy.
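
The separation step can be sketched as follows. This is my reconstruction of the linear-time idea described in the abstract, with hypothetical names: for a candidate fractional point, greedily pick, coordinate by coordinate, whichever contribution minimizes the right-hand side, which identifies the most violated inequality of the family, if one exists. The paper's exact routine may differ in detail.

```python
def separate_relu_cut(x_hat, z_hat, y_hat, w, b, lb, ub, tol=1e-6):
    """Linear-time separation over the exponential family of ReLU cuts.

    Returns the index set I of the most violated inequality, or None if no
    inequality is violated by more than tol at the point (x_hat, y_hat, z_hat).
    """
    n = len(w)
    # "Breve" bounds: flip the box bounds on coordinates with negative weight.
    l_b = [lb[i] if w[i] >= 0 else ub[i] for i in range(n)]
    u_b = [ub[i] if w[i] >= 0 else lb[i] for i in range(n)]

    I, rhs = [], b * z_hat
    for i in range(n):
        in_term = w[i] * (x_hat[i] - l_b[i] * (1.0 - z_hat))  # contribution if i in I
        out_term = w[i] * u_b[i] * z_hat                      # contribution if i not in I
        if in_term < out_term:
            I.append(i)
            rhs += in_term
        else:
            rhs += out_term
    return I if y_hat > rhs + tol else None
```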

For the larger network:

  • Evaluation at the root node compared the initial dual bounds across methods, showing that the ideal formulation delivers substantially stronger dual bounds in markedly less time.
  • The larger network exposed the limitations of the ideal extended formulation: the computational overhead of its additional variables grows with network size, underscoring the scalability advantage of the non-extended formulation. A sketch of how such dynamic separation plugs into the solver follows this list.
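
As promised above, here is one way such dynamic separation could be wired into Gurobi through a user-cut callback, reusing separate_relu_cut from the earlier sketch. This is an illustrative wiring under my own assumptions, not the authors' implementation.

```python
import gurobipy as gp
from gurobipy import GRB

def make_cut_callback(x, y, z, w, b, lb, ub):
    """Build a user-cut callback that separates the ReLU inequalities at
    fractional node solutions. The model must set model.Params.PreCrush = 1
    so user cuts can be applied to the presolved model."""
    l_b = [lb[i] if w[i] >= 0 else ub[i] for i in range(len(w))]
    u_b = [ub[i] if w[i] >= 0 else lb[i] for i in range(len(w))]

    def callback(model, where):
        # Only separate at node LP relaxations that solved to optimality.
        if where != GRB.Callback.MIPNODE:
            return
        if model.cbGet(GRB.Callback.MIPNODE_STATUS) != GRB.OPTIMAL:
            return
        x_hat = model.cbGetNodeRel(x)
        y_hat = model.cbGetNodeRel(y)
        z_hat = model.cbGetNodeRel(z)
        I = separate_relu_cut(x_hat, z_hat, y_hat, w, b, lb, ub)
        if I is None:
            return
        inside = set(I)
        rhs = gp.quicksum(w[i] * (x[i] - l_b[i] * (1 - z)) for i in inside) \
            + (b + sum(w[i] * u_b[i]
                       for i in range(len(w)) if i not in inside)) * z
        model.cbCut(y <= rhs)  # add the most violated inequality as a cut

    return callback

# Usage (assuming model, x, y, z were built as in the earlier sketch):
#   model.Params.PreCrush = 1
#   model.optimize(make_cut_callback(x, y, z, w, b, lb, ub))
```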

Implications and Future Directions

This work has significant implications for optimization tasks involving neural networks, particularly in applications like system verification where rigorous solution bounds are critical. The ability to model neural networks using fewer variables and constraints without sacrificing the tightness of the formulation allows for solving larger instances more efficiently, which is crucial as neural network architectures grow in complexity.

Future research could expand on these findings by exploring further refinements to the separation routines for inequality constraints, enhancing solver integration, and extending the approach to broader classes of activation functions and network architectures. Additionally, there is potential for employing these MIP formulations in the domain of prescriptive analytics, where the bridging of predictive and optimization models remains a prominent challenge.

Overall, this paper advances the integration of optimization strategies with neural network models, paving the way for their more effective application in complex real-world tasks that demand both predictive accuracy and solutions with optimality guarantees.