Mesh Adaptive Direct Search (MADS)
- Mesh Adaptive Direct Search (MADS) is a derivative-free optimization method that systematically refines a discrete grid (mesh) to tackle expensive, noisy or non-differentiable black-box problems.
- It alternates an optional search phase and a mandatory poll phase built on positive spanning sets, with adaptive mesh updates that ensure convergence to Clarke-stationary points even under constraints.
- Variants like Ortho-MADS and StoMADS-PB enhance performance in hyperparameter tuning and stochastic settings by integrating surrogate models, hybrid search strategies, and specialized constraint handling.
Mesh Adaptive Direct Search (MADS) is a rigorous derivative-free optimization (DFO) framework for black-box problems where function evaluations are costly, noisy, or non-differentiable, and where constraints, possibly themselves black-box, may be present. MADS operates on a sequence of adaptive, discrete grids ('meshes') and employs a structured pattern of candidate evaluations, systematically refining the search around promising areas while maintaining strong theoretical convergence guarantees in both deterministic and stochastic settings.
1. Algorithmic Framework and Mathematical Structure
MADS addresses the generic DFO problem
$$\min_{x \in \Omega} f(x),$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a black-box function and $\Omega \subseteq \mathbb{R}^n$ encodes the feasible solutions. At each iteration $k$, MADS maintains a mesh
$$M_k = \{\, x_k + \Delta_k^m D z : z \in \mathbb{N}^{n_D} \,\}$$
with:
- $x_k$: current iterate,
- $\Delta_k^m > 0$: mesh size parameter,
- $D$: generating matrix whose $n_D$ columns form a positive spanning set.
Algorithm progression involves two phases:
- Search (optional): arbitrary trial points on the mesh $M_k$, suggested by heuristics or surrogate models, are evaluated.
- Poll (mandatory): a finite set of trial points close to $x_k$ (usually $\{x_k + \Delta_k^m d : d \in D_k\}$, where the columns of $D_k$ form a positive spanning set) is evaluated.
If an improved solution is found, the mesh size may be increased ($\Delta_{k+1}^m \geq \Delta_k^m$); otherwise it is decreased ($\Delta_{k+1}^m < \Delta_k^m$), refining the grid around the incumbent. As $\Delta_k^m \to 0$, the union of normalized poll directions becomes asymptotically dense on the unit sphere, which is essential for the theoretical guarantees (Lakhmiri et al., 2019).
Table 1: MADS Iteration Components
| Step | Description | Typical Operation |
|---|---|---|
| Mesh | Discrete grid for candidate generation | $M_k = \{x_k + \Delta_k^m D z : z \in \mathbb{N}^{n_D}\}$ |
| Search | User-defined exploration on the mesh | Surrogates, random sampling, heuristics |
| Poll | Systematic trial in positive spanning directions | Evaluate $f$ at $x_k + \Delta_k^m d$ for $d \in D_k$ |
| Mesh update | Coarsen (success) or refine (failure) the search granularity | $\Delta_{k+1}^m$ adapted to the iteration outcome |
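To make the iteration loop concrete, the following is a minimal, illustrative Python sketch of a MADS-style poll with opportunistic acceptance and mesh updates. It is not the full algorithm: it uses a fixed coordinate positive basis rather than asymptotically dense directions, omits the search step and mesh rounding, and the function name `mads_sketch`, the update factors, and the toy objective are all assumptions for illustration.

```python
import numpy as np

def mads_sketch(f, x0, delta=1.0, delta_min=1e-6, max_evals=500):
    """Illustrative MADS-style poll loop: poll a positive basis around the
    incumbent, coarsen the mesh on success and refine it on failure."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    evals = 1
    n = x.size
    while delta > delta_min and evals < max_evals:
        improved = False
        # Poll step: maximal positive basis {+e_i, -e_i}, scaled by the mesh size.
        for d in np.vstack([np.eye(n), -np.eye(n)]):
            y = x + delta * d
            fy = f(y)
            evals += 1
            if fy < fx:
                # Success: accept the point opportunistically and stop polling.
                x, fx, improved = y, fy, True
                break
        # Mesh update: coarsen after success, refine after failure.
        delta = 2.0 * delta if improved else 0.5 * delta
    return x, fx

# Toy usage: minimize a nonsmooth objective without derivatives.
x_best, f_best = mads_sketch(lambda x: abs(x[0] - 1.0) + abs(x[1] + 2.0), [5.0, 5.0])
```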
2. Key Theoretical Properties
The mesh and poll-step structure of MADS ensures that, under mild regularity conditions (e.g., $f$ Lipschitz continuous near the limit point, poll directions becoming asymptotically dense), every accumulation point $\hat{x}$ of the incumbents is Clarke-stationary:
$$0 \in \partial f(\hat{x}) + N_\Omega(\hat{x}),$$
where $\partial f$ is Clarke's subdifferential and $N_\Omega(\hat{x})$ is the normal cone to $\Omega$ at $\hat{x}$ (Lakhmiri et al., 2019).
For constrained or stochastic optimization, as in StoMADS-PB, probabilistic bounds and progressive barrier mechanisms extend these guarantees almost surely by adapting Lyapunov-type and martingale arguments (Dzahini et al., 2020). In the stochastic subspace setting (StoDARS), first- and second-order convergence to Clarke-stationary points is established via generalized Hessian constructions, leveraging Johnson–Lindenstrauss projections and subspace randomization (Dzahini et al., 20 Mar 2024).
3. Algorithmic Enhancements and Variants
Several MADS variants and enhancements address domain-specific challenges:
- Ortho-MADS: Poll directions are constructed via deterministic orthogonal bases (e.g., Householder reflections), rotating through orientations to improve coverage of the unit sphere and ensure positive spanning at each iteration (Mello et al., 2019); a minimal construction is sketched after this list.
- Categorical Handling (HyperNOMAD): Mixed-type hyperparameter blocks (categorical and continuous) are integrated via ad-hoc neighborhood moves, such as adding/removing network layers or changing optimizers. Polling around current categorical choices explores discrete architectural modifications within the same convergence theory (Lakhmiri et al., 2019).
- Hybrid Search Strategies: Integration of Nelder–Mead local search and Variable Neighborhood Search grants escape from local optima and improves convergence rates, especially for hyperparameter optimization in SVMs (Mello et al., 2019).
- Bilevel and Black-box Constraint Handling: In bilevel contexts, MADS employs inexact lower-level solvers with error control, maintaining stationary point guarantees with explicit relationship between lower-level precision and achieved optimality (Diouane et al., 2023).
- Pareto-based Filtering: For multi-objective/discrete optimization (e.g., feeder reconfiguration), Pareto filters prune dominated solutions and local polling refines the non-dominated frontier efficiently with minimal simulation calls (Zheng et al., 21 Jul 2025).
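As an example of the Ortho-MADS construction, the sketch below (assuming NumPy; the helper name `ortho_poll_directions` is hypothetical) builds a maximal positive basis of $2n$ directions from a single vector via a Householder reflection. The actual algorithm additionally derives that vector from a Halton sequence and scales/rounds the directions to the current mesh, which is omitted here.

```python
import numpy as np

def ortho_poll_directions(v):
    """Build a maximal positive basis of 2n directions from a vector v via the
    Householder reflection H = I - 2 v v^T (Ortho-MADS-style construction)."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)                    # normalize to a unit vector
    H = np.eye(v.size) - 2.0 * np.outer(v, v)    # orthogonal Householder matrix
    return np.hstack([H, -H])                    # columns positively span R^n

# Example: 6 poll directions in R^3 with good angular coverage of the sphere.
D = ortho_poll_directions(np.array([1.0, 2.0, 2.0]))
```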
4. Representative Applications and Empirical Results
MADS has been applied to diverse domains, particularly where black-box evaluations are expensive:
Hyperparameter and Neural Architecture Optimization
- On MNIST and CIFAR-10 with a 100-evaluation budget:
- HyperNOMAD improved test accuracy over the default configurations on both MNIST and CIFAR-10, surpassing Random Search and Bayesian TPE, especially for feasibility-constrained architectures (Lakhmiri et al., 2019).
- On ResNet compression:
- Across 25 candidate architectures on ImageNet, NOMAD identified compressed models with substantially reduced MACs and negligible accuracy loss, outperforming TPE (Lakhmiri et al., 2023).
Support Vector Machine (SVM) Hyperparameter Tuning
- Ortho-MADS with dynamic stopping, VNS, and Nelder–Mead hybridization reached or exceeded the best known validation accuracies for 13 UCI datasets using under 100 evaluations, often escaping spurious local minima (Mello et al., 2019).
Stochastic/Constrained Optimization
- Applied to noisy, constrained black-box problems, StoMADS-PB admits intermediate infeasible iterates, leverages sample-based confidence bounds, and achieves Clarke-stationarity almost surely (Dzahini et al., 2020).
- In large-scale stochastic settings, StoDARS attains expected-complexity bounds for finding first-order stationary points that match those of established stochastic trust-region methods up to constants, with a full second-order extension under additional smoothness (Dzahini et al., 20 Mar 2024).
Engineering and Power Systems
- In black-box feeder reconfiguration, MADS with bi-objective Pareto filtering reached near-optimal solutions on the IEEE 123-node test feeder with far fewer simulation calls than the 100+ typically needed by heuristics, demonstrating high empirical efficiency in discrete, combinatorial environments (Zheng et al., 21 Jul 2025).
Model Predictive Control with Non-Smooth Costs
- Embedding cost and path-constraint integrals as ODE states enables MADS to directly optimize non-differentiable system trajectories with constraints, as shown for robust rocket apogee control (McInerney et al., 2021).
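A minimal sketch of the state-augmentation idea follows: the (possibly nonsmooth) running cost is appended to the ODE state, so a single black-box simulation returns the objective value that MADS polls on. The dynamics, running cost, and parameterization `u` below are toy placeholders under assumed SciPy availability, not the rocket model of the cited work.

```python
import numpy as np
from scipy.integrate import solve_ivp

def trajectory_cost(u, x0=(0.0, 1.0), T=5.0):
    """Black-box objective for MADS: simulate toy dynamics with an extra ODE
    state J that accumulates a nonsmooth running cost, then return J(T)."""
    def rhs(t, z):
        x1, x2, J = z
        dx1 = x2
        dx2 = -x1 + u[0] * np.sin(u[1] * t)   # toy controlled dynamics (placeholder)
        dJ = abs(x1) + 0.1 * u[0] ** 2        # nonsmooth running cost as an ODE state
        return [dx1, dx2, dJ]
    sol = solve_ivp(rhs, (0.0, T), [*x0, 0.0], rtol=1e-8, atol=1e-8)
    return sol.y[2, -1]                        # accumulated cost at the final time

# MADS would poll on the parameters u = (amplitude, frequency) through this box.
cost = trajectory_cost(np.array([0.5, 2.0]))
```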
5. Convergence Analysis and Complexity
MADS convergence is grounded in the density of poll directions and systematic mesh refinement. Under bounded level-set and positive spanning hypotheses, vanishing mesh size implies no descent can be found in any direction, establishing first-order stationarity (or Clarke-criticality in nonsmooth cases). Extensions (e.g., StoDARS) leverage generalized Hessian constructions for second-order analysis, yielding the necessary optimality conditions and expected complexity bounds:
- Deterministic and stochastic MADS variants: bounds on the expected number of function evaluations needed to reach an approximate first-order stationary point, of the same order as those known for comparable stochastic trust-region methods (Dzahini et al., 20 Mar 2024).
- Bilevel adaptation: to reach a prescribed upper-level stationarity tolerance, the lower-level oracle precision must be tightened at a rate made explicit in terms of that tolerance (Diouane et al., 2023).
6. Implementation Considerations and Practical Guidance
Efficient application of MADS relies on:
- Careful tuning of initial mesh sizes and contraction/expansion factors, often problem-specific.
- Stopping when the mesh becomes sufficiently fine, rather than relying on a fixed evaluation budget, prevents wasteful allocations in unpromising regions.
- For categorical or mixed-variable search spaces, ad-hoc neighborhood construction is crucial to preserve convergence properties (Lakhmiri et al., 2019).
- Opportunistic evaluation (stop poll as soon as an improving point is found) reduces evaluation cost.
- In stochastic/black-box constraints, integration of sample-based bounds and progressive barrier methods enables robust performance with partial, noisy observations (Dzahini et al., 2020).
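To illustrate the progressive-barrier idea referenced above, the sketch below aggregates constraint violations and applies a simplified acceptance test. The helpers `violation` and `pb_accept` are hypothetical, and the actual StoMADS-PB rules additionally involve sample-based estimates and barrier-threshold updates not shown here.

```python
import numpy as np

def violation(c_values):
    """Aggregate violation h(x) = sum(max(0, c_i(x))^2) for constraints
    written as c_i(x) <= 0 (progressive-barrier sketch)."""
    return float(np.sum(np.maximum(0.0, np.asarray(c_values, dtype=float)) ** 2))

def pb_accept(f_new, h_new, f_inc, h_inc, h_max):
    """Simplified acceptance: discard points whose violation exceeds the barrier
    h_max; otherwise accept improvements of the feasible incumbent (h == 0) or
    of the infeasible incumbent (smaller h, or equal h with smaller f)."""
    if h_new > h_max:
        return False
    if h_new == 0.0:
        return f_new < f_inc
    return h_new < h_inc or (h_new == h_inc and f_new < f_inc)
```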
A plausible implication is that the structural guarantees and adaptability of MADS make it especially suitable for applications with high evaluation cost, non-differentiability, and combinatorial or mixed search spaces.
7. Extensions, Limitations, and Research Directions
MADS continues to be extended:
- Random-subspace and high-dimensional variants (StoDARS) scale to large problem dimensions via random Johnson–Lindenstrauss projections (Dzahini et al., 20 Mar 2024); a minimal projection sketch follows this list.
- Bilevel optimization variants address complex hierarchical applications with controlled lower-level oracle accuracy (Diouane et al., 2023).
- Integration with advanced surrogate or meta-model strategies inside the optional search phase (e.g., using TPE or other Bayesian optimization as search) is a promising direction.
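A minimal sketch of the random-subspace idea mentioned above: a scaled Gaussian (Johnson–Lindenstrauss) matrix maps a low-dimensional search variable back to the full space, so a direct search can be run on the reduced objective. The helper `jl_subspace_objective` and its arguments are illustrative assumptions, not the StoDARS implementation.

```python
import numpy as np

def jl_subspace_objective(f, x_inc, n, r, seed=0):
    """Restrict f: R^n -> R to a random r-dimensional affine subspace around the
    incumbent x_inc via a scaled Gaussian (Johnson-Lindenstrauss) sketch."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((n, r)) / np.sqrt(r)   # JL projection matrix
    g = lambda y: f(x_inc + P @ y)                 # reduced objective over y in R^r
    return g, P

# A low-dimensional direct search (e.g., the poll loop in Section 1) can run on g.
g, P = jl_subspace_objective(lambda x: np.sum(np.abs(x)), np.zeros(1000), n=1000, r=5)
```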
Limitations include:
- Its sequential nature makes it less immediately suited to embarrassingly parallel or massively scalable contexts unless the poll/search evaluations are themselves parallelized.
- Convergence rates may slow in high-noise settings if sample-based estimation is not carefully managed.
- Exploiting problem structure (e.g., via custom poll directions or hybridization with local solvers) remains a key area for domain-specific tuning.
Continued research includes augmentation for distributed infrastructure, real-time control, richer mixed-variable encoding, and theoretical investigations of convergence rates under weaker regularity and high-dimensionality.