Learned Configurator: Adaptive Tuning

Updated 25 June 2026

Learned configurators are automated systems that use data-driven surrogates to select optimal parameters in complex, high-dimensional search spaces.
They integrate statistical, reinforcement, and meta-learning approaches to efficiently handle constrained optimization and runtime trade-offs.
Applications in robotics, solver customization, and LLM self-regulation demonstrate significant improvements in efficiency and performance over conventional methods.

A learned configurator is an automated, data-driven mechanism for selecting optimal configuration parameters in high-dimensional and often combinatorial search spaces, particularly for software systems, optimization solvers, planning modules, or learning environments. These configurators leverage statistical or machine learning surrogates trained on observational or synthetic performance data, and frequently integrate with optimization or decision-theoretic frameworks to search the configuration space efficiently and adaptively. The learned configurator paradigm has been influential in settings including robotic planning (Denniston et al., 2023), meta-optimization for black-box methods (Guo et al., 2024), solver customization (Iommazzo et al., 2024, Lawless et al., 2024), agentic LLM self-regulation (Deng et al., 21 May 2026), and constructive preference elicitation (Dragone et al., 2017).

1. Formal Problem Definitions and Foundations

The underlying abstraction for a learned configurator consists of a (potentially constrained) configuration space $C$ (a Cartesian product of Boolean, categorical, and continuous parameters) together with a performance metric $f: C \rightarrow \mathbb{R}$, typically black-box, noisy, or expensive to evaluate. Letting $S \subset C$ with $|S| \ll |C|$ be the set of configurations for which empirical measurements $y_i = f(c_i)$ are available, the learned configurator fits a surrogate $\hat{f}(c)$ and selects $c^* = \arg\max_{c \in C,\; c \text{ valid}} \hat{f}(c)$ (or $\arg\min$ where appropriate) (Pereira et al., 2019, Denniston et al., 2023).
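The fit-then-select loop above can be sketched in a few lines of Python. This is a minimal illustration, not any cited system's implementation: the parameter names, the validity rule, the stand-in metric `true_performance`, and the choice of a k-nearest-neighbour surrogate are all assumptions made for the example.

```python
import itertools
import random

# Hypothetical configuration space C: Boolean, categorical, and discretised continuous.
SPACE = {
    "presolve": [False, True],
    "branching": ["pseudo", "strong", "random"],
    "tolerance": [0.1, 0.3, 0.5, 0.7, 0.9],
}

def is_valid(c):
    # Illustrative dependency constraint: strong branching requires presolve.
    return not (c["branching"] == "strong" and not c["presolve"])

def true_performance(c):
    # Stand-in for the expensive black-box metric f (higher is better).
    score = 1.0 if c["presolve"] else 0.0
    score += {"pseudo": 0.5, "strong": 1.0, "random": 0.0}[c["branching"]]
    return score - (c["tolerance"] - 0.3) ** 2

def distance(a, b):
    # Mismatch count on discrete parameters plus absolute gap on the numeric one.
    d = float(a["presolve"] != b["presolve"]) + float(a["branching"] != b["branching"])
    return d + abs(a["tolerance"] - b["tolerance"])

def fit_surrogate(samples, k=3):
    # k-nearest-neighbour regressor: mean measurement of the k closest sampled configs.
    def f_hat(c):
        nearest = sorted(samples, key=lambda s: distance(c, s[0]))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return f_hat

def all_configs():
    keys = list(SPACE)
    for values in itertools.product(*(SPACE[k] for k in keys)):
        yield dict(zip(keys, values))

rng = random.Random(0)
valid = [c for c in all_configs() if is_valid(c)]
S = rng.sample(valid, 10)                      # measured subset S of C
samples = [(c, true_performance(c)) for c in S]
f_hat = fit_surrogate(samples)
c_star = max(valid, key=f_hat)                 # argmax of the surrogate over valid configs
print(c_star, round(f_hat(c_star), 3))
```

Exhaustive enumeration is only feasible because this toy space has 30 configurations; realistic spaces need the search strategies discussed in Section 3.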

For instance, in robotic information gathering, the action space of the configurator consists directly of planner hyperparameters (number of candidates $N$, horizon $H$, resolution, risk–reward weight, and replanning frequency), with transitions defined by effects on the robot's belief and state, and reward defined as the difference in expected information gain minus trajectory cost (Denniston et al., 2023).

Various generalizations exist: learned configurators may address parameter selection at every decision juncture (per-iteration adaptivity), enforce inter-parameter constraints (e.g., for solver dependencies (Iommazzo et al., 2024)), use environment or instance features as input, or operate as part of a model-policy joint optimization, as in configurable MDPs (Metelli et al., 2018).

2. Model Architectures and Learning Frameworks

Learned configurators instantiate a diverse range of model and learning architectures, chosen to reflect the combinatorial structure and data domain:

Statistical Surrogates: Linear/logistic regression (Iommazzo et al., 2024), CART/regression trees, SVM/SVR with RBF kernels (Iommazzo et al., 2024), GPs (Kriging), random forests, and neural networks (MLP, deep architectures) (Pereira et al., 2019).
Reinforcement Learning: Policy parameterizations (e.g., PPO, GRPO) select parameter assignments in an MDP or meta-MDP, rewarding downstream performance (reward increase, reduction in cost, or domain-specific metrics) (Denniston et al., 2023, Guo et al., 2024, Deng et al., 21 May 2026).
Transformers and Meta-Learning: Modular EA configurators employ Transformer-based policies over sequences of algorithmic submodules and population descriptors, using multitask/proximal policy optimization (Guo et al., 2024).
LLM-based Configuration: Recent advances include LLMs prompted by instance-level textual and LaTeX features, with prompt-templated selection and clustering/ensembling for combinatorial module settings (e.g., MILP separator configuration) (Lawless et al., 2024).

Training is supervised, reinforcement-based, or hybrid, with objectives driven by empirical performance, surrogate accuracy, regret, or discounted reward sums (Denniston et al., 2023, Metelli et al., 2018, Deng et al., 21 May 2026).
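As a rough illustration of the reinforcement-based route, the sketch below trains a softmax policy over a handful of discrete configurations with a REINFORCE-style update and a running baseline. It is deliberately much simpler than PPO or GRPO, treats configuration as a one-step bandit rather than a full MDP, and uses hypothetical configuration names and reward values.

```python
import math
import random

CONFIGS = ["short_horizon", "medium_horizon", "long_horizon"]  # hypothetical planner settings
MEAN_REWARD = [0.2, 0.8, 0.5]  # hypothetical expected downstream reward per setting

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(steps=3000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(CONFIGS)
    baseline = 0.0  # running average of observed reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(len(CONFIGS)), weights=probs)[0]
        reward = MEAN_REWARD[a] + rng.gauss(0.0, 0.1)  # noisy performance measurement
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(a) with respect to logit i is 1[i == a] - probs[i].
        for i in range(len(logits)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)

policy = train()
best = CONFIGS[max(range(len(CONFIGS)), key=lambda i: policy[i])]
print(best, [round(p, 3) for p in policy])
```

A practical configurator would condition the policy on state or instance features and optimize a discounted return, which is where actor-critic methods such as PPO become relevant.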

3. Optimization and Search in Constrained Spaces

Once the surrogate $\hat{f}$ is established, learned configurators employ search strategies appropriate for the configuration constraints and model structure:

Mixed-Integer (Non)Linear Programming: Embedding the surrogate into a MINLP enables exact or approximate selection subject to logical, mutual-exclusion, and dependency constraints (Iommazzo et al., 2024, Iommazzo et al., 2024). This exact embedding approach handles nonconvexity and dependencies directly, at the cost of increased computational effort.
Meta-level MDP Solution: In modular or agentic settings, the configurator acts as a policy, selecting a parameter vector at each episode (EA configuration, planner parameters, etc.), where return is determined by future cumulative rewards (Denniston et al., 2023, Guo et al., 2024).
Transfer and Active Learning: For performance prediction and optimization under data constraints, transfer learning (bellwether environments, as in BEETLE (Krishna et al., 2019)) or active-sampling strategies minimize new evaluations while maintaining accuracy.
Partitioned/Decomposed Search: For constructive preference elicitation and large combinatorial decision spaces, decomposition into parts enables local optimization and feedback (part-wise Coactive Learning) (Dragone et al., 2017).
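For the mathematical-programming route, one common way to write the surrogate-embedded selection problem is shown below. The two constraints are standard encodings of logical relations over binary parameters and are given purely as an illustration; the indices $i$, $j$, $k$ are hypothetical and are not taken from the cited formulations.

$$
\begin{aligned}
c^* \in \arg\max_{c \in C} \quad & \hat{f}(c) \\
\text{s.t.} \quad & c_i \le c_j && \text{(dependency: enabling } c_i \text{ requires } c_j\text{)} \\
& c_i + c_k \le 1 && \text{(mutual exclusion of } c_i \text{ and } c_k\text{)} \\
& c_i,\, c_j,\, c_k \in \{0, 1\}.
\end{aligned}
$$

When $\hat{f}$ is nonlinear, as with an RBF-kernel SVR, the objective makes the program a MINLP even though the constraints themselves are linear.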

4. Application Domains and Empirical Performance

Learned configurators have been evaluated in diverse high-dimensional automation contexts:

Robotics: Dynamic planner parameter selection in field robotics, outperforming fixed configurations and direct end-to-end RL. In (Denniston et al., 2023), a Proximal Policy Optimization agent using compact state embeddings achieved 9.5% better information-gathering per unit time and 7% reduced traveled distance over baselines, with successful sim-to-real deployment.
Optimization Solvers: Instance-wise configuration of mathematical programming solvers (e.g., CPLEX, SCIP) using ML surrogates and MINLP selection yielded improvements over defaults—in (Iommazzo et al., 2024), the approach achieved an order-of-magnitude smaller primal/dual gaps and substantial improvement rates (e.g., up to 3.4×10¹⁴ in integrality gap reduction), at the expense of upfront dataset curation and solver optimization times.
Meta-Optimization and EAs: Modular, meta-learned configurators for EAs (ConfigX) reached robust zero-shot performance and rapid fine-tuning on new tasks. For instance, ConfigX outperformed both SMAC3 and default hyperparameters on BBOB and real-world problems, with mean normalized improvements of ≈0.98 (Guo et al., 2024).
LLM-based Cold-Start Configuration: LLM-powered configuration for MILP separator selection outperformed user defaults with no up-front solver tuning, achieving up to 72% relative runtime improvement in MaxCut and 53% in Middle-Mile tasks with minimal validation overhead (Lawless et al., 2024).
Agentic Systems: In LLM reasoning agents, a learned configurator (System III) within the LLM determined whether and how far to plan, increasing mean planning horizons by 22.8% and reducing reasoning tokens by 25.8–95.3% relative to non-configured baselines without sacrificing answer accuracy (Deng et al., 21 May 2026).

5. Theoretical Guarantees and Analysis

Several classes of learned configurators admit rigorous performance analyses:

Regret bounds: For constructive preference elicitation, part-wise Coactive Learning provides upper bounds on average conditional regret per user interaction, with local optimality guarantees once all decomposed parts are solved (Dragone et al., 2017).
Safe Improvement: Configurable MDPs with SPMI feature monotonic improvement guarantees via lower-bound maximization, converging to local joint optima over policies and environmental configurations (Metelli et al., 2018).
Surrogate Optimization Quality: For the SVR-based solver configurator, empirical suboptimality in MINLP solver runs remains small, with 83–93% optimality with respect to surrogate predictions (Iommazzo et al., 2024); the logistic regression-based selector achieves statistically significant mean gains but is limited in capturing nonlinear dependencies (Iommazzo et al., 2024).
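To make the notion of average regret concrete, the following sketch runs the generic preference perceptron from coactive learning on a toy problem. It is not the part-wise variant of Dragone et al.: the binary object space, the hidden weights `W_TRUE`, and the simulated user who improves one bit at a time are all assumptions chosen for illustration.

```python
import itertools

D = 6
W_TRUE = [1.0, -1.0, 2.0, -0.5, 0.5, -2.0]  # hidden user utility weights (hypothetical)
CANDIDATES = list(itertools.product([0, 1], repeat=D))

def utility(w, y):
    return sum(wi * yi for wi, yi in zip(w, y))

def user_improvement(y):
    # Simulated user: flip the single bit that most increases true utility, if any.
    best = y
    for i in range(D):
        flipped = y[:i] + (1 - y[i],) + y[i + 1:]
        if utility(W_TRUE, flipped) > utility(W_TRUE, best):
            best = flipped
    return best

def run(rounds=20):
    w = [0.0] * D
    u_opt = max(utility(W_TRUE, y) for y in CANDIDATES)
    regrets = []
    for _ in range(rounds):
        # Present the object that is best under the current weight estimate.
        y = max(CANDIDATES, key=lambda c: utility(w, c))
        regrets.append(u_opt - utility(W_TRUE, y))
        # Perceptron update towards the user's improved object.
        y_bar = user_improvement(y)
        w = [wi + a - b for wi, a, b in zip(w, y_bar, y)]
    return regrets

regrets = run()
average_regret = sum(regrets) / len(regrets)
print(regrets[:5], average_regret)
```

In this toy run the per-round regret falls to zero after a few interactions, so the average regret keeps shrinking as the number of rounds grows.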

6. Methodological and Practical Considerations

Configuration learning systems trade off between generality, computational cost, and operational constraints:

Sampling and Measurement: Expensive ground-truth measurements motivate sophisticated sampling regimes—Latin hypercube, t-wise combinatorics, uncertainty-driven adaptive queries—to minimize the number of required evaluations while capturing configuration-performance relationships (Pereira et al., 2019).
Constraint Handling: Accurately encoding solver parameter dependencies (e.g., if-then, mutual-exclusion) into the feasible configuration set is essential for correctness; mathematical programs enable exact, explicit constraint handling (Iommazzo et al., 2024, Iommazzo et al., 2024).
Simulation-to-Real Transfer: For physical deployments (robotics, agentic systems), domain randomization and robustification are required to bridge discrepancies between simulated and observed behavior (Denniston et al., 2023).
Interpretability: Statistical surrogates such as regression trees and sparse linear models permit performance interpretation, supporting manual trust and validation (Pereira et al., 2019).
Runtime Overhead: Frameworks vary from cold-start, low-latency (LLM configuration: <1s per candidate) (Lawless et al., 2024) to high-throughput dataset preparation and MINLP optimization (e.g., several seconds to minutes per instance) (Iommazzo et al., 2024, Iommazzo et al., 2024), balancing startup cost against long-term adaptive benefits.
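Of the sampling regimes mentioned above, Latin hypercube sampling is simple enough to sketch directly. The version below is a minimal pure-Python illustration for continuous parameters; the function name and the example bounds are hypothetical.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each dimension has exactly one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniformly placed point inside each of n_samples equal-width strata.
        points = [low + (high - low) * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        # Shuffle independently per dimension so strata are paired at random.
        rng.shuffle(points)
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

# Example: 8 configurations over a tolerance in [0, 1] and a time limit in [0, 8].
design = latin_hypercube(8, [(0.0, 1.0), (0.0, 8.0)])
for point in design:
    print(point)
```

Compared with independent uniform draws, this design guarantees that the full range of each parameter is covered even with very few evaluations, which is the property that matters when each measurement is expensive.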

7. Limitations and Future Directions

Learned configurators face several open challenges:

Scalability: Training data acquisition for high-dimensional solver parameter spaces or exhaustive instance-configuration pairs is expensive (Iommazzo et al., 2024). Surrogate-based and transfer learning approaches partly alleviate this but require further development for ultra-large combinatorial domains.
Nonconvexity and Surrogate Misspecification: MINLP objectives based on RBF-SVRs and logistic models may induce optimization difficulties, and surrogate inaccuracies can yield infeasible or suboptimal configurations (Iommazzo et al., 2024).
Generalization: Meta-learned and transformer-based configurators (ConfigX) achieve notable zero-shot performance, but cross-family or highly out-of-distribution tasks remain nontrivial, motivating research in graph-based representations and mixture-of-experts approaches (Guo et al., 2024).
Integration with Human Feedback or Preference Elicitation: For problems involving implicit objectives (user preferences or constraints), decomposed elicitation (pcl) (Dragone et al., 2017) and coactive constructs offer tractable, locally-optimal solutions, but their extension to high-dimensional, real-world interactive systems is ongoing.

Future work involves tighter integration of active learning, richer surrogate model classes (deep architectures, GNNs), online learning for configuration policies, and expanded theoretical analysis for safety, robustness, and global optimality.
