LLM-Guided Optimization

Updated 2 July 2026

LLM-GO is a computational framework that melds large language models with optimization workflows through error correction, iterative self-refinement, and multi-agent architectures.
The methodology employs multi-turn inference with solver feedback and structured prompt templates, with reported gains such as +14 accuracy points on MILP formulation and a 31× speedup over grid search.
Its modular pipeline, featuring scalable divide-optimize-merge and model patching, enables rapid adaptation and enhanced performance across domains such as supply-chain and chemical optimization.

LLM-guided optimization (LLM-GO) encompasses a family of computational paradigms in which LLMs are systematically integrated into the formulation, solution, refinement, or management of optimization problems, often acting as domain-aware agents or meta-optimizers. LLM-GO frameworks combine LLMs’ abilities in semantic understanding, symbolic reasoning, error analysis, and inductive generalization with conventional or agentic optimization workflows across domains such as mathematical programming, agent pipeline configuration, multi-agent process optimization, and scientific discovery. The central principles are data cleaning guided by error taxonomies, domain-informed prompting, iterative self-correction via solver feedback, structured patching and re-optimization, scalable divide-optimize-merge strategies, and agent hierarchies that range from constraint discovery to solution validation. LLM-GO has shown substantial empirical gains in solution quality, efficiency, and generalizability over both LLM-agnostic and LLM-naive baselines, across industrial benchmarks and scientific settings.

1. Core Principles and Error-Aware Data Preprocessing

LLM-GO pipelines frequently begin with expert-driven data cleaning and training set curation, addressing the noise and ambiguity endemic to crowdsourced or legacy datasets. In mixed-integer linear programming (MILP) problem formulation, for example, the OptiMind system introduces two precision-critical stages: (a) test-set cleaning with manual correction or removal of out-of-scope, ambiguous, or infeasible instances, and (b) training-set cleaning by class-based error analysis, using disagreement between strong LLMs and the reference to compile “error summaries” and “preventive hints” for each problem class (e.g., Knapsack, Flow-Shop, TSP) (Chen et al., 26 Sep 2025).

This process produces a taxonomy of common formulation errors per class, which is then aggregated into a class→hints dictionary. Datasets cleaned this way empirically increase zero-shot LLM formulation accuracy from roughly 40–60% to 70–90% across benchmarks such as IndustryOR and Mamo-Complex. The paradigm generalizes: LLM-GO is most effective when supported by domain-expert curation pipelines that inject knowledge-rich, class-specific error descriptors during both training and test-time inference.
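The class→hints dictionary can be pictured with a minimal Python sketch. The function names, the record format, and the example hints below are hypothetical illustrations, not OptiMind's actual data structures or prompts.

```python
from collections import defaultdict


def build_hint_dictionary(error_records):
    """Aggregate (problem_class, hint) pairs into a class -> hints mapping."""
    hints = defaultdict(list)
    for problem_class, hint in error_records:
        if hint not in hints[problem_class]:  # skip verbatim duplicates
            hints[problem_class].append(hint)
    return dict(hints)


def inject_hints(problem_text, problem_class, hint_dict):
    """Prepend class-specific hints to a formulation prompt, if any exist."""
    hints = hint_dict.get(problem_class, [])
    if not hints:
        return problem_text
    bullet_list = "\n".join(f"- {h}" for h in hints)
    return (
        f"Common pitfalls for {problem_class} problems:\n"
        f"{bullet_list}\n\nProblem:\n{problem_text}"
    )


# Hypothetical error records, as might come out of a class-based error analysis.
records = [
    ("TSP", "Add subtour-elimination constraints."),
    ("Knapsack", "Declare item-selection variables as binary."),
    ("TSP", "Add subtour-elimination constraints."),
]
hint_dict = build_hint_dictionary(records)
prompt = inject_hints("Visit 5 cities at minimum cost.", "TSP", hint_dict)
```

Keeping the hints in a plain mapping keyed by problem class means the same dictionary can serve both training-set annotation and test-time prompt construction.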

2. Multi-Turn Inference and Self-Correction via Solver Feedback

A hallmark of advanced LLM-GO instantiations is multi-turn inference: an iterative loop combining majority-vote “self-consistency,” domain-hint injection, and direct solver feedback to guide the LLM through error detection and self-correction (Chen et al., 26 Sep 2025). Typical pipelines involve:

Classification prompts to determine problem class.
Reasoning-augmented generation, optionally injecting class-specific error hints.
Majority-vote selection after parallel sampling (K generations).
Feedback round in which the LLM receives execution output (stdout, stderr) from a solver or validator, and is instructed to analyze failures and propose corrections.

All stages are governed by structured prompt templates. Empirically, multi-turn self-correction (M=5) in OptiMind yields monotonic accuracy improvements. Single-step inference, hints, and majority voting cumulatively add up to ~24 percentage points over baseline, and multi-turn refinement makes the results more robust.
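The loop described in this section can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `generate` and `run_solver` are hypothetical callables standing in for an LLM and a solver, the loop stops at the first turn in which any sample runs successfully, and the vote is taken over objective values. None of this is claimed to match OptiMind's implementation.

```python
from collections import Counter


def multi_turn_solve(problem, generate, run_solver, k=3, max_turns=5):
    """Sample k candidates per turn, vote on successes, else feed errors back."""
    feedback = None
    for turn in range(1, max_turns + 1):
        candidates = [generate(problem, feedback, seed) for seed in range(k)]
        results = [run_solver(c) for c in candidates]
        successes = [r for r in results if r["ok"]]
        if successes:
            # Majority vote: keep the objective value most samples agree on.
            votes = Counter(r["objective"] for r in successes)
            objective, _ = votes.most_common(1)[0]
            return {"objective": objective, "turns": turn}
        # No sample ran cleanly: pass the solver's error text to the next turn.
        feedback = results[0]["stderr"]
    return {"objective": None, "turns": max_turns}


def toy_generate(problem, feedback, seed):
    # Hypothetical stand-in for an LLM: it only repairs its model after feedback.
    return {"fixed": feedback is not None, "seed": seed}


def toy_run_solver(candidate):
    # Hypothetical stand-in for a solver run that returns status and stderr.
    if not candidate["fixed"]:
        return {"ok": False, "objective": None,
                "stderr": "infeasible: missing capacity constraint"}
    objective = 40 if candidate["seed"] == 1 else 42  # one sample disagrees
    return {"ok": True, "objective": objective, "stderr": ""}


result = multi_turn_solve("toy knapsack", toy_generate, toy_run_solver)
```

In this toy run the first turn fails for every sample, the error text is fed back, and the second turn succeeds with two of three samples agreeing on the same objective.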

3. Agentic and Multi-Agent Architectures

LLM-GO generalizes beyond single-agent pipelines to multi-agent frameworks for tasks such as chemical process optimization (Zeng et al., 26 Jun 2025). Here, domain-resident ContextAgent modules autonomously infer operating constraints (using embedded “rules-of-thumb”), which are then enforced by ValidationAgent and explored by SuggestionAgent via gradient-free, reasoning-driven search. SimulationAgent modules execute parameter trials and report metrics, creating a closed agentic loop.

The architecture enables efficient exploration even when operational bounds are unknown or ill-defined, with LLM-inferred constraints automatically restricting the search and domain heuristics (e.g. process engineering “soft rules”) guiding the trajectory. On the HDA process benchmark, this framework requires 3–4× fewer iterations to converge than conventional solvers, and achieves a 31× wall-clock speedup over grid search.
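A closed loop of this kind can be sketched in Python with plain functions standing in for the four agents. Everything below is a hypothetical toy: the bounds, the quadratic "simulator", and the coordinate-perturbation search are assumptions chosen so the example runs on its own, whereas the framework in the cited work uses LLM reasoning and a real process simulator.

```python
def context_agent():
    """Hypothetical inferred operating bounds (low, high) per parameter."""
    return {"temperature": (500.0, 700.0), "pressure": (20.0, 40.0)}


def validation_agent(params, bounds):
    """Accept a trial only if every parameter lies inside its bounds."""
    return all(lo <= params[name] <= hi for name, (lo, hi) in bounds.items())


def simulation_agent(params):
    """Toy simulator: the metric peaks at temperature 620 and pressure 32."""
    return (-((params["temperature"] - 620.0) ** 2) / 100.0
            - (params["pressure"] - 32.0) ** 2)


def suggestion_agent(best_params, step):
    """Gradient-free proposals: perturb one parameter at a time."""
    proposals = []
    for name in best_params:
        for delta in (-step[name], step[name]):
            trial = dict(best_params)
            trial[name] += delta
            proposals.append(trial)
    return proposals


def optimize(iterations=20):
    bounds = context_agent()
    best = {name: (lo + hi) / 2 for name, (lo, hi) in bounds.items()}
    best_score = simulation_agent(best)
    step = {name: (hi - lo) / 10 for name, (lo, hi) in bounds.items()}
    for _ in range(iterations):
        improved = False
        for trial in suggestion_agent(best, step):
            if not validation_agent(trial, bounds):
                continue  # rejected before any simulation is spent on it
            score = simulation_agent(trial)
            if score > best_score:
                best, best_score, improved = trial, score, True
        if not improved:
            step = {name: s / 2 for name, s in step.items()}  # refine search
    return best, best_score


best, best_score = optimize()
```

The point of the sketch is the division of labor: constraints are produced once, every proposal is validated before simulation, and the search itself never needs gradients.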

4. Scalable Divide-Optimize-Merge and Modular Composition

As LLM-driven pipelines scale to large datasets or lengthy optimization traces, direct prompt-optimization over all available data collapses under context window limits, leading to performance plateaus. Fine-Grained Optimization (FGO) remedies this by splitting data into k context-fit subsets, independently optimizing with LLMs, then recursively merging specialized modules via LLM prompts (Liu et al., 6 May 2025).

Formally, for subsets $\mathcal{D}_i$, FGO optimizes module parameters $\theta_i^*$ with textual feedback, then merges modules through learned, validation-weighted combinations. FGO yields consistent 1.6–8.6% success-rate improvements and a 56.3% reduction in prompt token consumption relative to large-batch or baseline schemes, and supports efficient parallelization and O(log k) scaling of merge depth.
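The divide-optimize-merge pattern can be sketched in Python as below. The sketch assumes pairwise merging, in which case k modules need ceil(log2 k) merge rounds. `optimize_module` and `merge` are hypothetical stand-ins for the LLM optimization and merge prompts, and the set union used here replaces the validation-weighted combination described above.

```python
def split(data, chunk_size):
    """Divide the data into consecutive subsets of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def optimize_module(subset):
    """Hypothetical stand-in for optimizing one module on one subset."""
    return {"rules": sorted(set(subset))}


def merge(a, b):
    """Hypothetical stand-in for an LLM merge prompt: union of both rule sets."""
    return {"rules": sorted(set(a["rules"]) | set(b["rules"]))}


def divide_optimize_merge(data, chunk_size):
    """Optimize each subset independently, then merge modules pairwise.

    Assumes data is non-empty. Returns the final module and the number of
    merge rounds that were needed.
    """
    modules = [optimize_module(s) for s in split(data, chunk_size)]
    depth = 0
    while len(modules) > 1:
        merged = [merge(modules[i], modules[i + 1])
                  for i in range(0, len(modules) - 1, 2)]
        if len(modules) % 2:  # an unpaired module carries over unchanged
            merged.append(modules[-1])
        modules = merged
        depth += 1
    return modules[0], depth


final_module, depth = divide_optimize_merge(list(range(16)), chunk_size=2)
```

Because the per-subset optimizations and the merges within one round are independent of each other, both steps can run in parallel.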

5. Patching, Re-Optimization, and Model Adaptation

A critical application of LLM-GO is model re-optimization under evolving constraints or data. Structured “model patches,” formalized as sequences of operations on model parameters and constraints, are generated via LLM parsing of natural-language user prompts (Ye et al., 18 May 2026). The Patch Planner LLM emits candidate patch sequences, which a programmatic “Programmer” standardizes, and a Strategy Selector chooses appropriate re-optimization tools (e.g., warm starts, valid inequalities, metaheuristics).

Empirical evaluations on real-world supply-chain and scheduling benchmarks demonstrate that the LLM-GO patching framework achieves 100% update correctness with state-of-the-art prompt satisfaction, halving solution time and doubling fulfillment rates compared to direct code editing or non-agentic patching. Structured patching with toolbox stratification ensures semantic control, interpretability, and rapid adaptation without deep OR expertise.
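A model patch as a sequence of operations can be illustrated with a short Python sketch. The `PatchOp` type, the three action names, and the dictionary-based model are hypothetical choices made for this example, not the representation used in the cited framework.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PatchOp:
    """One patch operation (hypothetical schema)."""
    action: str        # "set_param", "add_constraint", or "remove_constraint"
    target: str        # name of the parameter or constraint
    value: Any = None  # new value or constraint expression, if applicable


def apply_patch(model, patch):
    """Apply operations in order to a copy of the model and return the copy."""
    patched = copy.deepcopy(model)
    for op in patch:
        if op.action == "set_param":
            if op.target not in patched["params"]:
                raise KeyError(f"unknown parameter: {op.target}")
            patched["params"][op.target] = op.value
        elif op.action == "add_constraint":
            patched["constraints"][op.target] = op.value
        elif op.action == "remove_constraint":
            del patched["constraints"][op.target]
        else:
            raise ValueError(f"unsupported action: {op.action}")
    return patched


# Hypothetical model and patch, e.g. parsed from a request such as
# "raise capacity to 120 and require at least half of demand to be met".
model = {
    "params": {"capacity": 100, "demand": 80},
    "constraints": {"cap": "sum(x) <= capacity"},
}
patch = [
    PatchOp("set_param", "capacity", 120),
    PatchOp("add_constraint", "min_fill", "sum(x) >= 0.5 * demand"),
]
patched = apply_patch(model, patch)
```

Restricting edits to a small, validated vocabulary of operations is what makes a patch checkable before any re-optimization is run, in contrast to free-form code editing.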

6. Evolutionary and Agentic Search

LLM-GO systems extend naturally to evolutionary and agentic search paradigms. In iterative LLM-guided evolutionary optimization, strong models act as local refiners: they produce offspring that frequently improve on their parents by small increments, and they keep the search localized in semantic embedding space (Zhang et al., 21 Apr 2026). Trajectory analyses reveal that breakthrough rate (fraction of generations with objective improvement) is a stronger predictor of final performance than zero-shot model ability.

Design recommendations include empirically maximizing local refinement rate and tuning mutation operators for stable parent–child distances, rather than indiscriminate novelty. These insights unify evolutionary LLM-GO with other agentic and feedback-driven architectures.
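Both diagnostics mentioned in this section can be computed from a run log, as the following Python sketch suggests. It assumes a maximization objective, counts a breakthrough whenever a generation beats the best value seen so far, excludes the initial generation from the denominator, and uses Euclidean distance between embedding vectors. These conventions are assumptions of the sketch and may differ from the definitions in the cited analysis.

```python
import math


def breakthrough_rate(objectives):
    """Fraction of generations (after the first) that improve on the best so far."""
    if len(objectives) < 2:
        return 0.0
    best = objectives[0]
    breakthroughs = 0
    for value in objectives[1:]:
        if value > best:
            breakthroughs += 1
            best = value
    return breakthroughs / (len(objectives) - 1)


def mean_parent_child_distance(pairs):
    """Average Euclidean distance between parent and child embeddings."""
    return sum(math.dist(parent, child) for parent, child in pairs) / len(pairs)


# Hypothetical run log: best objective per generation, and embedding pairs.
history = [1.0, 1.5, 1.5, 2.0, 1.8]
rate = breakthrough_rate(history)

pairs = [((0.0, 0.0), (3.0, 4.0)), ((1.0, 1.0), (1.0, 2.0))]
distance = mean_parent_child_distance(pairs)
```

Tracking the two numbers together shows whether a mutation operator is producing steady local improvements or drifting far from its parents without gains.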

7. Domain-Specific Applications and Empirical Impacts

LLM-GO has demonstrated gains across domains:

Mathematical programming formulation: +14 points in average accuracy for MILP (Chen et al., 26 Sep 2025).
Chemical process optimization: 31× speedup, competitive quality vs. IPOPT/grid search (Zeng et al., 26 Jun 2025).
Agentic system scaling: up to +8.6% success rates and 56% prompt token savings via FGO (Liu et al., 6 May 2025).
Large-scale MIP re-optimization: 100% correctness and rapid solution adaptation (Ye et al., 18 May 2026).
Algorithmic improvement and code refinement: LLM-guided enhancement leads to both quality and runtime wins, even when used by non-experts in combinatorial settings.

The LLM-GO paradigm concretely unites semantic understanding, iterative refinement, multi-agent reasoning, and optimization feedback to deliver robust, interpretable, and scalable optimization in both traditional and emergent computational environments.