
Universal LLM-Based Text Optimization

Updated 20 May 2026
  • Universal LLM-based text optimization is a paradigm that leverages LLMs to search, mutate, and optimize diverse textual artifacts using structured feedback.
  • It employs a cycle of candidate selection, LLM-driven mutation, and Pareto frontier updates to enhance multi-task, single-task, and generalization performance.
  • The approach integrates side information, meta-optimization, and semantic compression to achieve state-of-the-art results and scalable, cross-domain improvements.

Universal LLM-based text optimization is a general problem-solving paradigm in which LLMs search, mutate, and optimize arbitrary text artifacts (program code, agent blueprints, prompts, specifications, scheduling algorithms, and so on) against objective or reward signals computed by black-box evaluators. The approach is universal because classical and domain-specific optimization tasks are recast so that candidate solutions are strings, and an LLM-driven loop generates and selects new candidates using structured feedback and multi-task transfer. Recent frameworks such as "optimize_anything" (Agrawal et al., 19 May 2026) and metaTextGrad (Xu et al., 24 May 2025), along with associated methods, report results that match or exceed the state of the art in fields traditionally governed by bespoke solvers or manual tuning.

1. Formal Problem Definition and Universal Scope

Let $X$ denote the set of all possible text artifacts (strings encoding code, prompts, policies, SVG/CAD specifications, etc.), and let $f : X \times E \rightarrow \mathbb{R} \times I$ be an evaluator mapping a candidate $x \in X$ and an optional task identifier $e \in E$ to a scalar score $s(x, e) \in \mathbb{R}$ and structured side information $si(x, e) \in I$. The universal LLM-based text optimization problem is then to identify

$$x^* = \operatorname*{arg\,max}_{x \in X} S_{\mathrm{mode}}(x),$$

where the objective $S_{\mathrm{mode}}$ varies according to the setting:

  • Single-task optimization: maximize $s(x)$ for a fixed input (no $E$).
  • Multi-task optimization: maximize the aggregate score $\frac{1}{|D|} \sum_{e \in D} s(x, e)$ over a dataset $D \subseteq E$.
  • Generalization: train on $D_{\mathrm{train}}$, optimize for expected $s(x, e)$ over held-out tasks $D_{\mathrm{val}}$.

This abstraction enables a single LLM-based system to address problems as diverse as agent skill learning, schedule optimization, prompt refinement, numerical solver generation, and more—all while maintaining a common optimization protocol (Agrawal et al., 19 May 2026).
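The three settings can be made concrete with a minimal Python sketch. Everything named here (`Evaluation`, `toy_evaluator`, the helper functions) is hypothetical and not part of any cited framework. The sketch also assumes the multi-task aggregate is a mean over tasks, which has the same maximizer as a sum.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Callable, Optional, Sequence


@dataclass
class Evaluation:
    score: float     # scalar score s(x, e)
    side_info: Any   # structured side information si(x, e)


# f : X x E -> R x I, with candidates as plain strings and an optional task.
Evaluator = Callable[[str, Optional[str]], Evaluation]


def single_task_objective(f: Evaluator, x: str) -> float:
    """Single-task mode: score of x with no task identifier."""
    return f(x, None).score


def multi_task_objective(f: Evaluator, x: str, dataset: Sequence[str]) -> float:
    """Multi-task mode: mean score over a dataset (assumed aggregate)."""
    return mean(f(x, e).score for e in dataset)


def select_by_validation(f: Evaluator, candidates: Sequence[str],
                         val_set: Sequence[str]) -> str:
    """Generalization mode: pick the candidate that scores best on held-out tasks."""
    return max(candidates, key=lambda x: multi_task_objective(f, x, val_set))


def toy_evaluator(x: str, e: Optional[str]) -> Evaluation:
    """Hypothetical evaluator: count occurrences of a target character in x."""
    target = e if e is not None else "a"
    hits = x.count(target)
    return Evaluation(score=float(hits), side_info={"target": target, "hits": hits})
```

With this toy evaluator, `single_task_objective(toy_evaluator, "banana")` counts the letter "a", while `multi_task_objective` averages the counts over whichever target characters are passed as the dataset.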

2. Architectures and Optimization Algorithms

The core architecture, as exemplified by "optimize_anything," employs the following loop:

  • Candidate Selection (Pareto sampling): At each iteration, select from the maintained Pareto frontier, which tracks candidates non-dominated on per-metric or per-task scores.
  • Minibatch Evaluation: Evaluate candidate $x$ on a small batch $B \subseteq D$, collecting both scalar scores and side information.
  • LLM-driven Mutation (Reflection & Edit Proposal): Present the LLM (the proposer) with $x$, observed outcomes on $B$, and all accompanying side information. The LLM proposes a mutation $x'$.
  • Insertion and Pareto Update: If $x'$ improves on any objective, it is fully evaluated and added to the candidate pool; the Pareto frontier is pruned accordingly.
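The loop above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the "optimize_anything" implementation: the proposer is a deterministic stub standing in for an LLM call, and `optimize`, `toy_evaluate`, and `toy_propose` are hypothetical names. Tasks are single characters, and a candidate string solves a task when it contains that character.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Result = Tuple[float, str]   # (scalar score, side information)
Scores = Tuple[float, ...]   # one score per task


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every task and better on one."""
    return all(p >= q for p, q in zip(a, b)) and any(p > q for p, q in zip(a, b))


def prune(pool: Dict[str, Scores]) -> Dict[str, Scores]:
    """Keep only candidates that no other candidate dominates."""
    return {c: s for c, s in pool.items()
            if not any(dominates(t, s) for t in pool.values())}


def optimize(seed: str, tasks: Sequence[str],
             evaluate: Callable[[str, str], Result],
             propose: Callable[[str, List[Result]], str],
             iterations: int = 50, batch_size: int = 2,
             rng_seed: int = 0) -> Dict[str, Scores]:
    rng = random.Random(rng_seed)

    def full_scores(x: str) -> Scores:
        return tuple(evaluate(x, e)[0] for e in tasks)

    frontier = {seed: full_scores(seed)}
    for _ in range(iterations):
        parent = rng.choice(sorted(frontier))                   # candidate selection
        batch = rng.sample(list(tasks), min(batch_size, len(tasks)))
        parent_results = [evaluate(parent, e) for e in batch]   # minibatch evaluation
        child = propose(parent, parent_results)                 # mutation (stubbed LLM)
        child_results = [evaluate(child, e) for e in batch]
        if any(c[0] > p[0] for c, p in zip(child_results, parent_results)):
            frontier[child] = full_scores(child)                # full evaluation
            frontier = prune(frontier)                          # Pareto update
    return frontier


def toy_evaluate(x: str, e: str) -> Result:
    """Score 1.0 if x contains character e; otherwise report what is missing."""
    return (1.0, "") if e in x else (0.0, f"missing:{e}")


def toy_propose(parent: str, results: List[Result]) -> str:
    """Stub proposer: read the side information and append one missing character."""
    for _, info in results:
        if info.startswith("missing:"):
            return parent + info.split(":", 1)[1]
    return parent
```

Calling `optimize("", ["a", "b", "c"], toy_evaluate, toy_propose)` returns a frontier whose candidates contain all three characters. A real proposer would replace `toy_propose` with a model call that receives the same parent text, scores, and side information.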

The algorithm supports three operational modes:

| Mode | Data Used | Pareto Frontier Structure |
|---|---|---|
| Single-task | none or 1 task | Trivial; per-metric if side info exists, else single metric |
| Multi-task | $D$ | $\lvert D \rvert$-dimensional; preserves candidates excelling on subsets |
| Generalization | $D_{\mathrm{train}}$, $D_{\mathrm{val}}$ | Optimize on $D_{\mathrm{train}}$, select by $D_{\mathrm{val}}$ |

Multi-task optimization further enables cross-problem transfer: improvement patterns discovered on specific tasks propagate via the Pareto frontier to other tasks, yielding faster convergence and broader solution coverage (Agrawal et al., 19 May 2026).
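One way a frontier can preserve candidates that excel only on subsets of tasks is to retain every candidate that is best on at least one task and to sample parents in proportion to the number of tasks they win. The sketch below illustrates that rule; it is an assumption made for illustration, the source does not specify the exact selection rule, and the function names and scores are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def per_task_winners(scores: Dict[str, List[float]]) -> Counter:
    """Count, for each candidate, the tasks on which it attains the best score."""
    n_tasks = len(next(iter(scores.values())))
    wins: Counter = Counter()
    for t in range(n_tasks):
        best = max(s[t] for s in scores.values())
        for name, s in scores.items():
            if s[t] == best:
                wins[name] += 1
    return wins


def sampling_weights(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Probability of selecting each retained candidate as the next parent."""
    wins = per_task_winners(scores)
    total = sum(wins.values())
    return {name: count / total for name, count in wins.items()}


# Hypothetical per-task scores for four candidates on three tasks.
EXAMPLE_SCORES = {
    "balanced": [0.6, 0.6, 0.6],
    "spec_a": [0.9, 0.2, 0.2],
    "spec_b": [0.1, 0.9, 0.5],
    "weak": [0.5, 0.1, 0.1],
}
```

In `EXAMPLE_SCORES`, selecting by average score alone would keep only `balanced`, whereas the per-task rule also retains `spec_a` and `spec_b`, so what each specialist has learned stays available as a starting point for later mutations. Note that this rule drops any candidate that wins no task, even a non-dominated one; a dominance-based frontier would behave differently in that case.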

Meta-optimizers, as in metaTextGrad, extend this framework to optimize over the space of optimizers themselves. Two principal mechanisms are employed (Xu et al., 24 May 2025):

  • Meta Prompt Optimizer: Automatic search over the optimizer’s system prompt for improved outcomes.
  • Meta Structure Optimizer: Learning how to combine or sequence multiple optimizers into a composite process.

Standard empirical risk minimization and validation splits (train/val/test) are used to guide outer-loop meta-optimization, and empirical results confirm both single-task and compositional improvements.
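The outer-loop protocol can be sketched as follows: each candidate optimizer configuration runs an inner optimization on the training split, the configuration with the best validation score is selected, and the test split is touched only once at the end. This is a minimal illustration with hypothetical names (`meta_optimize`, `stub_inner_optimizer`, `keyword_score`) and a stubbed inner optimizer; it is not metaTextGrad's code.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple


def meta_optimize(optimizer_configs: Sequence[str],
                  run_optimizer: Callable[[str, Sequence[str]], str],
                  score: Callable[[str, str], float],
                  train: Sequence[str], val: Sequence[str],
                  test: Sequence[str]) -> Tuple[str, float, float]:
    """Return (best config, its validation score, its test score)."""
    results = []
    for cfg in optimizer_configs:
        program = run_optimizer(cfg, train)   # inner loop is given train data only
        val_score = mean(score(program, e) for e in val)
        results.append((val_score, cfg, program))
    best_val, best_cfg, best_program = max(results, key=lambda r: r[0])
    test_score = mean(score(best_program, e) for e in test)   # reported once
    return best_cfg, best_val, test_score


def stub_inner_optimizer(cfg: str, train: Sequence[str]) -> str:
    """Stub: the 'optimized program' is just the lowercased optimizer prompt.
    A real inner optimizer would iterate on the train tasks; this stub ignores them."""
    return cfg.lower()


def keyword_score(program: str, task: str) -> float:
    """Toy score: 1.0 if the program text mentions the task keyword."""
    return 1.0 if task in program else 0.0


# Hypothetical optimizer system prompts and keyword tasks.
CONFIGS = ["Fix typos.", "Fix typos. Check units.", "Check units. Verify edge cases."]
TRAIN, VAL, TEST = ["typos", "units"], ["units", "edge"], ["edge", "units", "typos"]
```

Running `meta_optimize(CONFIGS, stub_inner_optimizer, keyword_score, TRAIN, VAL, TEST)` selects the third prompt because it covers both validation keywords. Keeping the test split out of the selection step is what makes the final number an honest estimate.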

3. Feedback Structures: Score, Side Information, and Directionality

A defining feature of universal LLM-based optimization systems is the use of side information (SI): structured feedback returned alongside scalar objective scores. SI can include compiler errors, profiler traces, sub-scores, rendered outputs, or other high-signal diagnostics.

Empirical ablations (see table below) establish that actionable SI markedly accelerates convergence and improves final outcomes compared to score-only feedback by providing information analogous to gradients in classical optimization (Agrawal et al., 19 May 2026, Nie et al., 2024).

| Domain | With SI | Score-Only |
|---|---|---|
| Circle Packing | 100% of optimum | 93.96% |
| KernelBench (ST) | 32.3% kernels ≥1.1× | 12.9% |
| KernelBench (MT) | 40% kernels ≥1.1× | 0% |
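The gap between actionable side information and score-only feedback can be illustrated with a toy search, in which the candidate is a string encoding an integer and the evaluator hides a target value. This is only an analogy under stated assumptions: no LLM is involved, and the function names and the hint format are hypothetical. With a hint that says which way to move, the search can halve its interval at every step; with the score alone, it has to probe neighbours one at a time.

```python
from typing import Dict, Tuple


def evaluate(x: str, target: int = 700) -> Tuple[float, Dict[str, str]]:
    """Return (score, side information) for a candidate encoded as text."""
    value = int(x)
    score = -abs(value - target)
    if value == target:
        hint = "done"
    else:
        hint = "increase" if value < target else "decrease"
    return score, {"hint": hint}


def optimize_with_hints(lo: int = 0, hi: int = 1000) -> Tuple[int, int]:
    """Use the hint in the side information to bisect; returns (solution, evaluations)."""
    evals = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        _, side_info = evaluate(str(mid))
        evals += 1
        if side_info["hint"] == "done":
            return mid, evals
        if side_info["hint"] == "increase":
            lo = mid + 1
        else:
            hi = mid - 1
    raise ValueError("target outside search interval")


def optimize_score_only(start: int = 0, max_evals: int = 5000) -> Tuple[int, int]:
    """Ignore side information: accept a neighbour only if its score improves."""
    x = start
    best, _ = evaluate(str(x))
    evals = 1
    while best < 0 and evals < max_evals:
        for step in (1, -1):
            score, _ = evaluate(str(x + step))
            evals += 1
            if score > best:
                x, best = x + step, score
                break
    return x, evals
```

Both functions reach the same answer, but the hinted search needs about ten evaluations on this interval while the score-only search needs several hundred. The benchmark numbers in the table come from far richer side information (compiler errors, profiler traces), so the toy shows only the mechanism, not the magnitude.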

"Directional feedback"—a specific, improvement-oriented suggestion—enables descent-like updates analogous to using a first-order oracle, while non-directional feedback yields only noisy, slow black-box searches (Nie et al., 2024). Systems that synthesize or elicit explicit directional feedback from LLMs achieve more reliable and monotonic performance increases.

4. Empirical Results and Task-General Outcomes

Universal LLM-driven optimization demonstrates substantial gains across diverse domains without the need for task-specific architectures:

| Domain | Mode | Proposer LLM | Baseline | Final Result (Δ) |
|---|---|---|---|---|
| ARC-AGI Puzzle Agent | G | Gemini 3 Flash | 32.5% accuracy | 89.5% (+57 pp) |
| Agent Skills (Bleve code) | G | Claude Opus 4.6 | Haiku 4.5: 79.3% | 98.3% (+19.0 pp); 47% faster |
| Cloud Scheduling | G | Gemini 3 Pro | Dijkstra 0% saved | 40.2% saved |
| CUDA Kernel Generation | M | GPT-5 | 0% match PyTorch | 87% match or beat; 48% ≥10% faster |
| Circle Packing (n=26) | S | GPT-5 | 2.6307 (OpenEv) | 2.63598 (world record) |
| AIME Math Prompts | G | GPT-5 | 46.67% | 60% (+13.3 pp) |

Mode: S = single-task, M = multi-task, G = generalization.

Multi-task search demonstrates strong cross-task transfer scaling: increasing the number of jointly optimized tasks (e.g., from 10 to 20) increases both convergence speed (proportion of tasks solved per iteration) and ultimate coverage, outperforming independent single-task schedules (Agrawal et al., 19 May 2026).

MetaTextGrad delivers average absolute improvements of up to 6 percentage points over the best LLM-optimizer baselines across benchmarks in reasoning, language modeling, and domain-specific QA (Xu et al., 24 May 2025). Its meta-prompt and meta-structure optimizers are each individually beneficial and exhibit additive effects.

5. Compression, Efficiency, and Prompt Token Optimization

Token optimization, as presented in "Hypernym Mercury," is a complementary paradigm focused on reducing LLM prompt length via semantic compression. Text is parsed into “darts” encoding a hypernym-based core and associated details, with details ranked by Shapley-value measures of semantic importance (Forrester et al., 12 May 2025). Iterative removal or abstraction of low-value details achieves compression rates of 80–90% while maintaining high semantic fidelity (cosine similarity of 0.9 or greater) across LLM and embedding models.

| Model Pair | Avg CR | Avg CosSim | Avg ROUGE-L |
|---|---|---|---|
| dolphin-llama3 → llama4-mav | 86% | 0.92 | 0.90 |
| gpt4.1 ↔ gpt4.1 | 88% | 0.94 | 0.92 |
| cross-vendor mixes | 83–87% | 0.90–0.93 | 0.88–0.91 |

Granularity is precisely controlled via an importance threshold, supporting both lossless and lossy operation. Integrated into LLM pipelines, semantic compression provides linear computational savings proportional to the compression ratio, with only minor overhead for reconstruction or multi-model semantic verification.
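The threshold mechanism can be sketched as follows. This is a toy under stated assumptions, not the Hypernym Mercury implementation: the importance weights are hand-assigned stand-ins for Shapley-style estimates, tokens are counted by whitespace splitting, and `Dart`, `compress`, and `compression_ratio` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Dart:
    core: str                          # hypernym-style summary of the passage
    details: List[Tuple[str, float]]   # (detail text, importance in [0, 1])


def compress(dart: Dart, threshold: float) -> str:
    """Keep the core plus every detail whose importance meets the threshold."""
    kept = [text for text, weight in dart.details if weight >= threshold]
    return " ".join([dart.core] + kept)


def token_count(text: str) -> int:
    """Crude whitespace tokenizer, used only to make the arithmetic visible."""
    return len(text.split())


def compression_ratio(original: str, compressed: str) -> float:
    """Fraction of tokens removed; 0.0 means nothing was dropped."""
    return 1.0 - token_count(compressed) / token_count(original)


# Hypothetical dart with hand-assigned importance weights.
EXAMPLE = Dart(
    core="financial report",
    details=[
        ("revenue grew 12%", 0.9),
        ("driven by cloud sales", 0.6),
        ("hardware dipped in Europe", 0.2),
        ("announced on a Tuesday", 0.05),
    ],
)
```

A threshold of 0.0 keeps every detail (lossless with respect to the dart), while a threshold of 0.5 keeps 9 of the 17 tokens, a compression ratio of about 0.47. If downstream cost scales with prompt tokens, the saving is proportional to that ratio.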

6. Open-Source Frameworks and Implementation Practices

The "optimize_anything" API, released as part of the GEPA project (Agrawal et al., 19 May 2026), exemplifies state-of-the-art design:

  • Declarative Python interface—no mutation templates or special markers required.
  • Side Information support as a strongly-typed primitive (arbitrary Python objects, JSON, or images).
  • Automatic mode selection based on supplied dataset/validation set arguments.
  • Seedless and seeded operation: optional natural-language objectives allow bootstrapping from scratch.
  • Pareto-based search backend exposed with modular adapter hooks.
  • Efficient execution: most experiments run on commodity CPU with API-hosted LLMs; hardware-intensive tasks (CUDA kernel synthesis) require a single GPU.
  • Extensive test and experiment coverage, with public notebooks and state logs for every evaluated domain.

MetaTextGrad uses hierarchical model selection, assigning one model (e.g., o1) to the meta level and another (e.g., GPT-4o) to the optimizer/program level, to balance computational cost against optimization fidelity (Xu et al., 24 May 2025).

7. Theoretical and Practical Implications

Universal LLM-based text optimization reframes a broad class of search and adaptation problems into string optimization with feedback, enabling:

  • Rapid, automated discovery of high-quality solutions across heterogeneous domains, with consistent cross-task improvement through multi-task learning and Pareto-based candidate retention.
  • Principled integration of side information and directional feedback that improves convergence in discrete text spaces, drawing explicit analogies to gradient-based optimization in continuous domains (Nie et al., 2024).
  • Meta-level customization, allowing automatic alignment, prompt-tuning, and optimizer composition without manual intervention, with generalization to new tasks checked empirically on held-out validation data (Xu et al., 24 May 2025).
  • Scalability and efficiency via semantic compression, reducing token footprint while maintaining high retrievability and semantic preservation in downstream LLM and RAG pipelines (Forrester et al., 12 May 2025).

The emergence of such universal frameworks challenges traditional boundaries between problem-specific algorithm design and model-agnostic optimization, pointing toward a future of fully automated, cross-domain adaptation and self-improving LLM-driven systems. The open theoretical questions, especially those related to convergence in discrete spaces and optimal synthesis of directional feedback, remain active topics of research.
