LLM-Driven Automated Strategy Finding
- The paper demonstrates recursive logic-based synthesis where LLMs decompose goals into subgoals and iteratively validate candidate strategies.
- It highlights multi-agent pipelines that generate, execute, optimize, and evaluate strategies, improving accuracy by up to 55% (absolute) on symbolic reasoning tasks.
- Hybrid techniques blend neural, symbolic, and search-based methods to automate strategy discovery across finance, circuit design, and planning domains.
Automating Strategy Finding with LLMs
Automating strategy finding with LLMs denotes a class of computational architectures and algorithmic frameworks where LLMs autonomously generate, refine, validate, and select problem-solving strategies for complex, goal-driven tasks. Modern approaches encode strategies in forms ranging from natural language and pseudocode to executable rules and decision trees, across application domains as varied as multi-agent planning, mathematical reasoning, game theory, code optimization, and quantitative finance. The defining characteristic is the full or partial automation of the strategy discovery process—the system traverses a space of possible strategies with minimal human intervention, employing the LLM’s capabilities for recursive reasoning, inductive generalization, deductive application, explicit program synthesis, and self-consistent evaluation.
1. Recursive Exploration and Logic-Guided Strategy Synthesis
The recursive expansion paradigm is foundational to automated strategy finding. Here, the reasoning process is modeled as a dialogue thread or logical program that alternates between exploring alternatives ("OR-nodes", representing branching among plausible strategies) and decomposing goals into subgoals ("AND-nodes", conjunctive dependencies). The workflow mirrors the operation of a Horn clause interpreter as found in Prolog, but adapted to the natural-language setting of LLMs. A goal G is reduced to subgoals G1, ..., Gn, forming a dynamically synthesized recursive descent expressed as the Horn clause G :- G1, ..., Gn. At each recursion, the system evaluates alternative strategies ("clauses"), with the LLM generating candidate heads and bodies on the fly. This dialogic recursion is initiated with a succinct, task-specific prompt (the "initiator"), and the LLM maintains focus by continuously synthesizing context-aware prompts that summarize the reasoning path taken so far. Semantic similarity to ground-truth or task-relevant facts, as well as validation by "refiner oracles" (secondary classifiers or LLM instances), restricts the search space and validates or prunes candidate strategies. Ultimately, the set of recursive justifications is compiled into a unique minimal model: an explicit Horn clause program whose fixed-point semantics minimally explain the original task (Tarau, 2023).
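The AND/OR recursion above can be sketched as a small depth-first interpreter in which a stubbed clause generator stands in for the LLM. The `propose_clauses` function, its toy knowledge base, and the depth cap are illustrative assumptions, not the paper's actual interface; a real system would query the model for candidate clause bodies at each step.

```python
from typing import Optional

# Hypothetical stand-in for an LLM call: given a goal, propose alternative
# clause bodies (OR-choices), each a conjunction of subgoals (AND-nodes).
def propose_clauses(goal: str) -> list[list[str]]:
    knowledge = {
        "solve_task": [["gather_facts", "pick_strategy"]],
        "gather_facts": [[]],                 # a fact: empty body
        "pick_strategy": [["validate"], []],  # two alternative clauses
        "validate": [[]],
    }
    return knowledge.get(goal, [])

def prove(goal: str, depth: int = 0, max_depth: int = 8) -> Optional[list[str]]:
    """Depth-first recursive descent: try each clause (OR), prove every
    subgoal in its body (AND); return the justification trace or None."""
    if depth > max_depth:
        return None
    for body in propose_clauses(goal):
        trace = [goal]
        proved_all = True
        for subgoal in body:
            sub_trace = prove(subgoal, depth + 1, max_depth)
            if sub_trace is None:
                proved_all = False
                break
            trace.extend(sub_trace)
        if proved_all:
            return trace
    return None

print(prove("solve_task"))  # justification trace for the top goal
```

The returned trace is the flattened justification; compiling all such traces yields the minimal Horn clause program described above.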
2. Multi-Agent Strategy Generation, Evaluation, and Refinement Pipelines
State-of-the-art frameworks for automating strategy discovery decompose the process into agentic subproblems handled by distinct LLM-driven components. In StrategyLLM, four roles—strategy generator, executor, optimizer, and evaluator—interact in an iterative, feedback-driven pipeline (Gao et al., 2023):
- Strategy Generator: Induces generalizable strategy candidates from a small set of instance-defining examples and a formal task specification, employing temperature sampling for diversity.
- Strategy Executor: Applies each candidate strategy across examples, producing detailed reasoning traces and calculating execution accuracy.
- Strategy Optimizer: Analyzes mismatches between predicted and gold-standard outcomes, provides diagnostic feedback, and refines candidate strategies.
- Strategy Evaluator: Aggregates validation results to select or ensemble the top-performing strategies, either by majority voting (StrategyLLM-SC) or directly via LLM zero-shot inference (StrategyLLM-ZS).
This multi-phase agent pipeline integrates both inductive (generalization from examples) and deductive (uniform application to new data) reasoning. Experimental results show significant improvements over instance-specific chain-of-thought prompting, with accuracy gains of up to 13.4% on math reasoning and a 55% absolute improvement on symbolic reasoning tasks, all without human-annotated demonstrations.
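The generator-executor-optimizer-evaluator loop can be sketched with all four roles stubbed out as plain functions; in a real pipeline each role issues prompts to an LLM. The strategy names, the `quality` field (a stand-in for measured execution accuracy), and the fixed refinement increment are illustrative assumptions.

```python
# Stub stand-ins for the four StrategyLLM-style roles.
def generate_strategies():
    # generator: diverse candidates induced from task examples
    return [{"name": "s0", "quality": 0.55},
            {"name": "s1", "quality": 0.72},
            {"name": "s2", "quality": 0.40}]

def execute(strategy):
    # executor: apply the strategy and return its simulated accuracy
    return strategy["quality"]

def optimize(strategy):
    # optimizer: feedback-driven refinement, modeled as a fixed improvement
    return {**strategy, "quality": min(1.0, strategy["quality"] + 0.15)}

def run_pipeline(rounds=3, threshold=0.8):
    candidates = generate_strategies()
    for _ in range(rounds):
        # keep strategies that already qualify; refine the rest
        candidates = [c if execute(c) >= threshold else optimize(c)
                      for c in candidates]
    # evaluator: select the top-performing candidate
    return max(candidates, key=execute)

print(run_pipeline()["name"])
```

Ensembling variants (majority voting, zero-shot selection) would replace the final `max` with an aggregation over several surviving strategies.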
3. Structured Representations and Visual Coordination in Multi-Agent Systems
Ambiguities inherent in natural language, especially in multi-agent coordination, motivate structured representations that regularize strategy specification. The AgentCoord framework introduces a bipartite graph-based schema: Plan Outline (high-level sequence), Task (atomic actions specified by key objects, inputs, and outputs), Key Object, Agent, and Action. Strategy generation proceeds in three LLM-mediated stages: plan outline synthesis, agent assignment by reviewing agent profiles, and stepwise generation of detailed collaborative process flows (Pan et al., 18 Apr 2024). Human users can intervene interactively to edit, branch, or refine the hierarchy at any stage, supported by visual exploration interfaces (plan outline views, agent assignment heatmaps, action-flow templates). User studies confirm that visualization of structured representations and staged, LLM-guided exploration reduces cognitive load and facilitates both comprehension and iterative improvement of complex coordination strategies.
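A minimal sketch of such a structured coordination schema, in the spirit of AgentCoord's Plan Outline / Task / Key Object / Agent hierarchy; the class and field names here are illustrative, not the framework's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """An atomic action: what is done, on which objects, by which agent."""
    description: str
    inputs: list[str]
    outputs: list[str]
    agent: str

@dataclass
class PlanOutline:
    """High-level plan: an ordered sequence of tasks toward one goal."""
    goal: str
    tasks: list[Task] = field(default_factory=list)

    def key_objects(self) -> set[str]:
        """All named objects flowing between tasks (the 'Key Object' nodes)."""
        objs = set()
        for t in self.tasks:
            objs.update(t.inputs)
            objs.update(t.outputs)
        return objs

plan = PlanOutline(goal="write survey section")
plan.tasks.append(Task("draft outline", [], ["outline"], agent="writer"))
plan.tasks.append(Task("review outline", ["outline"], ["feedback"], agent="critic"))
```

Because the representation is explicit, a user (or a visual interface) can edit any level, reassign an agent, or branch an alternative plan without re-prompting from scratch.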
4. Integration with Symbolic, Heuristic, and Search-Based Methods
LLM-based automation pipelines increasingly hybridize neural and symbolic reasoning methods. The Strategist framework demonstrates a bi-level tree search for discovering and refining strategic skills in multi-agent game domains (Light et al., 20 Aug 2024). High-level LLM-generated strategy abstractions (e.g., value heuristic functions for state space and agents) are mapped into executables for guiding low-level Monte Carlo Tree Search (MCTS). An evolutionary loop—alternating between self-play, LLM-based reflection on trajectory discrepancies, and heuristic revision—enables the agent to autonomously optimize strategies, outperforming standard RL and vanilla LLM-based methods by substantial margins in games like GOPS and The Resistance: Avalon.
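The bi-level loop can be caricatured in a few lines: an outer "reflection and revision" loop improves a value heuristic, while an inner loop plays with it. Here the game is a toy one-shot item-picking task, the heuristic is a linear scorer, and revision is a random weight perturbation standing in for LLM-based reflection; all of these are illustrative assumptions.

```python
import random

random.seed(1)

def play(weights):
    """Greedy play on a fixed toy task: pick the item whose weighted
    features score highest under the heuristic; reward is its true value."""
    items = [((1, 0), 3.0), ((0, 1), 5.0), ((1, 1), 4.0)]  # (features, value)
    best = max(items, key=lambda it: weights[0] * it[0][0] + weights[1] * it[0][1])
    return best[1]

def evolve(weights, generations=20):
    """Outer loop: propose a revised heuristic, keep it if self-play
    performance does not degrade (stand-in for LLM reflection)."""
    score = play(weights)
    for _ in range(generations):
        candidate = [w + random.uniform(-1, 1) for w in weights]
        if play(candidate) >= score:
            weights, score = candidate, play(candidate)
    return weights, score
```

In Strategist proper, the inner level is full MCTS self-play and the revision step is an LLM analyzing trajectory discrepancies, but the accept-if-better skeleton is the same.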
In complex planning settings, Automated Heuristics Discovery (AutoHD) prompts the LLM to generate Python-encoded heuristic functions for state evaluation, which are iteratively evolved and selected based on validation performance. The selected explicit heuristics guide A* or greedy best-first search, providing interpretability and robust improvements (sometimes doubling accuracy over baselines) (Ling et al., 26 Feb 2025).
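A minimal sketch of heuristic selection by validation, assuming AutoHD-style candidates are plain Python functions: each candidate guides A* on validation instances, and the one expanding the fewest nodes wins. The toy 1-D move-to-goal domain and the two hand-written candidates stand in for LLM-generated heuristics.

```python
import heapq

def astar(start, goal, h):
    """A* on a 1-D line graph over states 0..20; unit step cost.
    Returns (path cost, number of nodes expanded)."""
    frontier = [(h(start, goal), 0, start)]
    seen = {start}
    expanded = 0
    while frontier:
        _, g, s = heapq.heappop(frontier)
        expanded += 1
        if s == goal:
            return g, expanded
        for nxt in (s - 1, s + 1):
            if 0 <= nxt <= 20 and nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (g + 1 + h(nxt, goal), g + 1, nxt))
    return None, expanded

def select_heuristic(candidates, instances):
    """Pick the candidate expanding the fewest nodes across validation runs."""
    totals = {}
    for name, h in candidates.items():
        totals[name] = sum(astar(s, g, h)[1] for s, g in instances)
    return min(totals, key=totals.get)

candidates = {
    "zero": lambda s, g: 0,               # uninformed: degenerates to Dijkstra
    "abs_dist": lambda s, g: abs(s - g),  # admissible and informative
}
```

An evolutionary variant would mutate or re-prompt for new candidates between selection rounds, keeping the best performers as parents.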
5. Application Domains and Empirical Validations
Automated LLM-driven strategy finding now appears across diverse domains:
- Quantitative Finance: A three-stage, risk-aware multi-agent system integrates prompt-engineered LLMs (for alpha factor mining), multimodal agent-based evaluation, and dynamic DNN-based weight optimization. The composite strategy achieved a 53.17% cumulative return on SSE50 over a 12-month period, versus −11.73% for the underlying index (Kou et al., 10 Sep 2024).
- Analog Circuit Design: ADO-LLM combines LLMs (for knowledge-infused design candidate generation) and Bayesian Optimization (GP surrogate models for exploration/exploitation). The closed-loop system demonstrates reduced cost and higher Figure of Merit (e.g., 3.52 for a two-stage differential amplifier) with fewer iterations compared to standalone BO or LLM agents (Yin et al., 26 Jun 2024).
- Generalized Planning: By separating the strategy generation phase into pseudocode synthesis (with debugging and reflection loops), subsequent code implementation, and multi-variant selection, LLMs robustly generate generalized plans in 17 PDDL domains, solving all tasks in 12 of them and achieving polynomial runtime speedups (Stein et al., 19 Aug 2025).
- Game Theory and Formalization: GAMA autoformalizes natural language game-theoretic scenarios and strategies into executable logic programs (Prolog modules), validating them both syntactically and semantically via solver-based checks and tournament simulation, reporting up to 100% syntactic and 87% semantic correctness for some models (Mensfelt et al., 11 Dec 2024).
6. Search Space Restriction, Validation, and Result Aggregation
Throughout these frameworks, strict validation and search-space restriction mechanisms are key to feasible automatic strategy discovery. Employing semantic similarity (embedding distances, similarity scores sim(·, ·)) to filter distractor strategies, using oracles (LLMs or classifiers) for subgoal validation, and aggregating results only when justified by recursive expansion together ensure both soundness and efficiency. Final strategy selection often compiles all justified traces into a unique minimal model (a Horn clause program), the “closure” under minimal-explanation semantics, or aggregates top-performing candidates based on formal or empirical performance scores.
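Embedding-based filtering of this kind reduces, in essence, to a cosine-similarity threshold against a task anchor: candidates below the threshold are pruned before any expensive validation. The toy vectors and the threshold value below are illustrative; a real pipeline would embed strategies with a sentence-embedding model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def filter_candidates(anchor, candidates, threshold=0.5):
    """Keep only strategies semantically close to the task anchor."""
    return [name for name, vec in candidates.items()
            if cosine(anchor, vec) >= threshold]

anchor = [1.0, 0.0, 1.0]                      # embedding of task-relevant facts
candidates = {
    "on_topic": [0.9, 0.1, 0.8],
    "distractor": [-0.7, 1.0, -0.2],
}
print(filter_candidates(anchor, candidates))
```

Surviving candidates would then go to the oracle-validation stage, so the cheap geometric filter bounds the number of expensive LLM calls.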
In prompt optimization (e.g., HPSS), a heuristic advantage score is updated iteratively for each prompt-factor value, guiding exploration with UCB-inspired softmax sampling that balances exploitation of high-performing strategies against discovery of underexplored configurations (Wen et al., 18 Feb 2025). This systematically drives the search toward strategies that better align with human or task-specific metrics.
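A sketch of that loop under stated assumptions: each prompt-factor value keeps a running advantage score, sampling is softmax over score plus a UCB-style exploration bonus, and the observed reward here is a fixed per-value number standing in for an evaluation metric. The constants (bonus weight, temperature, learning rate) are illustrative, not HPSS's actual settings.

```python
import math
import random

random.seed(0)

def sample_value(scores, counts, t, c=1.0, temp=0.5):
    """Softmax sampling over advantage score + UCB-style bonus."""
    names = list(scores)
    prefs = [scores[n] + c * math.sqrt(math.log(t + 1) / (counts[n] + 1))
             for n in names]
    mx = max(prefs)
    weights = [math.exp((p - mx) / temp) for p in prefs]
    r, acc = random.random() * sum(weights), 0.0
    for n, w in zip(names, weights):
        acc += w
        if acc >= r:
            return n
    return names[-1]

def optimize(true_reward, steps=200, lr=0.3):
    """Iteratively update advantage scores toward observed rewards."""
    scores = {n: 0.0 for n in true_reward}
    counts = {n: 0 for n in true_reward}
    for t in range(steps):
        n = sample_value(scores, counts, t)
        counts[n] += 1
        scores[n] += lr * (true_reward[n] - scores[n])  # incremental update
    return max(scores, key=scores.get)
```

With enough steps the scores converge to the underlying rewards, so the sampler concentrates on the best-performing prompt-factor value while the bonus keeps rarely tried values alive.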
7. Challenges, Theoretical Foundations, and Future Directions
Current challenges include dependence on closed-source LLMs (affecting reproducibility and model transferability), increased inference and validation costs due to longer and more complex strategies, and the need for richer integration with external information or structured priors. The No-Free-Lunch theorem underpins frameworks such as Know-the-Ropes (KtR), justifying domain-aware, controller-mediated decompositions over universal “one-prompt-fits-all” schemes, and enabling modular, easily-augmented agent systems that generalize across task instances (Li et al., 22 May 2025).
Future research directions focus on improving cross-model strategy transfer, automating fine-grained agent decomposition and augmentation, integrating memory and persistent belief modules for richer temporal reasoning, and scaling up to less-structured, open-domain or multimodal environments. Hybrid solutions that blend accessible, rapid development (via low-code toolchains) with componentwise, custom-coded tuning also promise greater flexibility in automating strategy discovery at scale (Mehta et al., 28 Aug 2025).
Automating strategy finding with LLMs involves recursively structured, context-aware, and validation-driven pipelines that can reason, generate, refine, and evaluate strategies across tasks of increasing complexity and scale. The emerging frameworks—spanning recursive logic programs, agentic multi-phase pipelines, hybrid neural-symbolic integrations, and domain-aware decompositional hierarchies—demonstrate measurable improvements and highlight central directions for robust, interpretable, and broadly applicable automated strategy discovery.