Formula-One Prompting (F-1)
- Formula-One Prompting (F-1) is a strategy for LLMs that formalizes governing equations in LaTeX to enhance mathematical reasoning.
- It utilizes a two-phase approach: first extracting symbolic equations and then adaptively choosing a solving method based on equation complexity.
- Experimental results show that F-1 significantly improves accuracy in finance, physics, and cryptography compared to conventional CoT and PoT methods.
Formula-One Prompting (F-1) is a prompting strategy for LLMs designed to improve reasoning in applied mathematics by explicitly formulating governing equations as intermediate representations. Unlike conventional Chain-of-Thought (CoT) or Program-of-Thought (PoT) prompting, F-1 introduces a two-phase approach: extracting key symbolic equations first, then adaptively selecting a solution strategy—direct substitution, CoT, or PoT—based on the structure of those equations. Experimental results indicate that F-1 delivers significant accuracy gains in domains requiring retrieval or synthesis of mathematical laws, such as finance, physics, and cryptography (Nitarach et al., 27 Jan 2026).
1. Motivation and Background
Traditional prompting methods have shown limitations in domains where mathematical reasoning hinges on recognizing and applying domain-specific governing equations. Chain-of-Thought (CoT) prompting lays out natural language steps but is prone to losing track of crucial domain constraints and often introduces compounding symbolic errors. Program-of-Thought (PoT) prompting, while more precise in numerical calculations, is less effective at expressing high-level mathematical relationships, frequently resulting in verbose or inefficient computational routines even when a closed-form solution exists.
In applied problem settings—such as calculating compound interest, applying physical laws, or analyzing cryptographic constructs—extracting the salient formula or equation is central to successful reasoning. The F-1 approach leverages this insight by directing LLMs to explicitly formalize the relationships in symbolic LaTeX before any further computation, thus aligning model behavior with expert practices in science and engineering disciplines.
2. Two-Phase F-1 Methodology
Given a problem statement $P$, F-1 guides the model through the following workflow:

$$P \;\longrightarrow\; \mathcal{E} \;\longrightarrow\; \boxed{A}$$

where $\mathcal{E}$ denotes one or more LaTeX-formatted equations and $\boxed{A}$ the boxed final answer.
2.1 Phase I: Equation Formulation
The initial phase requires the model to:
- Extract all data “givens”—numerical rates, constants, parameters.
- Identify the target variable (quantity to compute or prove).
- Write key symbolic equations connecting givens to the target, exclusively in LaTeX.
Example (Finance):
Problem: A bank offers 5% annual interest compounded monthly. If the principal is \$1000, find the amount after 2 years.
F-1 Phase 1 output:

$$A = P\left(1 + \frac{r}{n}\right)^{nt}, \qquad P = 1000,\; r = 0.05,\; n = 12,\; t = 2$$
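Once the equation is formulated, this example admits direct substitution. A minimal numeric check using the standard compound-interest formula (variable names here are illustrative, not from the paper):

```python
# Direct substitution into the compound-interest formula A = P(1 + r/n)^(n*t).
# Values taken from the worked finance example above.
P, r, n, t = 1000, 0.05, 12, 2   # principal, annual rate, periods per year, years
A = P * (1 + r / n) ** (n * t)
print(f"{A:.2f}")  # ≈ 1104.94
```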
Example (Cryptography):
Problem: Prove that a composed construction is a PRF if its component functions are PRFs.
F-1 Phase 1 output:
2.2 Phase II: Adaptive Solving Strategy
Based on the explicit equations generated in Phase I, the model is instructed to select a solving strategy. The guiding decision rule is empirically determined and operates on a count of the arithmetic or code-like operations the equations require: direct substitution when only a few operations are needed, PoT when substantial numeric computation is required, and CoT otherwise.
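As a hypothetical sketch (the paper's thresholds and exact operation-counting procedure are empirical and not spelled out in the source), the decision rule might look like:

```python
import re

def ops(equations: str) -> int:
    """Rough count of arithmetic/code-like operations in LaTeX equations;
    an assumed proxy for the paper's empirical complexity measure."""
    return len(re.findall(r"[+\-*/^=]|\\frac|\\sum|\\prod|\\int", equations))

def choose_strategy(equations: str, t_direct: int = 5, t_pot: int = 15) -> str:
    """Thresholds t_direct and t_pot are assumptions; the paper tunes these empirically."""
    n = ops(equations)
    if n <= t_direct:
        return "direct substitution"  # few operations: plug the givens in
    if n >= t_pot:
        return "PoT"                  # heavy numeric work: generate code
    return "CoT"                      # intermediate: step-by-step reasoning

print(choose_strategy(r"A = P(1 + r/n)^{nt}"))  # direct substitution
```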
The overall process is encapsulated in the following pseudocode:
```python
def formula_one_prompt(problem_text):
    # Single LLM call: the template embeds both phases
    # (equation formulation and adaptive solving).
    prompt = SYSTEM_PROMPT + "\n" + USER_TEMPLATE.format(problem=problem_text)
    response = LLM.generate(prompt, temperature=0)  # greedy decoding
    return response
```
3. Experimental Setting and Implementation
F-1 is validated across several LLMs and mathematical problem benchmarks:
| Model | Proprietary | Open-Source |
|---|---|---|
| GPT-5 | ✓ | |
| Gemini 2.5 Pro | ✓ | |
| DeepSeek-V3.1 | | ✓ |
| Qwen3-235B | | ✓ |
| Qwen3-30B | | ✓ |
Benchmarks span 2,116 problems:
- IMO-Bench: 460 competition math/proof problems
- OlympiadBench: 1,438 problems (subdivided into OE_math, OE_physics, TP_math, TP_physics)
- FinanceMath: 200 applied finance problems
- AICrypto: 18 cryptographic proof tasks
The system prompt for F-1 is “You are an AI assistant that solves problems mainly through equations.” No special tokens are required beyond LaTeX math delimiters and “\boxed{}”. Inference operates at temperature zero (greedy decoding), and all strategy-switching thresholds are empirical rather than hard-coded.
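The paper's exact user template is not reproduced in the source. The following is a hypothetical sketch of how the two-phase prompt could be assembled around the quoted system prompt (`SYSTEM_PROMPT` is verbatim from the paper; `USER_TEMPLATE` is an assumption):

```python
SYSTEM_PROMPT = "You are an AI assistant that solves problems mainly through equations."

# Hypothetical two-phase user template consistent with the F-1 description;
# the paper's exact wording is not reproduced here.
USER_TEMPLATE = (
    "Problem: {problem}\n\n"
    "Phase 1: List the givens, the target variable, and the governing "
    "equations in LaTeX.\n"
    "Phase 2: Choose a solving strategy (direct substitution, CoT, or PoT) "
    "based on the equations, solve, and put the final answer in \\boxed{{}}."
)

prompt = SYSTEM_PROMPT + "\n" + USER_TEMPLATE.format(
    problem="Find the amount after 2 years at 5% compounded monthly on $1000."
)
print(prompt)
```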
4. Empirical Results
Macro-averaged accuracy across benchmarks and models demonstrates consistent and significant improvements:
| Method | Overall Accuracy (%) | F-1 Gain (pp) |
|---|---|---|
| F-1 | 61.06 | |
| CoT | 55.30 | +5.76 |
| PoT | 52.64 | +8.42 |
The gain is especially pronounced in applied domains:
- FinanceMath: F-1 = 56.30%, CoT = 43.00% (Δ = +13.30)
- AICrypto: F-1 = 87.54%, CoT = 80.30% (Δ = +7.24)
- OlympiadBench Physics: F-1 = 44.92%, CoT = 42.37% (Δ = +2.55)
- OlympiadBench Math: F-1 = 86.35%, CoT = 85.91% (Δ = +0.44)
Selection accuracy on differentiable problems—problems where methods yield different outcomes—was highest for applied mathematics (e.g., 73% on FinanceMath, 69.9% on OlympiadBench). F-1 attains approximately 81–84% of the maximal possible selection accuracy (the upper bound defined by always choosing whichever baseline succeeds on each instance).
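The selection-accuracy upper bound can be made concrete with toy data: an oracle selector succeeds on an instance whenever at least one candidate method does. The per-instance outcomes below are illustrative, not the paper's:

```python
# Oracle upper bound for strategy selection: an ideal selector is correct
# whenever at least one of the candidate methods (CoT or PoT) is correct.
# These outcome vectors are made up for illustration.
cot_correct = [1, 0, 1, 1, 0, 1, 0, 1]
pot_correct = [0, 1, 1, 0, 0, 1, 1, 1]

oracle = [max(c, p) for c, p in zip(cot_correct, pot_correct)]
upper_bound = sum(oracle) / len(oracle)
print(upper_bound)  # 0.875 for these toy outcomes
```

On such a set, F-1's reported 81–84% of the maximal selection accuracy would correspond to a realized accuracy of roughly 0.81–0.84 times this upper bound.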
5. Best Practices for F-1 Prompt Construction
Empirical findings recommend:
- Structuring prompts in two clearly delimited phases: Phase 1 (LaTeX equations) and Phase 2 (solution).
- Using minimalist directives to avoid over-generation; excessive verbosity can degrade performance.
- Reinforcing solution verification and requiring the boxed final answer to curb hallucinations.
- Priming models for novel domains by providing in-domain equation exemplars in the system prompt.
6. Limitations and Prospects for Extension
F-1's effectiveness is sensitive to model scale; equation formalization may fail on models smaller than 30B parameters that lack robust symbolic abstraction. The single-call architecture precludes backtracking or error correction if an inappropriate solving strategy is chosen, suggesting that multi-call or plan-and-solve variants could offer further gains. The methodology is presently validated only on equation-centric tasks; generalization to domains with more loosely defined constraints, such as legal or ethical reasoning, will require new formalization schemas. Reported statistics are macro-averages over five models; quantifying variance via bootstrapping or random-seed variation remains future work.
7. Significance and Outlook
Formula-One Prompting operationalizes explicit equation formalization and harnesses equation structure to adaptively select among direct, CoT, or PoT approaches, all within a single LLM call. The method achieves improvements of +5.76 percentage points over CoT and +8.42 percentage points over PoT on average, with the greatest gains in applied mathematics (up to +13.30 on the finance benchmark). The introduction of an intermediate equation step enforces structural alignment with expert problem-solving approaches in science, finance, and cryptography, while its low-footprint implementation facilitates practical adoption (Nitarach et al., 27 Jan 2026). A plausible implication is that equation-first prompting represents a scalable, model-agnostic means of closing the gap between LLM mathematical reasoning and domain expert performance, especially in domains governed by formal mathematical laws.