Generative Recursive Reasoning

Published 19 May 2026 in cs.AI | (2605.19376v1)

Abstract: How should future neural reasoning systems implement extended computation? Recursive Reasoning Models (RRMs) offer a promising alternative to autoregressive sequence extension by performing iterative latent-state refinement with shared transition functions. Yet existing RRMs are largely deterministic, following a single latent trajectory and converging to a single prediction. We introduce \emph{Generative Recursive reAsoning Models (GRAM)}, a framework that turns recursive latent reasoning into probabilistic multi-trajectory computation. GRAM models reasoning as a stochastic latent trajectory, enabling multiple hypotheses, alternative solution strategies, and inference-time scaling through both recursive depth and parallel trajectory sampling. This yields a latent-variable generative model supporting conditional reasoning via $p_θ(y \mid x)$ and, with fixed or absent inputs, unconditional generation via $p_θ(x)$. Trained with amortized variational inference, GRAM improves over deterministic recurrent and recursive baselines on structured reasoning and multi-solution constraint satisfaction tasks, while demonstrating an unconditional generation capability. \href{https://ahn-ml.github.io/gram-website/}{https://ahn-ml.github.io/gram-website}

Abstract PDF Upgrade to Chat

Authors (6)

Summary

The paper introduces GRAM, a novel probabilistic framework that models recursive reasoning as a stochastic process to explore multiple latent trajectories.
It employs a dual-loop architecture with deterministic proposals and learnable Gaussian perturbations to decouple reasoning depth from output length.
GRAM achieves superior performance in tasks like Sudoku-Extreme and N-Queens, demonstrating effective multi-solution exploration and structured reasoning.

Generative Recursive Reasoning: A Latent Probabilistic Paradigm for Recursive Neural Reasoning

Motivation and Context

The dominant paradigm in neural reasoning has largely revolved around autoregressive architectures, which extend outputs by sequentially generating tokens or latent embeddings. While successful for many applications, this approach intrinsically conflates computational depth with output length and often fails to efficiently address problems involving multi-solution ambiguity, structured constraints, or the need for extended internal computation. Recursive Reasoning Models (RRMs) offer an alternative by refining a latent state through iterative computation via shared transition functions, decoupling depth from parameter count and output length. However, prior RRMs such as the Hierarchical Reasoning Model (HRM) and Tiny Recursive Model (TRM) are fundamentally deterministic: they always follow a single refinement trajectory, collapsing multiple plausible reasoning paths into a single solution.

To address these limitations, the paper introduces Generative Recursive reAsoning Models (GRAM), a novel framework that generalizes recursive reasoning into a probabilistic latent trajectory setting. GRAM treats recursive computation as a stochastic process, enabling parallel exploration of multiple latent reasoning trajectories, maintaining alternative hypotheses, and supporting both conditional and unconditional generation.

Model Architecture and Training Formulation

GRAM models the conditional probability $p_\theta(y \mid x)$ by marginalizing over stochastic latent reasoning trajectories. For a given input $x$ , an embedding is computed and used throughout the recursion. The model operates in two nested loops:

Inner Loop: For $T$ steps, the latent state is updated with a stochastic transition. Instead of the deterministic updates of earlier RRMs, GRAM computes a deterministic proposal and then adds a learnable, state-dependent Gaussian perturbation.
Outer Loop: $N_{\text{sup}}$ supervision steps are chained recursively, with the terminal state of one step serving as the initial state for the next.

The model instantiates a hierarchical latent state $(h, l)$ , where $h$ is a high-level state updated stochastically at the outer loop, and $l$ is a low-level state refined deterministically in the inner loop. This multi-scale decomposition enables both rapid local refinement and higher-level trajectory exploration.

Figure 1: GRAM samples diverse stochastic latent reasoning trajectories, enabling recovery of multiple solutions in tasks with ambiguous or under-constrained inputs, in contrast to deterministic RRMs which collapse to a single trajectory.

During training, GRAM is formulated as a latent-variable model trained via amortized variational inference, maximizing a trajectory-level evidence lower bound (ELBO). The variational posterior conditions on input and target, while the generative prior conditions solely on the input. Training leverages truncated gradient propagation across supervision steps for scalability.

Figure 2: Both the true ELBO and surrogate training objective monotonically improve during training, validating the effectiveness of truncated deep supervision.

Inference-Time Scaling and Multi-Hypothesis Exploration

A core innovation is GRAM’s dual axis of inference-time scaling:

Depth: Extending the number of recursive transitions or supervision steps, as in prior RRMs.
Width: Sampling multiple latent trajectories in parallel, enabling the model to consider many possible solutions or hypotheses simultaneously.

Selection among candidate outputs is achieved via majority voting or a learned Latent Process Reward Model (LPRM) that predicts the final output’s accuracy from its latent state. This parallel exploration capability is unattainable with deterministic RRMs.

Figure 3: GRAM supports superior inference-time scaling in both recursion depth and sample width, with parallel sampling yielding performance improvements unattainable for deterministic predecessors.

Empirical Results: Structured and Generative Tasks

Structured Reasoning Benchmarks

GRAM demonstrates consistently superior performance over deterministic baselines (Looped TF, HRM, TRM) on challenging structured reasoning tasks such as Sudoku-Extreme and the ARC-AGI challenge. Key empirical findings include:

Sudoku-Extreme: GRAM achieves 97.0% accuracy, substantially outperforming all deterministic RRMs, for which the best competitor (TRM) attains 87.4%. Notably, large pretrained LLMs attain 0% in this benchmark, underscoring the unique advantage afforded by recursive computation and multi-trajectory exploration.
ARC-AGI: The gains persist on abstract visual reasoning tasks, where GRAM again outperforms deterministic recursive baselines.
Figure 4: GRAM yields higher accuracy on both Sudoku and ARC-AGI, showcasing the practical advantage of stochastic latent transitions in recursive reasoning.

Multi-Solution Constraint Satisfaction

GRAM excels in multi-solution settings such as N-Queens and Graph Coloring, maintaining accuracy and coverage even as the number of valid solutions increases—addressing a failure mode that is structural for deterministic RRMs:

N-Queens: Deterministic RRMs exhibit catastrophic degradation in both accuracy and coverage as solution multiplicity increases; GRAM sustains 99.7% accuracy and up to 90.3% solution coverage (20 samples, 8x8), outperforming masked diffusion and autoregressive baselines in constraint satisfaction.
Graph Coloring: GRAM achieves a near-order-of-magnitude reduction in constraint violations relative to autoregressive transformers for 10-node graphs.

(Figure 1, right) (Figure 3, right)

Figure 3: Unlike deterministic recursive models, GRAM maintains robust accuracy as the number of possible solutions increases, demonstrating effective non-collapsed exploration.

Unconditional Generation

GRAM extends its probabilistic recursive computation to unconditional generative modeling. On binarized MNIST:

Unconditional Generation: GRAM’s Inception Score improves monotonically with increased inference steps (1.85→2.04, 8→256 steps), with FID also dropping accordingly (84.0→73.3).
Structured Sample Generation: In unconditional Sudoku generation, GRAM achieves 99.05% validity, significantly outperforming discrete diffusion models with fewer parameters and orders-of-magnitude fewer generation steps.
Figure 5: GRAM refines unconditional image samples over recursive steps, correcting initial errors and improving generative quality via stochastic latent transitions.

Figure 6: GRAM can synthesize highly diverse, fully valid Sudoku boards unconditionally, without any explicit constraint checking.

Ablation Analyses and Qualitative Visualization

Ablation studies confirm that both stochasticity and learned guidance are essential. The absence of either diminishes performance or induces mode collapse, especially on multi-solution tasks. Attempts to introduce naive stochasticity in deterministic RRMs fail to replicate the benefits of GRAM, highlighting the necessity of a variational framework for effective exploration.

Visualization of latent trajectories reveals that GRAM explores and covers larger regions of the solution landscape than deterministic models, directly supporting its empirical superiority on multi-solution and ambiguous tasks.

Figure 7: TRM (deterministic) follows a single trajectory, often failing to escape local minima in the latent loss landscape.

Figure 8: GRAM (with 50 samples) explores diverse regions, with several trajectories reaching optimal low-loss areas.

Implications, Limitations, and Future Directions

GRAM’s generative recursive reasoning paradigm presents significant implications for both practical neural reasoning and theoretical modeling:

Structured Reasoning: By facilitating robust exploration over solution spaces and supporting parallel hypothesis evaluation, GRAM is well suited for domains requiring constraint satisfaction, combinatorial search, or reasoning under ambiguity.
Efficient Inference-Time Scaling: The decoupling of reasoning “width” (number of latent samples) from parameter count and “depth” (recursion length) offers a new handle for tuning compute vs. performance without retraining or increasing model size.
Generative Modeling: The ability to view recursive computation as a generative process enables transfer beyond reasoning to unconditional generation, with promising results in both symbolic and perceptual domains.

However, GRAM’s reliance on sequential deep supervision introduces training inefficiencies compared to fully parallel Transformers, posing challenges for scaling to higher-capacity or longer-horizon problems. Additionally, the risks of generating plausible but invalid structures, and the computational burden of multi-sample inference, warrant further practical study. Future research is likely to investigate memory-efficient training schemes, adaptive sampling methods, scaling laws under varying recursion and width, integration with external search routines, and applications in domains such as scientific discovery, program synthesis, and model-based planning.

Conclusion

Generative Recursive reAsoning Models (GRAM) establish probabilistic multi-trajectory recursion as a powerful design principle for neural reasoning architectures. By introducing stochastic latent transitions into recursive computation and leveraging variational training, GRAM attains superior structured reasoning performance, recovers diverse solutions in multi-solution settings, and enables unconditional generation—all while supporting new axes of computational scaling at inference. The work lays the foundation for future architectures that treat reasoning as a process over distributions of latent trajectories, providing both practical tools for structured tasks and theoretical advances in neural reasoning systems.

Markdown Report Issue

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Explain it Like I'm 14

What is this paper about?

This paper introduces a new way for AI models to “think” through problems. Instead of following one fixed line of thought, the model (called GRAM: Generative Recursive reAsoning Model) explores many possible lines of thought in parallel and over multiple steps. This helps it handle problems with uncertainty or many correct answers (like puzzles with multiple solutions), and even create new things from scratch (like generating images or Sudoku boards).

What questions are the authors asking?

How can we build AI systems that don’t just get bigger, but actually think better by using their computation more wisely?
Can a model keep several “hypotheses” in mind at once, not just one?
Can we scale up reasoning both by “thinking longer” (more steps) and by “thinking wider” (trying multiple ideas in parallel)?
Can the same setup both solve problems when given an input (like completing a Sudoku) and generate things with no input (like inventing a new Sudoku board)?

How does their method work?

Think of the model as having an internal “scratchpad” where it refines its ideas step by step (this scratchpad is called a latent state). Traditional recursive models update this scratchpad in a fixed, predictable way, so they usually end up with one final answer. GRAM changes this by adding controlled randomness, so the model can explore multiple possible paths before choosing an answer.

Here are the main parts, in everyday terms:

The latent state is like the model’s hidden workspace. GRAM splits it into:
- A high-level memory h (big-picture plan).
- A low-level memory l (fine details).
Inside each big step, it quickly refines the low-level details several times while holding the big-picture plan steady.
Then it updates the big-picture plan once, using both the refined details and a bit of guided randomness.

Analogy: Imagine solving a puzzle. Within each round, you tweak small details several times (move pieces around), then you make one larger strategic adjustment (change your overall approach).

Adding guided randomness to explore better

At each big-picture update, GRAM first makes a best-guess update (a deterministic proposal).
Then it adds a small, learned “random nudge” (noise) whose direction and size depend on the current situation.
This lets the model explore different possible reasoning paths, instead of getting stuck in one.

Analogy: You’re navigating a maze. A GPS suggests the next turn (the proposal), but you also add a small, smart wiggle to explore nearby routes (the guided randomness), so you don’t miss a better path.

Training with a helper that can peek at the answer

During training, the model uses two roles:
- A “prior” that guesses without seeing the answer (how it will work at test time).
- A “posterior” that’s allowed to peek at the correct answer while learning. It learns better ways to use the random nudges.
The training goal balances two things:
- Solving the task correctly.
- Keeping the “peeking helper” close to what the “no-peeking” model can do on its own, so it won’t collapse when the helper is gone.

Analogy: A student practices with a tutor who can check the textbook (peek), then learns to do the same without help.

Scaling reasoning at test time: deeper and wider

Depth (think longer): Let the model run more refinement steps to improve the result.
Width (think wider): Run multiple “thought trajectories” in parallel and pick the best result.
Choosing the best: Either use majority vote or a small judge network that scores which hidden state is most promising.

Analogy: You can spend more time thinking, or ask a group of friends to brainstorm in parallel and then pick the best idea.

Not just solving—also generating

The same process can work with no input. GRAM can generate data from scratch (like digit images or Sudoku boards) by sampling and refining its own latent thoughts until an output appears.

What did they find?

On several benchmarks, GRAM did better than similar models that only follow one deterministic path:

Hard puzzles (Sudoku-Extreme, ARC-AGI): GRAM solved more problems correctly than deterministic recursive baselines (like Looped Transformers, HRM, TRM).
Multi-solution tasks (N-Queens, Graph Coloring): GRAM found more valid different solutions and kept accuracy high even when many answers were possible. Deterministic models tended to “collapse” to one type of answer and miss others.
Test-time scaling: Sampling multiple trajectories in parallel (wider) improved results as much as, or more than, just running longer (deeper), often with similar compute cost.
Unconditional generation:
- Sudoku: GRAM generated valid full Sudoku boards at very high rates, using far fewer steps and parameters than a diffusion-style baseline.
- MNIST digits: GRAM generated clear digits with quality similar to a strong baseline, while deterministic recursion collapsed to poor-quality results.

Why this matters:

Many real problems have uncertainty or multiple correct answers. Being able to explore and keep multiple hypotheses truly helps.
Parallel exploration can improve quality without making you wait longer for one very deep chain of thought to finish.
One architecture that both reasons (given inputs) and generates (from scratch) is flexible and powerful.

What is the impact of this research?

Better problem-solving: GRAM shows that organizing computation as “guided, stochastic, recursive thinking” can beat just making models bigger or producing long text chains.
Practical scaling: You can trade off between depth (latency) and width (parallelism) depending on your resources and needs.
Generality: The same idea works for structured puzzles and for generating images or boards—suggesting broad usefulness.
Design principle: Future reasoning models could adopt “probabilistic multi-trajectory recursion” as a core design, not an afterthought.

Caveats and care:

Training is still sequential over many steps, which can be slower or harder to scale than standard Transformers.
Because the model can generate very plausible outputs, users must still verify answers in high-stakes settings.
Using many parallel samples can increase compute costs, so efficiency and good selection methods (like the judge network) are important.

In short

Purpose: Let models explore several lines of thought, not just one.
Method: Recursively refine a hidden workspace with small, learned random nudges, trained with a helper that can peek at answers.
Results: Stronger puzzle solving, better handling of multiple solutions, effective parallel scaling, and solid generation from scratch.
Impact: A practical step toward AI that thinks by exploring, not just by growing bigger.

View Paper Prompt View All Prompts

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a consolidated list of concrete gaps, uncertainties, and unexplored questions that emerge from the paper and could guide future work:

Training objective bias and variance
- The paper relies on a truncated, supervision-step surrogate objective (stopping gradients before the final transition of each step), which is a biased approximation to the full trajectory ELBO; the impact of this bias on consistency, convergence, and sample efficiency is not quantified.
- No comparison is provided against unbiased or lower-bias alternatives (e.g., full-trajectory reparameterization, DReG, VIMCO, control variates, or variational smoothing), nor is the variance of gradient estimators analyzed.
- The effect of KL scaling/annealing (e.g., β-ELBO or step-wise KL weights) on exploration, mode coverage, and over-regularization is not studied.
Stochastic guidance design space
- Stochasticity is injected only into the high-level latent with a diagonal Gaussian; richer priors/posteriors (full-covariance, low-rank, learned flows, mixture or heavy-tailed distributions) and their effect on exploration and stability are not explored.
- The guidance mean/variance schedules and their adaptation over recursion depth are not studied; there is no analysis of annealing strategies or depth-dependent noise.
- An attempt to add noise to the low-level latent reportedly did not help, but alternative structured noise mechanisms (e.g., selective channel/attention-head noise, learned masks, correlated noise across steps, or state-dependent diffusion-like schedules) remain unexplored.
- It is unclear whether the model suffers posterior collapse or prior over-dispersion under different hyperparameters; diagnostics for effective stochasticity (e.g., per-step KL, mutual information between z and y) are not reported.
Decoder and latent interface
- The decoder reads only the high-level latent; whether decoding from both levels, multi-read decoding, or iterative/auxiliary decoders improves accuracy, calibration, or diversity is not investigated.
- The trade-off between hierarchical depth K (low-level updates per transition) and exploration quality/efficiency is not characterized.
Inference-time width scaling and selection
- Parallel sampling gains are shown, but the latency–throughput–quality trade-offs under realistic hardware constraints (e.g., GPU memory and batching limits) are not quantified.
- The Latent Process Reward Model (LPRM) is trained with a sparse, binary correctness signal; its calibration, robustness to distribution shift, and cross-task generalization are unstudied.
- There is no comparison with alternative candidate selection mechanisms (e.g., learned rerankers using input–output features, self-consistency with constraint checks, or test-time energy/score functions).
- Interacting or population-based trajectory search (e.g., resampling, beam search in latent space, particle MCMC, or cross-trajectory communication) is not explored; trajectories are sampled independently.
Adaptive depth (ACT) behavior
- Although ACT is adopted, there is no analysis of learned halting distributions, depth–accuracy–latency trade-offs, or stability; optimal halting policies and their generalization remain open.
- Interactions between ACT and width scaling (e.g., variable halting depths across parallel samples) are not characterized.
Comparisons and scope of evaluation
- The evaluation focuses on controlled puzzles and binarized MNIST; performance and practicality on large-scale or real-world tasks (natural language reasoning, code synthesis, math word problems, scientific planning, or multimodal reasoning) are left open.
- There is no systematic comparison with strong test-time sampling and reranking strategies in autoregressive LLMs (e.g., self-consistency, multi-sample CoT, ToT/GoT) under matched compute.
- The unconditional generative evaluation lacks likelihood or bound estimates (e.g., bits/dim, NLL via importance sampling); only IS/FID and Sudoku validity are reported.
Compute and scalability constraints
- The authors note sequential deep supervision as a bottleneck but do not quantify wall-clock, memory, or energy compared to strong Transformer baselines at matched quality.
- Methods to mitigate sequential training (e.g., synthetic gradients, equilibrium models, reversible updates, parallel-in-time solvers, learned warm-starts) are not explored.
- Scaling laws (quality vs parameters/steps/samples) and diminishing returns for width vs depth are not studied.
Theoretical properties and guarantees
- There are no theoretical guarantees on convergence, exploration sufficiency, or approximation quality of the surrogate objective relative to the true ELBO.
- The relationship to planning/search (e.g., stochastic local search, MCTS in latent space) and formal links to inference in probabilistic graphical models or SSMs are not analyzed.
Constraint satisfaction and diversity trade-offs
- The balance between constraint satisfaction and solution diversity is not formally studied; how to control or tune this trade-off (via KL schedules, noise magnitude, or selection policies) remains unclear.
- Failure modes for multi-solution tasks (e.g., mode redundancy, duplicate solutions among samples) lack quantitative analysis beyond coverage percentages.
Robustness, calibration, and reliability
- Uncertainty calibration (e.g., proper scoring rules, ECE) for both predictive distributions and LPRM scores is not evaluated.
- Robustness to distribution shift, adversarial perturbations, or corrupted inputs is unstudied.
- The reliability of GRAM when constraints cannot be fully satisfied (partial credit, soft constraints) is not examined.
Architectural alternatives and ablations
- While hierarchical recursion is used, alternatives (e.g., continuous-time/ODE/SDE recursion, diffusion-style schedules, energy-based or score-based latent dynamics) are not compared.
- The role of auxiliary losses or intermediate decoders at multiple steps to stabilize training and improve exploration is not investigated.
- Sensitivity to key hyperparameters (number of transitions T, supervision steps N_sup, K, noise scale initialization, KL weights) is not systematically profiled.
Integration with external tools and priors
- Combining GRAM with programmatic constraint checkers, differentiable solvers, external memory, or tool-augmented reasoning is not explored.
- Incorporating domain-specific priors (e.g., symmetries in Sudoku/graph rules, CSP factorization) into the transition or guidance distributions remains open.
Generalization and transfer
- Cross-task transfer and few-shot generalization of the learned stochastic transitions are not evaluated (e.g., zero/few-shot adaptation to new puzzle variants or larger board sizes).
- The ability to reuse or compose learned latent skills/strategies across tasks is not studied.
Safety and evaluation fidelity
- Although broader risks are acknowledged, there is no empirical assessment of hallucination/invalid-solution rates under constraint uncertainty or noisy reward models.
- For unconditional generation, the fidelity of evaluation on binarized MNIST (e.g., effect of binarization, FID computation protocol) and its relevance to broader generative modeling remains limited.
Reproducibility and fairness
- Some baseline results are omitted due to cost (e.g., Looped TF on ARC-AGI), leaving potential questions about fairness and reproducibility under matched compute and tuning budgets.
- Details on hyperparameter searches for baselines vs GRAM, and sensitivity to random seeds, are not fully reported in the main text.

View Paper Prompt View All Prompts

Practical Applications

Overview

Based on the paper “Generative Recursive Reasoning (GRAM),” the following applications translate its core findings—stochastic latent transitions in recursive reasoning, width- and depth-based inference-time scaling, and variational training—into practical, real-world use cases. Each item specifies sectors, potential tools/workflows, and feasibility assumptions or dependencies.

Immediate Applications

These can be prototyped or deployed now with task-specific data and moderate engineering effort, leveraging GRAM’s demonstrated strengths on constraint satisfaction and structured reasoning.

Constraint-satisfaction plug-ins for operations and logistics (industry; software/OR)
- What: Use GRAM as a heuristic or solver for scheduling, timetabling, rostering, and resource allocation (e.g., staff rosters, machine/shift scheduling) where multiple valid solutions exist.
- How: Integrate a “GRAM-based CSP module” into existing OR stacks (e.g., OR-Tools, CP-SAT) to propose candidate assignments via width sampling and select with a Latent Process Reward Model (LPRM) or constraint validator.
- Dependencies: Formalization of problem-specific encoders/decoders; a fast constraint checker for selection; domain training data or synthetic generators; GPU/CPU capacity for parallel sampling.
Candidate generation and ranking workflows (industry; software/UX/productivity)
- What: Generate multiple plausible variants and select the best via LPRM—e.g., layout planning, menu rotations, meeting/school timetables, seating charts.
- How: “Best-of-N” parallel trajectory sampling with majority vote or LPRM scoring, integrated into internal tools or user-facing interfaces.
- Dependencies: Simple quality signals or verifiers; moderate compute to run N parallel trajectories; calibration of LPRM targets.
Hybrid solver–neural pipelines (industry/academia; software/AI engineering)
- What: Couple GRAM with search or SAT/SMT solvers—GRAM proposes candidate solutions or variable orderings; a classical solver verifies/refines.
- How: A “neuro-symbolic CSP pipeline” where GRAM’s stochastic guidance explores diverse solution paths; verification ensures correctness.
- Dependencies: Well-specified interfaces between GRAM outputs and solver inputs; interpretable constraint schemas; verifiers to mitigate false positives.
Educational puzzle and content generation (education; edtech/content)
- What: Generate valid, diverse puzzles (e.g., Sudoku), grading rubrics, or multiple valid solution paths for assignments and exams.
- How: Unconditional GRAM generation for puzzles; conditional GRAM for problem-specific solutions; present top-k variants via LPRM or rule-based filters.
- Dependencies: Content safety checks; quality thresholds; lightweight compute for educators/platforms.
Small-footprint, high-compute test-time reasoning (industry; on-device/edge)
- What: Deploy compact GRAM models that scale via shared weights and test-time recursion/parallel sampling for local constraint tasks (e.g., embedded scheduling, rule consistency checks).
- How: “On-device recursive reasoner” with adaptive halting (ACT) and width sampling tuned to device budget/latency.
- Dependencies: Efficient runtime kernels for repeated shared blocks; ACT tuning for latency; domain encoders/decoders.
LLM scaffolding: latent “reasoner” module (industry/academia; software/AI tooling)
- What: Call GRAM as a subroutine from an LLM agent to explore multiple latent hypotheses for structured subproblems, returning the best candidate with a verifier.
- How: A “GRAM tool” in agent frameworks (e.g., LangChain) for CSP-like subtasks (e.g., resource allocation in multi-tool plans).
- Dependencies: API plumbing; prompt-to-structure conversion; verifiers or scoring heads; monitoring for failure modes.
Risk/scenario generators under constraints (finance/operations; analytics)
- What: Sample diverse, valid scenarios under hard constraints (e.g., budget caps, exposure limits) for planning or stress-testing.
- How: Parallel hypothesized portfolios/plans, filtered by rule-checkers and ranked by LPRM or domain metrics.
- Dependencies: Accurate domain encoders; compliance constraints modeled explicitly; post-hoc validation; compute budget for width.
Quality control and test case generation (software/MLOps)
- What: Propose multiple test inputs satisfying structural constraints (e.g., boundary cases) to improve coverage.
- How: Unconditional or conditional GRAM to produce structured test artifacts; select/curate via executable checks.
- Dependencies: Formalization of invariants; integration into CI/CD; small compute overhead.
Hospital/clinic micro-scheduling pilots (healthcare; operations)
- What: Assist with appointment blocks, operating room schedules, or nurse shift rosters where multiple valid schedules exist.
- How: GRAM-generated candidate rosters via width sampling; verification against regulatory/union constraints; human-in-the-loop selection.
- Dependencies: Strong constraint modeling; auditable logs; local validation; privacy/security requirements.
Inference-time efficiency tuning (cross-sector; AI/infra)
- What: Trade latency for quality by balancing recursion depth and width; use early halting with ACT and LPRM scoring to meet SLAs.
- How: “Depth/width controller” that tunes number of steps and samples per request based on observed difficulty or value thresholds.
- Dependencies: Telemetry and metrics; policy for compute budgets; fallback strategies.

Long-Term Applications

These require further research, scaling, validation, or sector-specific certification beyond the paper’s experiments.

Clinical decision support with multi-hypothesis reasoning (healthcare)
- What: Maintain differential diagnoses or treatment plans as parallel latent trajectories; present vetted options with uncertainty estimates.
- How: GRAM integrated with EHR and guideline knowledge; width sampling to explore alternatives; verifiers for contraindications.
- Dependencies: Regulatory approval; robust clinical datasets; interpretable outputs; rigorous calibration and safety checks.
Autonomy and robotics: multi-hypothesis task and motion planning (robotics)
- What: Explore parallel high-level plans under uncertainty (e.g., task sequencing, resource constraints) and feed into motion planners.
- How: GRAM at the symbolic planning layer; LPRM guided by task success proxies; integration with MCTS or trajectory optimizers.
- Dependencies: Tight coupling with low-level planners; real-time constraints; safe exploration policies.
Supply-chain and grid-level optimization (energy/manufacturing/logistics)
- What: Generate diverse, constraint-compliant plans for production, inventory, or power scheduling with parallel hypothesis evaluation.
- How: GRAM proposes plans; simulators/optimizers verify and refine; width scaling accelerates exploration of the plan space.
- Dependencies: High-fidelity simulators; multi-objective metrics; robust verifiers; significant compute and data integration.
Generative design and CAD under constraints (manufacturing/construction)
- What: Explore multiple design variants satisfying structural, budgetary, or manufacturability constraints; present ranked portfolios.
- How: GRAM-driven “design space explorer” pipeline, connected to CAD validators and physics/FEA checks.
- Dependencies: Accurate geometry/constraint encoding; slow verifier bottlenecks; human-in-the-loop reviews.
Proof search and theorem exploration (academia; formal methods)
- What: Multi-trajectory latent exploration of proof strategies, merged with SAT/SMT/ATP systems.
- How: GRAM proposes proof sketches; theorem provers verify; LPRM predicts promise of partial proof states.
- Dependencies: Benchmarks and datasets; symbolic interfaces; interpretability of latent states.
Portfolio construction and compliance-aware optimization (finance)
- What: Parallel exploration of feasible portfolios or hedging strategies under regulatory and risk constraints.
- How: GRAM plus rule-verification and scenario analysis; width scaling to surface high-quality alternatives.
- Dependencies: Regulatory compliance; accurate risk models; auditability; robust out-of-sample performance.
Privacy-preserving, constraint-aware synthetic data (public policy/health/finance)
- What: Generate synthetic tabular records satisfying domain constraints and privacy budgets (e.g., DP).
- How: Unconditional GRAM with constraint and privacy-aware post-processing; validators for rule adherence.
- Dependencies: Strong privacy frameworks; coverage of minority modes; evaluation of utility vs. privacy.
AI tutors with multiple solution strategies (education)
- What: Present alternative valid solution paths to math/logic problems; diagnose misconceptions by exploring likely student trajectories.
- How: GRAM-generated multi-path solutions ranked by pedagogical value; alignment with curriculum standards.
- Dependencies: Curriculum-aligned datasets; explainability; content moderation; fairness across student populations.
Probabilistic programming and inference engines (software/AI research)
- What: Use GRAM’s recursive latent transitions as a generic inference/computation module within PPLs to navigate combinatorial state spaces.
- How: PPL backends call GRAM to refine latent proposals; LPRM scores partial states.
- Dependencies: APIs for program–model interfaces; stability and convergence analyses.
Foundation reasoning models with depth–width scaling (AI at scale)
- What: Embed GRAM-like stochastic recursion inside larger models to replace verbose chain-of-thought with compact latent computation and parallel hypothesis exploration.
- How: Hybrid architectures that interleave token-level steps with latent recursive refinements; width-based sampling for robustness.
- Dependencies: Training efficiency (sequential supervision is a bottleneck per the paper); scalable variational objectives; alignment with safety policies.

Key Assumptions and Dependencies (Cross-Cutting)

Problem formalization: Success hinges on high-quality encoders/decoders that map domain constraints and outputs to/from GRAM’s latent space.
Verification and safety: For high-stakes domains, external verifiers (rule checkers, simulators, orcertified solvers) must filter candidates; LPRM alone is not a guarantee of correctness.
Compute budgets: Width-based scaling trades parallel compute for quality; deployments require careful cost–latency management and ACT tuning.
Data and supervision: Task-specific training data or reliable synthetic generators are needed; transfer to open-ended tasks remains unproven.
Training efficiency: Current training uses deep supervision with truncated gradients; scaling to very large models will require algorithmic/engineering advances.
Calibration and interpretability: Multi-hypothesis outputs must be accompanied by uncertainty estimates and human-understandable rationales for adoption in regulated settings.
Generalization: Results are on controlled benchmarks (Sudoku, ARC-AGI, N-Queens, Graph Coloring, binarized MNIST); real-world performance requires validation on domain datasets.

These applications capitalize on GRAM’s two main innovations—stochastic latent trajectories and width-based inference-time scaling—enabling exploration of multiple hypotheses with compact models and improved constraint satisfaction.

View Paper Prompt View All Prompts

Glossary

Adaptive computation time (ACT): A mechanism that allows a recurrent model to decide dynamically how many computational steps to perform before halting. "in adopting adaptive computation time (ACT)"
Amortized variational inference: A training paradigm that uses learned inference networks to approximate posteriors across many inputs, enabling efficient optimization of latent-variable models. "Trained with amortized variational inference, GRAM improves over deterministic recurrent and recursive baselines"
Autoregressive: Refers to models that generate outputs sequentially, conditioning each step on previous ones. "Large autoregressive models typically scale reasoning by extending a sequence-generation process"
Binarized MNIST: A version of the MNIST dataset where pixel values are thresholded to binary (0/1), often used for discrete generative modeling. "image generation on binarized MNIST~\citep{mnist}, where pixel values are thresholded to $0$ or $1$"
Chain-of-Thought (CoT): An approach that has models produce explicit intermediate reasoning steps in natural language. "Latent reasoning aims to reduce the inefficiency and verbosity of explicit Chain-of-Thought (CoT)"
Conditional distribution: A probability distribution of outputs given inputs, often denoted p(y|x) in supervised modeling. "GRAM models the conditional distribution $p_\theta(y \mid x)$ by marginalizing over stochastic latent reasoning trajectories."
Conditional Markov processes: Stochastic processes where the next state depends only on the current state (and possibly inputs), used to model latent trajectories. "Both the prior and the posterior are modeled as conditional Markov processes over latent states:"
Constraint propagation: Iterative enforcement and transmission of constraints throughout a problem’s variables to reduce search space. "requiring extensive constraint propagation"
Constraint satisfaction: Tasks where the goal is to assign values to variables subject to constraints, potentially with multiple valid solutions. "multi-solution constraint satisfaction tasks"
Deep supervision: Applying training signals at multiple intermediate stages or steps of a model to stabilize and improve learning. "we train GRAM with deep supervision over $N_{\mathrm{sup}$ consecutive supervision steps"
Deterministic proposal: A non-random, predicted update to the latent state before adding stochasticity. "the high-level update $f_{\mathrm{H}$ produces a deterministic proposal $u_t$ "
Discrete diffusion model: A generative modeling framework that diffuses over discrete states and then denoises to sample data. "D3PM~\citep{d3pm}, a discrete diffusion model"
Evidence lower bound (ELBO): A variational objective that lower-bounds the log-likelihood, optimized to train latent-variable models. "optimized by maximizing an evidence lower bound (ELBO)"
Fréchet Inception Distance (FID): A metric for assessing the quality of generated images by comparing feature distributions between real and generated samples. "We report IS ( $\uparrow$ ) and FID ( $\downarrow$ )."
Generative Recursive reAsoning Models (GRAM): The proposed framework that treats recursive reasoning as a probabilistic, multi-trajectory latent process for both conditional and unconditional generation. "we propose Generative Recursive reAsoning Models (GRAM), a framework that turns recursive latent reasoning into probabilistic multi-trajectory computation."
Hierarchical instantiation: A design where the latent state is split into multiple levels (e.g., high- and low-level) updated at different time scales. "Hierarchical Instantiation. We instantiate the latent state with two interacting components, $z = (h, l)$ ."
Hierarchical latent dynamics: Multi-level latent state updates that evolve at different granularities to support complex reasoning or representation learning. "reasoning-oriented recurrent designs such as hierarchical latent dynamics."
HRM: A specific prior recursive reasoning model (Hierarchical Reasoning Model) used as a deterministic baseline in the paper. "Recent recursive reasoning models such as HRM~\citep{hrm} and TRM~\citep{trm} provide early evidence for the potential of this approach in structured reasoning."
Inception Score (IS): A metric for evaluating generative models by measuring both sample quality and diversity via a pretrained classifier. "We report IS ( $\uparrow$ ) and FID ( $\downarrow$ )."
Latent Process Reward Model (LPRM): A value head trained to predict the eventual quality of a trajectory from its latent state to guide selection among sampled trajectories. "best-of-N with a Latent Process Reward Model (LPRM)"
Latent-variable generative model: A probabilistic model where observed data are generated by integrating over unobserved (latent) variables or trajectories. "This yields a latent-variable generative model supporting conditional reasoning via $p_\theta(y \mid x)$ "
Latent-variable probabilistic model: A modeling approach that posits hidden variables to explain observed data, optimized via variational objectives. "We model GRAM as a latent-variable probabilistic model $p_{\theta}$ "
Learned prior: A parameterized prior distribution over latent variables conditioned on inputs, used for sampling at inference time. "trajectories are instead generated from the learned prior $p_{\theta}(\cdot \mid x)$ ."
Marginalizing (over latent trajectories): Integrating over latent sequences to obtain the conditional likelihood of outputs. "by marginalizing over stochastic latent reasoning trajectories."
Mode collapse: Failure mode of generative models where outputs lack diversity, collapsing to a few modes. "the deterministic baseline TRM exhibits mode collapse (FID 303.29)"
Parallel trajectory sampling: Drawing multiple stochastic latent reasoning paths simultaneously at inference to explore diverse solutions. "inference-time scaling through both recursive depth and parallel trajectory sampling."
Probabilistic multi-trajectory recursion: A design principle where recursive computation is stochastic, maintaining multiple plausible latent paths. "establish probabilistic multi-trajectory recursion as a design principle"
Reparameterized Gaussian noise: Sampling noise via a differentiable transformation (reparameterization trick) to enable gradient-based training. "adding reparameterized Gaussian noise $\epsilon_t$ after a deterministic update $u_t$ "
Recursive Reasoning Models (RRMs): Architectures that refine a persistent latent state through repeated computation rather than extending an output sequence. "Recursive Reasoning Models (RRMs) offer a promising alternative to autoregressive sequence extension by performing iterative latent-state refinement with shared transition functions."
State-dependent Gaussian: A Gaussian noise distribution whose mean and/or variance depends on the current state or proposal, enabling learned stochasticity. "samples a conditional perturbation from a state-dependent Gaussian"
Stochastic guidance: Learnable noise injected into latent updates to encourage exploration and multi-hypothesis reasoning. "We refer to $\epsilon_t$ as the learnable stochastic guidance."
Stochastic latent transitions: Randomized updates of latent states that induce a distribution over reasoning trajectories. "stochastic latent transitions yield substantial gains within the recursive-reasoning paradigm."
Surrogate objective: An approximate training objective used in place of the exact but expensive or intractable objective to reduce memory or computation. "This gives the following surrogate objective for each supervision step:"
Truncated gradient propagation: Limiting backpropagation through long computational chains by cutting gradients beyond certain steps to save memory. "we apply truncated gradient propagation"
Universal Transformers: Transformers with weight-sharing across depth that apply the same block recurrently to increase computational depth. "recurrent Transformer architectures such as Universal Transformers and Looped Transformers"
Unconditional generation: Sampling outputs without conditioning inputs by modeling p(x), as opposed to conditional p(y|x). "unconditional generation via $p_\theta(x)$ "
Variational posterior: An approximate, learnable distribution over latent variables conditioned on observed data used to optimize the ELBO. "we therefore introduce a variational posterior $q_{\phi}(\tau \mid x,y)$ "
Width-based inference-time scaling: Improving performance by increasing the number of parallel sampled trajectories at inference, rather than only increasing depth. "we introduce width-based inference-time scaling"
Looped Transformers: A recurrent transformer variant with shared blocks applied iteratively to deepen computation without increasing parameters. "Universal Transformers~\citep{universal} and Looped Transformers~\citep{looped_transformer}"
Stochastic residual perturbation: A noise addition around a deterministic update that forms the stochastic transition in latent space. "a learned stochastic residual perturbation around a deterministic update"
Latent reasoning trajectory: The sequence of latent states the model traverses during its internal reasoning process. "GRAM models reasoning as a stochastic latent trajectory"

View Paper Prompt View All Prompts

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Generate Now

Continue Learning

Collections

Tweets

HackerNews

Gram: Recursive reasoning models with stochastic latent trajectories (10M param) (2 points, 1 comment)

Generative Recursive Reasoning

Summary

Generative Recursive Reasoning: A Latent Probabilistic Paradigm for Recursive Neural Reasoning

Motivation and Context

Model Architecture and Training Formulation

Inference-Time Scaling and Multi-Hypothesis Exploration

Empirical Results: Structured and Generative Tasks

Structured Reasoning Benchmarks

Multi-Solution Constraint Satisfaction

Unconditional Generation

Ablation Analyses and Qualitative Visualization

Implications, Limitations, and Future Directions

Conclusion

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

What is this paper about?

What questions are the authors asking?

How does their method work?

Two loops: quick refinements inside slower big steps

Adding guided randomness to explore better

Training with a helper that can peek at the answer

Scaling reasoning at test time: deeper and wider

Not just solving—also generating

What did they find?

What is the impact of this research?

In short

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Practical Applications

Overview

Immediate Applications

Long-Term Applications

Key Assumptions and Dependencies (Cross-Cutting)

Glossary

Open Problems

Continue Learning

Collections

Tweets

HackerNews

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research