RA³: Reasoning as Action Abstractions

Updated 28 April 2026

RA³ is a paradigm in AI that formalizes reasoning as the dynamic construction, selection, and deployment of abstract, temporally extended actions to enhance planning and sample efficiency.
It integrates symbolic, probabilistic, and neural methods by representing high-level reasoning as abstract actions within hierarchical frameworks, boosting performance across robotics, language modeling, and RL.
Empirical studies show that RA³ architectures outperform flat primitive-action learners, achieving faster convergence and higher asymptotic performance through compressed, reusable abstractions.

Reasoning as Action Abstractions (RA³) is a paradigm in artificial intelligence that formalizes intelligent reasoning as the dynamic construction, selection, and deployment of temporally extended actions (action abstractions) within decision-making systems. RA³ integrates symbolic, probabilistic, and neural methods by representing high-level "reasoning" steps as abstract actions that structure and accelerate planning, generalization, and sample efficiency across domains such as robotics, language modeling, combinatorial search, and hierarchical RL. Theoretical and empirical work reports that RA³ architectures outperform flat primitive-action learners in both sample efficiency and asymptotic performance by compressing complex tasks into manageable, reusable abstractions (Zhao et al., 2023, Zhang et al., 30 Sep 2025, Lange et al., 2019, Banihashemi et al., 2024, Boussif et al., 2024).

1. Formal Framework and Theoretical Foundations

RA³ operates on hierarchically structured Markov Decision Processes (MDPs) or their probabilistic/symbolic analogues, introducing an abstract action space $\mathcal{Z}\supseteq\mathcal{A}$ in which each $z\in\mathcal{Z}$ executes a temporally extended sequence of primitive actions $a_t,\dots,a_{t+\tau}$. Planning or policy learning is then performed over this abstract action space for the underlying MDP $\mathcal{M}$, and its quality is governed by two error terms:

Pruning error: The approximation error $\Delta(\mathcal{M},\mathcal{Z}^\prime)$ incurred by restricting planning to an abstract action subspace $\mathcal{Z}^\prime$.
RL planning error: The regret from online policy search within $\mathcal{Z}^\prime$ once pruning is complete.
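
To make the abstract action space concrete, the sketch below treats each $z$ as a fixed, open-loop sequence of primitive actions and executes it while accumulating discounted reward, as in a semi-MDP. The class `AbstractAction`, the function `execute`, and the toy environment `line_step` are hypothetical illustrations; in practice closed-loop options or learned latent actions would replace the fixed sequence.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int
Action = str


@dataclass(frozen=True)
class AbstractAction:
    """Hypothetical abstract action z: an open-loop sequence of primitive actions."""
    name: str
    primitives: Tuple[Action, ...]


def execute(step: Callable[[State, Action], Tuple[State, float]],
            state: State, z: AbstractAction, gamma: float) -> Tuple[State, float, int]:
    """Run every primitive of z; return (final state, discounted reward, duration)."""
    total, discount = 0.0, 1.0
    for a in z.primitives:
        state, reward = step(state, a)
        total += discount * reward
        discount *= gamma
    return state, total, len(z.primitives)


def line_step(s: State, a: Action) -> Tuple[State, float]:
    """Toy 1-D world (hypothetical): 'R'/'L' move right/left; reward 1.0 at position 3."""
    s_next = s + 1 if a == "R" else s - 1
    return s_next, (1.0 if s_next == 3 else 0.0)


# Every primitive is also a length-1 abstraction, so Z contains A; one macro is added.
PRIMITIVES = ["R", "L"]
Z: List[AbstractAction] = [AbstractAction(a, (a,)) for a in PRIMITIVES]
Z.append(AbstractAction("RRR", ("R", "R", "R")))

state, ret, tau = execute(line_step, 0, Z[-1], gamma=0.9)
print(state, round(ret, 3), tau)  # 3 0.81 3
```

One decision at the abstract level here covers three primitive steps, which is the source of the horizon reduction discussed next.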

Key theorems in RA³ theory characterize the trade-off between abstraction compactness and horizon contraction: smaller abstraction sets shrink the decision space but can increase the pruning error, while longer abstractions accelerate the contraction of value iteration by lowering the effective discount factor $\bar\gamma$ (a backup through an abstraction lasting $\tau$ steps discounts the successor value by $\gamma^\tau$, and each episode requires fewer decisions). Together these effects yield sample-efficient RL (Zhang et al., 30 Sep 2025).
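
The horizon-contraction effect can be checked numerically on a toy chain. The sketch below runs synchronous semi-MDP value iteration, where a backup through an abstraction of duration $\tau$ discounts the successor value by $\gamma^\tau$, and compares a primitive-only action set with one that also contains a length-4 macro. The chain environment and the names `macro_outcome` and `value_iteration` are assumptions for illustration, not the construction used by Zhang et al. (30 Sep 2025).

```python
import math
from typing import List, Sequence, Tuple


def macro_outcome(s: int, k: int, n: int, gamma: float) -> Tuple[int, float, int]:
    """Toy chain 0..n (hypothetical): move right k times, stopping early at goal n.

    Returns (next state, discounted in-macro reward, duration tau).
    """
    discount = 1.0
    for t in range(k):
        s += 1
        if s == n:  # goal reached: reward 1, discounted by the steps already taken
            return s, discount, t + 1
        discount *= gamma
    return s, 0.0, k


def value_iteration(n: int, macro_lengths: Sequence[int], gamma: float,
                    tol: float = 1e-8) -> Tuple[List[float], int]:
    """Synchronous SMDP value iteration; returns (values, number of sweeps)."""
    V = [0.0] * (n + 1)  # V[n] stays 0: the goal is terminal
    sweeps = 0
    while True:
        sweeps += 1
        new_V = V[:]
        for s in range(n):
            q_values = []
            for k in macro_lengths:
                s2, r, tau = macro_outcome(s, k, n, gamma)
                # Backup through z: r + gamma**tau * V(s'); longer z discounts more
                q_values.append(r + gamma ** tau * V[s2])
            new_V[s] = max(q_values)
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < tol:
            return V, sweeps


V_flat, sweeps_flat = value_iteration(12, (1,), gamma=0.9)
V_macro, sweeps_macro = value_iteration(12, (1, 4), gamma=0.9)
print(sweeps_flat, sweeps_macro)  # 13 4
print(all(math.isclose(a, b) for a, b in zip(V_flat, V_macro)))  # True
```

Because the primitives remain in the macro-augmented set, the optimal values coincide here (zero pruning error); only the number of backups needed to propagate the goal reward shrinks.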

RA³ further provides model-theoretic definitions of sound and complete abstraction via bisimulation between high- and low-level action theories, formalizing refinement mappings $(\alpha,\beta)$ that connect high-level actions/fluents to low-level implementations in frameworks such as the situation calculus or ASP-based action languages (Banihashemi et al., 2024, Sridharan et al., 2015). Syntactic verification conditions guarantee that what the high-level theory entails is faithfully realizable at the low level.
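
A finite-state caricature of the soundness direction: with deterministic high- and low-level models, one can check that every high-level action executable in an abstract state is refined by a low-level program that runs from each corresponding concrete state and ends in a concrete state whose abstraction is the predicted high-level successor. This one-directional commutation check is a simplification, not the bisimulation-based definition of Banihashemi et al. (2024). The delivery domain, `ALPHA` (playing the role of $\alpha$), and `abstract` (a state abstraction standing in for the fluent mapping $\beta$) are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

LowState = Tuple[int, bool]   # (corridor cell 0..3, holding the package?)
HighState = Tuple[str, bool]  # (room "A" or "B", holding the package?)


def low_step(s: LowState, a: str) -> Optional[LowState]:
    """Low-level dynamics; None means the action is not executable in s."""
    pos, holding = s
    if a == "right" and pos < 3:
        return (pos + 1, holding)
    if a == "left" and pos > 0:
        return (pos - 1, holding)
    if a == "pick" and pos == 0 and not holding:
        return (pos, True)
    if a == "drop" and pos == 3 and holding:
        return (pos, False)
    return None


def high_step(h: HighState, a: str) -> Optional[HighState]:
    """High-level dynamics over rooms; None means not executable."""
    room, holding = h
    if a == "goto_B" and room == "A":
        return ("B", holding)
    if a == "goto_A" and room == "B":
        return ("A", holding)
    if a == "pick" and room == "A" and not holding:
        return ("A", True)
    if a == "drop" and room == "B" and holding:
        return ("B", False)
    return None


def abstract(s: LowState) -> Optional[HighState]:
    """Hypothetical state abstraction; corridor cells 1-2 have no high-level image."""
    pos, holding = s
    return {0: ("A", holding), 3: ("B", holding)}.get(pos)


# Hypothetical refinement mapping: each high-level action -> a low-level program.
ALPHA: Dict[str, List[str]] = {
    "goto_B": ["right"] * 3, "goto_A": ["left"] * 3,
    "pick": ["pick"], "drop": ["drop"],
}


def run(s: LowState, program: Sequence[str]) -> Optional[LowState]:
    """Execute a low-level program; None if any step is not executable."""
    for a in program:
        s = low_step(s, a)
        if s is None:
            return None
    return s


def violations(alpha: Dict[str, List[str]]) -> List[Tuple[LowState, str]]:
    """Pairs (s, A) where A is executable in abstract(s) but alpha[A] does not realize it."""
    bad = []
    for s in product(range(4), (False, True)):
        h = abstract(s)
        if h is None:
            continue
        for a, program in alpha.items():
            h_next = high_step(h, a)
            if h_next is None:
                continue  # nothing to refine when A is not executable
            s_next = run(s, program)
            if s_next is None or abstract(s_next) != h_next:
                bad.append((s, a))
    return bad


print(violations(ALPHA))  # []
print(violations({**ALPHA, "goto_B": ["right"] * 2}))
```

The second call shows a faulty refinement: stopping one cell short leaves the robot in the corridor, which has no high-level counterpart, so `goto_B` is flagged from both states in room A.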

2. Canonical RA³ Algorithms and Architectures

Practical RA³ implementations instantiate these principles in diverse domains:

Mid-training RL for LLMs

The RA³ mid-training algorithm identifies reusable, temporally consistent latent actions by maximizing a sequential variational lower bound. Optimization alternates, EM-style, between two steps:

E-step: RL-based latent discovery over augmented action tokens (e.g., $\langle\texttt{think}\rangle$ vs. $\langle\texttt{act}\rangle$).
M-step: Next-token-prediction (NTP) fine-tuning on data labeled with the discovered abstractions.
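
The alternating structure can be illustrated with a toy "hard EM" over a macro library. The sketch below swaps the RL-based E-step for a likelihood-based Viterbi segmentation and the NTP M-step for an add-alpha smoothed count re-estimate. It mirrors only the alternation between inferring latent abstractions and refitting on the labeled data; none of its names (`viterbi_segment`, `m_step`, `em_macros`) come from Zhang et al.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Macro = Tuple[str, ...]


def viterbi_segment(traj: Sequence[str], logp: Dict[Macro, float],
                    max_len: int) -> Tuple[List[Macro], float]:
    """Hard E-step: most probable split of `traj` into macros under `logp`."""
    n = len(traj)
    best = [-math.inf] * (n + 1)  # best[i] = best log-prob of traj[:i]
    back = [0] * (n + 1)          # back[i] = length of the last macro in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for k in range(1, min(max_len, end) + 1):
            seg = tuple(traj[end - k:end])
            if seg in logp and best[end - k] + logp[seg] > best[end]:
                best[end], back[end] = best[end - k] + logp[seg], k
    segments, end = [], n
    while end > 0:
        segments.append(tuple(traj[end - back[end]:end]))
        end -= back[end]
    return segments[::-1], best[n]


def m_step(segmentations: List[List[Macro]], candidates: set,
           alpha: float) -> Dict[Macro, float]:
    """M-step: add-alpha smoothed macro log-probabilities from the segmentations."""
    counts = Counter(m for segs in segmentations for m in segs)
    total = sum(counts.values()) + alpha * len(candidates)
    return {m: math.log((counts[m] + alpha) / total) for m in candidates}


def em_macros(trajs: Sequence[Sequence[str]], max_len: int = 3, iters: int = 5,
              alpha: float = 0.1):
    # Candidate library: every contiguous sub-sequence up to max_len (includes primitives).
    candidates = {tuple(t[i:i + k]) for t in trajs for k in range(1, max_len + 1)
                  for i in range(len(t) - k + 1)}
    logp = {m: -math.log(len(candidates)) for m in candidates}  # uniform start
    for _ in range(iters):
        segs = [viterbi_segment(t, logp, max_len)[0] for t in trajs]  # E-step
        logp = m_step(segs, candidates, alpha)                        # M-step
    return logp, segs


trajs = [list("UURUURUUR"), list("UURLUUR")]
logp, segs = em_macros(trajs)
print(max(logp, key=logp.get))  # ('U', 'U', 'R')
```

The recurring motif quickly absorbs most of the probability mass, after which segmentations reuse it wherever it fits; this is the "reusable abstraction" behavior the variational objective targets, obtained here by a much simpler criterion.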

In code generation, marking rationales explicitly (e.g., with "reasoning" vs. "acting" tokens) raises pass@1 by 4–8 points over standard NTP training and the base models (Zhang et al., 30 Sep 2025).

Grammar Induction for Hierarchical RL

Action grammars identify macro-actions by grammar induction (k-Sequitur, G-Lexis) on policy-conditioned action sequences, formalizing abstractions as reusable non-terminals of a context-free grammar over primitive actions. Augmenting the agent’s action space with such macros empirically yields faster convergence, especially in transfer and imitation settings (Lange et al., 2019).
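
Once a grammar has been induced, using its non-terminals as macro-actions amounts to expanding each rule down to primitive actions and adding the result to the action set. The sketch below assumes the grammar is already given (it does not implement k-Sequitur or G-Lexis); the rule names and the Towers-of-Hanoi-style move labels are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Grammar = Dict[str, List[str]]  # non-terminal -> right-hand-side symbols


def expand(symbol: str, grammar: Grammar, seen: Tuple[str, ...] = ()) -> List[str]:
    """Recursively rewrite `symbol` into primitive actions (terminals)."""
    if symbol not in grammar:
        return [symbol]  # terminal symbol = primitive action
    if symbol in seen:
        raise ValueError(f"recursive rule for {symbol!r}")
    out: List[str] = []
    for child in grammar[symbol]:
        out.extend(expand(child, grammar, seen + (symbol,)))
    return out


def augmented_action_space(primitives: Sequence[str],
                           grammar: Grammar) -> Dict[str, Tuple[str, ...]]:
    """Map every action name (primitive or macro) to its primitive sequence."""
    space = {a: (a,) for a in primitives}
    for nt in grammar:
        space[nt] = tuple(expand(nt, grammar))
    return space


# Hypothetical grammar over Hanoi-style moves ("AB" = move top disk from peg A to B).
GRAMMAR: Grammar = {"M1": ["AC", "AB"], "M2": ["M1", "CB", "M1"]}
space = augmented_action_space(["AB", "AC", "BC", "CB"], GRAMMAR)
print(space["M2"])  # ('AC', 'AB', 'CB', 'AC', 'AB')
```

The agent then chooses among all keys of `space`, and executing a macro replays its tuple of primitives. Grammars produced by Sequitur-style compression are acyclic, so the recursion guard is purely defensive.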

Chunk-based Action Abstraction for Amortized Sampling

In GFlowNets and entropy-seeking RL, action abstractions are constructed online by byte-pair-encoding-style "chunking" of high-reward trajectories. These macro-actions shorten trajectories, improving credit assignment and mode-discovery rates relative to strictly atomic baselines (Boussif et al., 2024).
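
A byte-pair-encoding-style chunker can be sketched in a few lines: keep the highest-reward trajectories, repeatedly merge the most frequent adjacent pair of tokens into a new chunk, then replay the learned merges to re-encode trajectories with fewer decisions. This is a generic BPE sketch with assumed names (`learn_merges`, `encode`) and toy data, not the exact procedure of Boussif et al. (2024).

```python
from collections import Counter
from typing import List, Sequence, Tuple

Token = Tuple[str, ...]  # a chunk is a tuple of primitive actions
Pair = Tuple[Token, Token]


def merge_pair(seq: List[Token], pair: Pair) -> List[Token]:
    """Replace non-overlapping occurrences of `pair` (left to right) by one chunk."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_merges(trajs: Sequence[Sequence[str]], rewards: Sequence[float],
                 top_k: int, n_merges: int) -> List[Pair]:
    """Learn BPE merges from the top_k highest-reward trajectories only."""
    ranked = sorted(zip(rewards, trajs), key=lambda rt: rt[0], reverse=True)
    seqs = [[(a,) for a in traj] for _, traj in ranked[:top_k]]
    merges: List[Pair] = []
    for _ in range(n_merges):
        counts = Counter(p for s in seqs for p in zip(s, s[1:]))
        if not counts:
            break
        best, freq = counts.most_common(1)[0]
        if freq < 2:  # nothing recurs, so nothing worth abstracting
            break
        merges.append(best)
        seqs = [merge_pair(s, best) for s in seqs]
    return merges


def encode(traj: Sequence[str], merges: Sequence[Pair]) -> List[Token]:
    """Re-encode a trajectory by replaying merges in the order they were learned."""
    seq = [(a,) for a in traj]
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq


trajs = [list("abcabcabx"), list("abcabc"), list("xyzxyz")]
merges = learn_merges(trajs, rewards=[1.0, 0.9, 0.1], top_k=2, n_merges=2)
print([a + b for a, b in merges])           # [('a', 'b'), ('a', 'b', 'c')]
print(len(encode(list("abcabc"), merges)))  # 2 (down from 6 primitive steps)
```

The low-reward trajectory contributes no chunks, so the learned library reflects only high-reward structure.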

Coarse-to-Fine Planning in Embodied Agents

The ERRA architecture decomposes task execution into alternating cycles of coarse-resolution probabilistic reasoning (abstract plan generation by LLMs) and fine-resolution MDP execution (motor actions), with tight feedback between the two layers (Zhao et al., 2023). The same coarse-to-fine pattern appears in probabilistic and logic-based approaches such as REBA, in which abstract planning is performed in ASP and refined via POMDPs over concretized domains (Sridharan et al., 2015).
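
The control flow of a coarse-to-fine loop with feedback can be sketched as follows: a coarse planner proposes an abstract route, a fine-level executor attempts each step, and any execution failure is fed back to revise the abstract model before replanning. Here breadth-first search over a room graph stands in for the LLM (ERRA) or ASP (REBA) planner, and a lookup of locked doors stands in for the MDP/POMDP controller; the domain and all names are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set

Graph = Dict[str, Set[str]]


def coarse_plan(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Coarse layer (stand-in for an LLM/ASP planner): BFS over the room graph."""
    prev: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in sorted(graph.get(u, ())):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None


def fine_execute(u: str, v: str, locked: Set[FrozenSet[str]]) -> bool:
    """Fine layer (stand-in for an MDP/POMDP controller): try to cross door u-v."""
    return frozenset((u, v)) not in locked


def coarse_to_fine(graph: Graph, start: str, goal: str,
                   locked: Set[FrozenSet[str]], max_replans: int = 10) -> Optional[List[str]]:
    belief = {u: set(vs) for u, vs in graph.items()}  # agent's abstract model (a copy)
    here, trace = start, [start]
    for _ in range(max_replans + 1):
        plan = coarse_plan(belief, here, goal)
        if plan is None:
            return None  # goal unreachable under current beliefs
        for u, v in zip(plan, plan[1:]):
            if not fine_execute(u, v, locked):
                # Feedback: remove the failed transition from the abstract model.
                belief[u].discard(v)
                belief.setdefault(v, set()).discard(u)
                break
            here = v
            trace.append(v)
        else:
            return trace  # every abstract step was realized at the fine level
    return None


ROOMS: Graph = {"kitchen": {"hall", "study"}, "hall": {"kitchen", "lab"},
                "study": {"kitchen", "lab"}, "lab": {"hall", "study"}}
print(coarse_to_fine(ROOMS, "kitchen", "lab", {frozenset(("hall", "lab"))}))
# ['kitchen', 'hall', 'kitchen', 'study', 'lab']
```

The first abstract plan goes through the hall; when the fine layer reports the locked door, the belief is corrected and the replanned route through the study succeeds.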

3. Empirical Outcomes and Performance Benchmarks

RA³ approaches have exhibited superior empirical results across domains:

Code generation: On HumanEval and MBPP, RA³ mid-training achieves pass@1 improvements of 4–8 points over strong baselines. In RL post-training (RLVR), models trained with RA³ abstractions converge faster and reach higher asymptotic accuracy (Zhang et al., 30 Sep 2025).
Robotics: RA³-based architectures (ERRA, REBA) reliably solve long-horizon, noisy manipulation tasks and maintain robust performance under observation and action uncertainty (Zhao et al., 2023, Sridharan et al., 2015).
Hierarchical RL: Symbolic grammar induction yields interpretable abstractions that transfer across task scales (e.g., Towers of Hanoi, gridworld) and improve sample efficiency (Lange et al., 2019).
Amortized sampling: Macro-action chunking in GFlowNets accelerates mode discovery, improves density estimation, and leads to interpretable libraries that transfer across closely-related reward functions (Boussif et al., 2024).
Conversational agents: RA³-style reasoning-action synergy trained via RL (GRPO) delivers statistically significant gains in action recall and tool-invocation precision over SFT-only and thinking-free baselines, as well as over vanilla Qwen3-1.7B (Rawat et al., 12 Dec 2025).
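
For reference, pass@1 scores on HumanEval and MBPP are usually computed with the unbiased pass@k estimator introduced alongside HumanEval (Chen et al., 2021): draw $n$ samples per problem, count the $c$ that pass the unit tests, estimate $1 - \binom{n-c}{k}/\binom{n}{k}$, and average over problems. Whether the studies cited above used exactly this estimator and sampling budget is an assumption; the function names and sample counts below are illustrative.

```python
import math
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def benchmark_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over a benchmark of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem (n, c) counts, not results from any cited study.
results = [(10, 3), (10, 0), (10, 10)]
print(benchmark_pass_at_k(results, k=1))
```

With $k = 1$ the estimator reduces to the fraction of correct samples, $c/n$, so a gain of 4–8 points corresponds to an increase of 0.04–0.08 in the benchmark average.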

4. Methods for Constructing Action Abstractions

RA³ encompasses several constructive mechanisms for abstraction discovery:

Auto-encoding/compression: Discovering sub-sequences to compress via BPE-style algorithms (ActionPiece) or grammar induction (Sequitur, G-Lexis) (Boussif et al., 2024, Lange et al., 2019).
Latent-variable modeling: EM-style variational optimization over latent rationale tokens or "reasoning" intervals, optimizing a temporal ELBO (Zhang et al., 30 Sep 2025).
Coarse-to-fine reasoning: Hierarchically decomposing planning or execution, where each high-level action maps to a low-level program, sub-sequence, or POMDP policy (Zhao et al., 2023, Sridharan et al., 2015, Banihashemi et al., 2024).
Policy regularization: Penalizing frequent switching between reasoning and primitive actions to control abstraction granularity (e.g., explicit KL or entropy penalties) (Zhang et al., 30 Sep 2025, Rawat et al., 12 Dec 2025).
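
One way to realize a switching penalty is to treat the decision to switch between reasoning and primitive actions as a Bernoulli variable and penalize its KL divergence from a low-rate prior, so that the policy pays for switching more often than the prior allows. The prior rate `prior_switch`, the weight `beta`, and all function names are hypothetical; the regularizers used in the cited works may differ.

```python
import math
from typing import Sequence


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bern(p) || Bern(q)); p is clipped away from 0 and 1 for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def count_switches(modes: Sequence[str]) -> int:
    """Number of think<->act transitions in a sequence of mode labels."""
    return sum(a != b for a, b in zip(modes, modes[1:]))


def regularized_return(rewards: Sequence[float], switch_probs: Sequence[float],
                       prior_switch: float = 0.1, beta: float = 0.5) -> float:
    """Episode return minus beta * sum_t KL(Bern(p_switch_t) || Bern(prior_switch)).

    prior_switch and beta are hypothetical hyperparameters.
    """
    penalty = sum(bernoulli_kl(p, prior_switch) for p in switch_probs)
    return sum(rewards) - beta * penalty


print(count_switches(["think", "think", "act", "think", "act", "act"]))  # 3
print(regularized_return([1.0, 1.0], [0.1, 0.1]))  # 2.0: matches the prior, no penalty
print(regularized_return([1.0, 1.0], [0.9, 0.9]) < 2.0)  # True: frequent switching costs
```

Raising `beta` or lowering `prior_switch` pushes the policy toward longer uninterrupted segments, i.e., coarser abstractions; the reverse yields finer ones.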

RA³ thus supports both symbolic and neural instantiations, from explicit logic-based mappings to latent neural segmentations.

5. Integration with Knowledge Representation and Logic

RA³ extends beyond pure RL by providing formal methods for action abstraction in knowledge-based settings:

Situation calculus and ConGolog: Abstract and refined agent specifications are related by refinement mappings, with soundness and completeness guaranteed via bisimulation. Precise conditions ensure projectability, executability, and state-tracking across levels (Banihashemi et al., 2024).
ASP-based planning: Abstract plans are computed via answer set programming over logical action theories, then instantiated by refinement into uncertain/sequential physical domains (e.g., via POMDPs), with bidirectional update of history and observation (Sridharan et al., 2015).
Feedback integration: Both empirical and formal RA³ architectures exhibit closed feedback loops, enabling online belief correction and iterative abstraction refinement at runtime (Zhao et al., 2023, Sridharan et al., 2015).
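
The executability and projection checks mentioned above can be illustrated with a STRIPS-style action theory, a deliberately simpler formalism than the situation calculus or ASP: a plan is executable if each action's preconditions hold in the state reached so far, and a fluent is projected to hold if it is true in the final progressed state. The coffee-delivery domain and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Sequence

State = FrozenSet[str]


@dataclass(frozen=True)
class Action:
    name: str
    pre: State     # fluents that must hold before execution
    add: State     # fluents made true
    delete: State  # fluents made false


def action(name: str, pre: Iterable[str], add: Iterable[str],
           delete: Iterable[str] = ()) -> Action:
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


def progress(state: State, plan: Sequence[Action]) -> Optional[State]:
    """Executability check plus progression; None if some precondition fails."""
    for act in plan:
        if not act.pre <= state:
            return None
        state = (state - act.delete) | act.add
    return state


def projects(state: State, plan: Sequence[Action], fluent: str) -> bool:
    """Projection: does `fluent` hold after executing `plan` from `state`?"""
    final = progress(state, plan)
    return final is not None and fluent in final


# Hypothetical high-level domain.
PICKUP = action("pickup", pre={"at_kitchen", "hand_empty"}, add={"has_coffee"},
                delete={"hand_empty"})
GOTO_OFFICE = action("goto_office", pre={"at_kitchen"}, add={"at_office"},
                     delete={"at_kitchen"})
DELIVER = action("deliver", pre={"at_office", "has_coffee"},
                 add={"coffee_delivered", "hand_empty"}, delete={"has_coffee"})
S0: State = frozenset({"at_kitchen", "hand_empty"})

print(projects(S0, [PICKUP, GOTO_OFFICE, DELIVER], "coffee_delivered"))  # True
print(progress(S0, [GOTO_OFFICE, PICKUP]))  # None: pickup needs at_kitchen
```

In a layered system the same checks would be run at both levels, and a refinement is useful only if plans that pass at the abstract level still pass once each action is replaced by its low-level implementation.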

6. Limitations and Open Directions

Challenges in RA³ research remain:

Determining optimal abstraction granularity: The choice of abstraction frequency (e.g., the abstraction horizon length or the weight of the KL penalty) affects both compression and coverage, often requiring per-domain tuning (Zhang et al., 30 Sep 2025).
Dependence on expert data: Some instantiations require large, high-quality corpora of expert trajectories or rationales for effective pruning and abstraction learning (Zhang et al., 30 Sep 2025).
Scalability and generalization: Transfer of abstraction libraries across highly diverse tasks, learning of task-conditional abstraction sets, and full automation of abstraction discovery (without cold-start reasoning traces) remain open (Boussif et al., 2024, Rawat et al., 12 Dec 2025).
Non-differentiable symbolic elements: In settings using grammar induction or ASP/action-theory abstraction, connecting non-differentiable symbolic steps to gradient-based policy learning still poses integration questions (Lange et al., 2019, Sridharan et al., 2015).
Interpretability: While action grammar and chunking approaches yield interpretable abstractions, neural latent-variable methods may require further techniques for compositional explanation and human debugging (Zhang et al., 30 Sep 2025).

7. Significance and Broader Implications

RA³ is distinct in aligning cognitive, algorithmic, and formal perspectives. It offers a unifying account of how abstraction—whether through latent rationales in neural policies, symbolic grammars, or logic-based refinements—serves as the substrate of scalable, data-efficient, and interpretable reasoning. This paradigm has been validated across RL, combinatorial sampling, program synthesis, language modeling, robotics, and dialog agents. The core insight is that intelligent behavior emerges not only from planning or acting, but from meta-reasoning about which abstractions to invent, select, and deploy—closing the gap between symbolic reasoning and hierarchical action (Zhao et al., 2023, Zhang et al., 30 Sep 2025, Lange et al., 2019, Banihashemi et al., 2024, Boussif et al., 2024, Rawat et al., 12 Dec 2025).