
ReJump: Structured Analysis in ML & Robotics

Updated 7 December 2025
  • ReJump is a dual-purpose framework that formalizes multi-step reasoning in LLMs and robotic control into modular, analyzable components.
  • It decouples reasoning into structured chain-of-thought trees and quantifies transitions using metrics like jump distance to assess performance.
  • In robotics, ReJump combines hierarchical control and reinforcement learning to achieve agile quadruped jumps with effective sim-to-real transfer.

ReJump refers to two distinct, state-of-the-art frameworks in machine learning research: (1) a formal representation and analysis methodology for multi-step reasoning in LLMs, and (2) control and learning architectures for continuous, agile jumping in legged robotics. The first, introduced in (Zeng et al., 30 Nov 2025), decouples LLM-generated “chain-of-thought” (CoT) text into structured trees of partial solutions and transition patterns (“jumps”), enabling quantitative inspection and improvement of algorithmic reasoning. The second, emerging from robot learning (Yang et al., 2023, Guan et al., 2024), focuses on repeated, versatile real-world quadruped jumping via hierarchical control, model-based policy transfer, and frequency-domain dynamics matching. Both strands converge on abstracting complex, multi-step processes (reasoning or motion) into discrete, analyzable components, linking model behavior to performance and robustness.

1. ReJump in LLM Reasoning: Structural Representation and Formalism

ReJump formalizes the reasoning process of LLMs as a two-layered structure: a tree $T=(V,E)$ of partial solutions and a sequence $W=(I,\Phi)$ of node “jumps” annotated by action type. Each node $v\in V$ denotes an intermediate subproblem; edges $(u,v)\in E$ specify subproblem dependencies. The root $v_0$ is the initial state, and leaves $S(T)\subset V$ signal completed solutions or dead-ends.

A model’s reasoning trace is represented as a sequence of visited nodes $I=(i_0,\ldots,i_K)$, where each movement $(i_k,i_{k+1})$ is labeled as $\phi_k\in\{\text{calc},\text{verify},\text{backtrack}\}$. Adjacency is determined by graph structure: an “adjacent” move $(i_k,i_{k+1})\in E$ encodes sequential calculation; non-adjacent transitions mark verification or backtracking. The full pair $(T,W)$, the “ReJump representation”, captures both structural decomposition and search/exploration patterns within multi-step LLM outputs (Zeng et al., 30 Nov 2025).
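
As a concrete illustration, the ReJump representation can be encoded as a small pair of data structures, one for the tree layer and one for the jump sequence. The following Python sketch is illustrative only; the class and field names are assumptions and do not correspond to any released implementation of (Zeng et al., 30 Nov 2025).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Literal, Optional, Set

JumpType = Literal["calc", "verify", "backtrack"]

@dataclass
class ReasoningTree:
    """Tree layer T = (V, E): nodes are partial solutions, edges are subproblem dependencies."""
    parents: Dict[int, Optional[int]] = field(default_factory=dict)  # node -> parent (None for root v_0)
    results: Dict[int, str] = field(default_factory=dict)            # node -> intermediate result text
    leaves: Set[int] = field(default_factory=set)                    # S(T): completed solutions or dead-ends

@dataclass
class JumpSequence:
    """Jump layer W = (I, Phi): visited nodes and a label phi_k for each transition (i_k, i_{k+1})."""
    visits: List[int] = field(default_factory=list)                  # I = (i_0, ..., i_K)
    labels: List[JumpType] = field(default_factory=list)             # calc / verify / backtrack

@dataclass
class ReJumpTrace:
    """The full ReJump representation (T, W) for one reasoning trace."""
    tree: ReasoningTree
    jumps: JumpSequence
```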

2. Quantitative Analysis of Reasoning: Metrics and Behavioral Dissection

The ReJump framework defines a family of metrics to quantify diverse reasoning attributes (a computation sketch follows the list):

  • Solution Count ($\#\mathrm{Sol}$): The number of distinct leaf nodes encountered, reflecting breadth of exploration.
  • Jump Distance ($d_{\mathrm{jump}}$): Average graph distance between consecutive solution attempts, measuring global search versus local exploitation.
  • Success Rate ($r_{\mathrm{succ}}$): Proportion of attempted solutions that are correct.
  • Verification Rate ($r_{\mathrm{verify}}$): Fraction of transitions labeled as “verify,” indexing explicit checking behavior.
  • Overthinking Rate ($r_{\mathrm{over}}$): Fraction of redundant solution attempts after the first correct solution has been found.
  • Forgetting Rate ($r_{\mathrm{forget}}$): Incidence of recomputation of previously derived leaves, indicating memory or consistency failures.
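
Given such a trace, these metrics can be computed directly from the tree and jump layers. The sketch below is one plausible reading of the definitions (it uses shortest-path distance on the undirected tree for jump distance and treats leaf visits as solution attempts, building on the data structures sketched in Section 1); the exact operationalization in (Zeng et al., 30 Nov 2025) may differ.

```python
import networkx as nx

def rejump_metrics(trace, correct_leaves):
    """Compute ReJump-style reasoning metrics from one trace (illustrative sketch)."""
    # Undirected tree over parent-child edges (root has parent None).
    tree = nx.Graph((c, p) for c, p in trace.tree.parents.items() if p is not None)
    visits, labels = trace.jumps.visits, trace.jumps.labels

    attempts = [v for v in visits if v in trace.tree.leaves]   # solution attempts, in visit order
    n_attempts = max(len(attempts), 1)

    n_sol = len(set(attempts))                                  # Solution Count
    r_succ = sum(a in correct_leaves for a in attempts) / n_attempts   # Success Rate
    r_verify = labels.count("verify") / max(len(labels), 1)            # Verification Rate

    # Jump Distance: mean graph distance between consecutive solution attempts.
    d_jump = sum(
        nx.shortest_path_length(tree, a, b) for a, b in zip(attempts, attempts[1:])
    ) / max(len(attempts) - 1, 1)

    # Overthinking Rate: redundant attempts after the first correct solution.
    first_ok = next((k for k, a in enumerate(attempts) if a in correct_leaves), None)
    r_over = 0.0 if first_ok is None else (len(attempts) - first_ok - 1) / n_attempts

    # Forgetting Rate: re-derivation of leaves that were already reached earlier.
    seen, repeats = set(), 0
    for a in attempts:
        repeats += a in seen
        seen.add(a)
    r_forget = repeats / n_attempts

    return {"n_sol": n_sol, "d_jump": d_jump, "r_succ": r_succ,
            "r_verify": r_verify, "r_over": r_over, "r_forget": r_forget}
```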

For each instance, metrics decompose final accuracy into behavioral components, enabling comparisons not only by traditional performance but also by strategy, persistence, or cognitive error modes (Zeng et al., 30 Nov 2025). At scale, mean and variance of these metrics reveal characteristic reasoning “styles” of different model architectures or training regimes.

3. Extraction, Experimental Setup, and Empirical Findings

ReJump extraction is realized by an LLM-based toolchain. The Gemini 2.5 Pro model parses CoT text to populate the tree layer (JSON: nodeID $\mapsto$ (problem, parent, result)), then extracts the sequence of jumps with labels (from, to, category). The extractor achieves ≈0.94 similarity to ground truth on the Game of 24 and >80% accuracy per trace on MATH-500, rising to >90% with triple extraction passes.
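
For illustration, the extractor's two-layer output might look like the following for a short Game of 24 trace. The field names mirror those listed above (nodeID, problem, parent, result; from, to, category), but the exact schema and labeling conventions shown here are assumptions, written as a Python literal rather than raw JSON.

```python
# Hypothetical extractor output for one short Game of 24 trace (schema is an assumption).
rejump_output = {
    "tree": {
        "0": {"problem": "Make 24 from 4, 7, 8, 8", "parent": None, "result": None},
        "1": {"problem": "Try (8 - 4) * 7 = 28", "parent": "0", "result": "28, dead-end"},
        "2": {"problem": "Try (7 - 8/8) * 4 = 24", "parent": "0", "result": "24, solution"},
    },
    "jumps": [
        {"from": "0", "to": "1", "category": "calc"},       # expand a subproblem along an edge
        {"from": "1", "to": "2", "category": "backtrack"},   # non-adjacent move to a sibling attempt
        {"from": "2", "to": "2", "category": "verify"},      # re-check the candidate (illustrative)
    ],
}
```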

Empirical benchmarking on MATH-500 and Game of 24 reveals:

  • Task sensitivity: Proof-based math (MATH-500) emphasizes exploitation and verification, while combinatorial search (Game of 24) privileges exploration.
  • Model diversity: DeepSeek-R1 explores more broadly (higher $\#\mathrm{Sol}$ and $d_{\mathrm{jump}}$) but with marginally reduced $r_{\mathrm{succ}}$ compared to Grok 3 Mini Beta, which focuses on local, high-accuracy search. Models like Claude 3.7 Sonnet and Phi-4-reasoning-plus exhibit low exploration and verification, with correspondingly lower accuracy.
  • Feature attribution: On MATH-500, success rate predicts performance; on Game of 24, both exploration metrics (jump distance and number of solutions tried) matter (Zeng et al., 30 Nov 2025).

4. Training Strategy Effects: Distillation, Examples, and Reinforcement Learning

ReJump analysis reveals structure propagation and behavioral dynamics under varying large reasoning model (LRM) training protocols:

  • Distillation: SFT (supervised fine-tuning) from teacher to student increases jump and tree similarity, but sometimes reduces exploitation-task accuracy, suggesting inherited reasoning patterns do not guarantee optimal transfer.
  • Prompting and examples: More CoT exemplars do not reliably alter accuracy or tree structure, but do increase stylistic similarity (e.g., verification/backtracking frequency).
  • RL effects: RL training on Game of 24 increases exploration, while RL on MATH-500 shifts models toward higher exploitation, tuning jump distance and solution count to the benchmark’s demands (Zeng et al., 30 Nov 2025).

5. ReJump for Model Selection and Test-Time Strategy

ReJump provides actionable criteria for optimizing LLM performance:

  • Best-of-N (BoN) selection: Sampling $N$ traces and selecting by ReJump metrics (e.g., highest $d_{\mathrm{jump}}$ for exploration tasks) improves pass@1 by up to 9.1%. The metric can be inverted (pick the lowest jump distance) for heavily exploitative tasks (e.g., Sudoku, ZebraLogic), achieving similar gains; a minimal selection sketch follows this list.
  • Prompt selection: By generating traces under candidate prompts and acting on ReJump metrics, prompt selection for QwQ-32B on Game of 24 boosts accuracy by ≈5% over baseline (Zeng et al., 30 Nov 2025).
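
A minimal sketch of the BoN procedure described above: sample $N$ traces, score each by a ReJump metric, and return the answer from the best-scoring trace. The `sample_trace` and `extract_metrics` callables are placeholders for whatever model and extraction pipeline is in use, not APIs from (Zeng et al., 30 Nov 2025).

```python
def best_of_n(problem, n, sample_trace, extract_metrics, score_key="d_jump", maximize=True):
    """Best-of-N selection by a ReJump metric (illustrative sketch).

    sample_trace(problem)  -> (chain-of-thought text, final answer) from one model sample
    extract_metrics(text)  -> dict of ReJump metrics for that trace (e.g., via an extractor)
    """
    candidates = []
    for _ in range(n):
        cot_text, answer = sample_trace(problem)
        metrics = extract_metrics(cot_text)
        candidates.append((metrics[score_key], answer))

    # Exploration-heavy tasks (e.g., Game of 24): pick the largest jump distance.
    # Exploitation-heavy tasks (e.g., Sudoku, ZebraLogic): invert and pick the smallest.
    pick = max if maximize else min
    _, best_answer = pick(candidates, key=lambda c: c[0])
    return best_answer
```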

6. Robotic ReJump: Architecture and Sim-to-Real Bridging

In legged robotic control, “ReJump” denotes frameworks for continuous, omni-directional quadruped jumping (Yang et al., 2023, Guan et al., 2024).

  • Hierarchical framework: The core is a stance controller blending a manually designed acceleration policy (from an offline optimal control solution) with a learned residual policy. The stance controller computes lift-off velocities for discrete jumps; the RL-learned residuals refine accelerations for stability and precision (a minimal blending sketch follows this list).
  • Whole-body controller: At 500 Hz, inverse-dynamics QP computes joint torques and ground contact forces under friction and actuation constraints.
  • Reward/curriculum: Training employs position, orientation, and contact rewards, with alive bonuses and penalties for divergence from desired trajectories. RL policy is trained on multi-jump episodes, embedding diverse directions and timing.
  • Sim-to-real transfer: Zero-shot transfer to Unitree Go1 hardware is achieved without domain randomization, leveraging a robust whole-body controller and residual learning. Experimental results match simulation (up to 0.5 m high, 0.6 m forward leaps, and 90° turning jumps), with minor loss from unmodeled motor saturation (Yang et al., 2023).
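
To make the residual-blending idea concrete, the following is a highly simplified sketch of a stance-phase acceleration command: a nominal acceleration from the offline optimal-control solution is corrected by a bounded, learned residual before being handed to the whole-body controller. All names and the bounding scheme are illustrative assumptions, not the controller of (Yang et al., 2023).

```python
import numpy as np

def stance_acceleration(phase, jump_command, nominal_policy, residual_policy, residual_scale=0.3):
    """Blend a hand-designed acceleration profile with a learned residual (illustrative).

    nominal_policy(phase, jump_command) -> base CoM acceleration from the offline optimal solution
    residual_policy(obs)                -> raw correction from the RL policy
    """
    a_nominal = np.asarray(nominal_policy(phase, jump_command))    # e.g., shape (3,) CoM acceleration
    obs = np.concatenate([[phase], np.asarray(jump_command), a_nominal])
    a_residual = residual_scale * np.tanh(np.asarray(residual_policy(obs)))  # keep correction bounded
    return a_nominal + a_residual   # passed to the 500 Hz inverse-dynamics whole-body QP
```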

Alternatively, impedance-matching-based approaches (Guan et al., 2024) minimize the sim-to-real gap by matching joint-level frequency response (Bode plots) in simulation and on hardware, incorporating rotor inertia, and using carefully bounded domain randomization on dynamics parameters (PD gains, friction, mass). A two-stage RL policy interpolates between walking/running and jumping, with multi-phase curriculum to preserve walking competence during jump skill acquisition. Maximum observed jumps reach 0.38 m vertical and 0.55 m forward, with ≈85% of the hardware’s optimal (trajectory-optimized) performance.
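
The frequency-domain matching step can be pictured as comparing the Bode magnitude of a simulated joint model against the measured hardware response and tuning simulation parameters (rotor inertia, PD gains, damping) until they align. The second-order PD-driven joint model below is a generic assumption for illustration, not the specific model or procedure of (Guan et al., 2024).

```python
import numpy as np
from scipy import signal

def joint_bode_mag(kp, kd, inertia, damping, freqs_hz):
    """Closed-loop magnitude response theta/theta_ref of a PD-controlled joint (illustrative).

    Plant:   inertia * theta_ddot + damping * theta_dot = tau
    Control: tau = kp * (theta_ref - theta) - kd * theta_dot
    =>       theta / theta_ref = kp / (inertia * s^2 + (damping + kd) * s + kp)
    """
    sys = signal.TransferFunction([kp], [inertia, damping + kd, kp])
    w = 2.0 * np.pi * np.asarray(freqs_hz)
    _, mag_db, _ = signal.bode(sys, w=w)
    return mag_db

# Example sweep: simulated joint (rotor inertia folded into `inertia`) vs. measured hardware data.
freqs = np.logspace(-1, 1.5, 50)                   # ~0.1 Hz to ~30 Hz
sim_mag = joint_bode_mag(kp=30.0, kd=0.8, inertia=0.05, damping=0.1, freqs_hz=freqs)
# hw_mag would come from a chirp/frequency-sweep experiment on the robot;
# the mismatch between sim_mag and hw_mag drives parameter tuning in simulation.
```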

7. Significance, Challenges, and Future Directions

ReJump, both in LLM reasoning and robotics, embodies a general philosophy: decomposing complex, sequential behaviors into modular, quantifiable operations—trees and jumps in cognitive models, or staged controllers and residual learning in robotics. This decomposition supports not only fine-grained diagnosis (exploration/exploitation, overthinking, transferability) but also targeted improvement through test-time selection, prompt engineering, and curriculum shaping.

Principal challenges include scaling the ReJump extraction process for LLMs via lightweight classifiers or semantic similarity, generalizing beyond math to code and planning, and using ReJump-derived metrics as RL rewards or stopping signals. In robotics, gaps remain in adapting to variable terrain, integrating asymmetric gaits, and pushing jump performance beyond current hardware limits (which would necessitate new actuators or dynamic appendages). Both domains suggest a plausible extension: the systematic harnessing of structural behavioral signals to reinforce, debug, and guide model and agent learning across diverse cognitive and motor tasks (Zeng et al., 30 Nov 2025, Guan et al., 2024, Yang et al., 2023).
