
Relational Keypoint Constraints (ReKep)

Updated 6 August 2025
  • ReKep are functions mapping sets of 3D keypoints to a cost, encoding spatial and temporal relationships to guide robotic manipulation and geometric reasoning.
  • They utilize stateless Python functions with NumPy arithmetic, supporting modular formulation of sub-goal and path constraints for dynamic task execution.
  • The approach integrates hierarchical optimization with vision and language models to automate constraint synthesis for robust perception-action control.

Relational Keypoint Constraints (ReKep) define a structured, visually-grounded framework for encoding and optimizing constraints in environments where explicit relationships between semantic keypoints are central to system behavior—most notably in robotic manipulation, geometric reasoning, and computer vision scenarios. ReKep are formulated as functions that map a set of 3D keypoints to real-valued costs, directly enabling the specification, manipulation, and optimization of spatial and temporal relations involving objects, environmental cues, and robotic agents.

1. Formal Definition and Conceptual Role

Relational Keypoint Constraints (ReKep) are expressed as functions:

$f: \mathbb{R}^{K \times 3} \rightarrow \mathbb{R}$

where $K$ is the number of 3D semantic keypoints, each represented by a Cartesian coordinate in $\mathbb{R}^3$. A value $f(\mathbf{k}) \leq 0$ signifies satisfaction of the constraint, and $f(\mathbf{k}) > 0$ quantifies violation.

In robotic manipulation, ReKep encode desired spatial or temporal relationships among keypoints associated with robot end-effectors, manipulated objects, and environmental landmarks. For instance, constraints may specify that a gripper must remain within a specified distance of a target, align with an orientation (via projected vectors or dot products), or satisfy complex multi-stage behaviors by sequencing multiple such constraints.

This unified constraint representation allows explicit, interpretable, and optimizable encoding of manipulation goals, trajectory sub-goals, kinematic configurations, and safety requirements.

2. Technical Implementation

Each ReKep is implemented as a stateless Python function utilizing NumPy-based computations over the array of keypoints. The functions operate arithmetically—using L2 norms, vector differences, dot products, and other Euclidean operations—to evaluate the cost or residual of a specific relational condition.

Examples:

  • Grasp constraint: def grasp_constraint(k): return np.linalg.norm(k[gripper] - k[handle]) - d_goal
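  • Alignment constraint: def align_constraint(k): return angle_thresh - np.dot((k[spout] - k[teapot]), direction_vec) (non-positive once the projection onto direction_vec reaches angle_thresh, consistent with the f ≤ 0 convention)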
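The two inline examples can be expanded into a runnable sketch. The keypoint indices, the thresholds, and the sample coordinates below are illustrative assumptions, not values from the source. The alignment cost is written as a cosine threshold minus the cosine of the angle, one sign choice that keeps the value non-positive when the axis is aligned.

```python
import numpy as np

# Hypothetical keypoint indices; in practice these come from the perception stage.
GRIPPER, HANDLE, SPOUT, TEAPOT_BODY = 0, 1, 2, 3


def grasp_constraint(k, d_goal=0.02):
    """<= 0 when the gripper keypoint is within d_goal metres of the handle."""
    return np.linalg.norm(k[GRIPPER] - k[HANDLE]) - d_goal


def align_constraint(k, direction=np.array([0.0, 0.0, -1.0]), cos_thresh=0.9):
    """<= 0 when the body-to-spout axis is within arccos(cos_thresh) of direction."""
    axis = k[SPOUT] - k[TEAPOT_BODY]
    axis = axis / np.linalg.norm(axis)  # normalise so the dot product is a cosine
    return cos_thresh - np.dot(axis, direction)


# Assumed sample keypoints, shape (K, 3), in metres.
keypoints = np.array([
    [0.50, 0.00, 0.31],  # gripper
    [0.50, 0.00, 0.30],  # handle
    [0.60, 0.00, 0.20],  # spout (below the body: teapot tipped downward)
    [0.60, 0.00, 0.30],  # teapot body
])

print(grasp_constraint(keypoints) <= 0, align_constraint(keypoints) <= 0)
```

Both functions are stateless and depend only on the keypoint array, so they can be handed to a numerical solver unchanged.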

Multi-stage or composite tasks are structured by grouping ReKep into:

  • Sub-goal constraints: Must be satisfied at the culmination of a task segment.
  • Path constraints: Must be maintained throughout execution of the segment.

This enables modular and interpretable decomposition of complex manipulation sequences, each defined as a list of Python constraint functions evaluated on the current set of semantic keypoints.
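A minimal sketch of this decomposition, assuming each stage is stored as a dictionary with a `subgoal` list and a `path` list. The stage names, keypoint indices, and thresholds are hypothetical and only illustrate the grouping.

```python
import numpy as np

GRIPPER, HANDLE, SPOUT, CUP = 0, 1, 2, 3  # hypothetical keypoint indices


def near_handle(k):
    return np.linalg.norm(k[GRIPPER] - k[HANDLE]) - 0.02


def spout_above_cup(k):
    # Horizontal (x, y) offset between the spout and the cup opening.
    return np.linalg.norm(k[SPOUT][:2] - k[CUP][:2]) - 0.03


def still_grasped(k):
    return np.linalg.norm(k[GRIPPER] - k[HANDLE]) - 0.05


# One entry per task segment: sub-goal constraints are checked at the end of
# the segment, path constraints at every step within it.
stages = [
    {"name": "grasp", "subgoal": [near_handle], "path": []},
    {"name": "align", "subgoal": [spout_above_cup], "path": [still_grasped]},
]


def satisfied(constraints, k, tol=1e-6):
    """True when every constraint in the list evaluates to <= tol."""
    return all(f(k) <= tol for f in constraints)
```

An empty `path` list is trivially satisfied, which suits a first stage where nothing is held yet.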

3. Optimization and Perception-Action Integration

ReKep enables hierarchical, real-time optimization for control in complex manipulation settings:

  • Hierarchical Optimization: At the higher level, optimization solves for an end-effector pose that satisfies current stage (sub-goal) constraints. At the lower level, trajectory optimization is performed to comply with path (and any additional kinematic) constraints, yielding a feasible continuous motion from the current pose to the sub-goal.
  • Solvers: Standard off-the-shelf optimizers are used: a global search (e.g., Dual Annealing) is combined with local refinement (e.g., SLSQP) for rapid convergence.
  • Closed-loop Operation: Perception is updated at ~20 Hz, with real-time constraint evaluation and re-optimization at ~10 Hz, giving robust perception-action feedback in dynamic, unpredictable environments.
  • Example Workflow: Given updated RGB-D observations, keypoints are extracted, all active constraints (ReKep) are evaluated, optimization computes new control commands, and the system executes them, repeating the loop iteratively.
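The global-then-local pattern for the sub-goal level can be sketched with SciPy's `dual_annealing` and `minimize(method="SLSQP")`. This is a toy under stated assumptions, not the pipeline's actual solver setup: the decision variable is only a 3D end-effector position, and the workspace bounds, penalty weight, and keypoint values are made up for illustration.

```python
import numpy as np
from scipy.optimize import dual_annealing, minimize

handle = np.array([0.5, 0.0, 0.3])     # tracked keypoint (assumed value)
current = np.array([0.2, -0.1, 0.5])   # current end-effector position (assumed)
bounds = [(-1.0, 1.0)] * 3             # assumed workspace box, metres


def subgoal(x):
    """ReKep-style constraint: <= 0 when x is within 2 cm of the handle."""
    return np.linalg.norm(x - handle) - 0.02


def motion_cost(x):
    """Prefer sub-goal positions close to the current pose."""
    return float(np.sum((x - current) ** 2))


def penalized(x):
    # Soft form for the global search: violation enters as a quadratic penalty.
    return motion_cost(x) + 1e3 * max(0.0, subgoal(x)) ** 2


# Global stage: coarse solution from simulated annealing.
coarse = dual_annealing(penalized, bounds, maxiter=200, seed=0)

# Local stage: SLSQP reads "ineq" as fun(x) >= 0, so the ReKep value is negated.
refined = minimize(
    motion_cost,
    coarse.x,
    method="SLSQP",
    bounds=bounds,
    constraints=[{"type": "ineq", "fun": lambda x: -subgoal(x)}],
)
target = refined.x
```

In a closed loop, the same solve would be repeated each time new keypoints arrive, warm-started from the previous `target`.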

4. Automated Specification via Vision and Vision-Language Models

Manual authoring of ReKep for each new manipulation scenario is labor-intensive and error-prone. To overcome this, an automated pipeline leverages foundation models:

  • Semantic Keypoint Proposals: A large, self-supervised vision model (e.g., DINOv2) generates a dense set of semantically meaningful keypoints from the raw RGB-D input. Object segmentation (via, e.g., the Segment Anything Model) can isolate task-relevant objects.
  • Constraint Synthesis with VLMs: The system employs a vision-language model (VLM), e.g., GPT-4o. Given the visual context, overlaid and enumerated keypoints, and free-form natural language task instructions, the VLM produces Python code snippets implementing the desired ReKep functions for each task phase.

The resulting ReKep code is directly executable in the downstream optimization pipeline, aligning free-form task specifications with formally evaluable keypoint constraints.
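How generated snippets might be turned into callable constraints can be sketched as follows. The `generated_code` string stands in for a VLM response and is entirely hypothetical, as are the function-naming convention and the `load_constraints` helper. No model is called here.

```python
import numpy as np

# Hypothetical VLM output for one stage; a real response would come from the model.
generated_code = '''
def stage1_subgoal_constraint1(keypoints):
    """Gripper keypoint (0) should reach the handle keypoint (3)."""
    return np.linalg.norm(keypoints[0] - keypoints[3]) - 0.01

def stage1_path_constraint1(keypoints):
    """Keep the gripper at least 5 cm above the table plane (z = 0)."""
    return 0.05 - keypoints[0][2]
'''


def load_constraints(source):
    """Execute generated source and collect constraint functions by name."""
    namespace = {"np": np}
    exec(source, namespace)  # only appropriate for trusted or sandboxed output
    items = sorted(namespace.items())
    subgoal = [f for name, f in items if callable(f) and "_subgoal_" in name]
    path = [f for name, f in items if callable(f) and "_path_" in name]
    return subgoal, path


subgoal_fns, path_fns = load_constraints(generated_code)
```

Because generated code is executed directly, a deployed system would likely validate or sandbox it first; that step is outside this sketch.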

5. Multi-Platform and Multi-Task Robotic Applications

ReKep has been demonstrated on diverse robotic systems, including:

  • Single-arm mobile platforms (e.g., a wheeled base with a Franka arm)
  • Dual-arm stationary platforms

The system supports:

  • Multi-stage manipulation tasks: e.g., grasp, align, pour, place, with explicit sequencing of ReKep.
  • In-the-wild and unstructured tasks: e.g., stowing items on shelves, object recycling.
  • Bimanual behaviors: e.g., folding or packing tasks requiring coordination of two arms.
  • Reactive control: If unforeseen disturbances violate a critical constraint, the system automatically backtracks to the preceding stage and replans.

The pipeline is free of any task-specific environment models or priors and operates solely on visual input and user instructions.
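The backtracking behavior listed under reactive control can be sketched as a small stage-transition rule. The two stages, their constraints, and the `next_stage` helper are hypothetical; the source describes only the behavior (return to the preceding stage and replan), not this particular logic.

```python
import numpy as np

GRIPPER, HANDLE, CUP = 0, 1, 2  # hypothetical keypoint indices


def reached_handle(k):
    return np.linalg.norm(k[GRIPPER] - k[HANDLE]) - 0.02


def holding(k):
    return np.linalg.norm(k[GRIPPER] - k[HANDLE]) - 0.05


def handle_over_cup(k):
    return np.linalg.norm(k[HANDLE][:2] - k[CUP][:2]) - 0.03


stages = [
    {"subgoal": [reached_handle], "path": []},
    {"subgoal": [handle_over_cup], "path": [holding]},
]


def next_stage(stage_idx, k, tol=1e-6):
    """Return the stage index to run on the next control cycle."""
    stage = stages[stage_idx]
    if any(f(k) > tol for f in stage["path"]):
        return max(stage_idx - 1, 0)   # path constraint broken: backtrack
    if all(f(k) <= tol for f in stage["subgoal"]):
        return stage_idx + 1           # sub-goal met; len(stages) means done
    return stage_idx                   # keep working on the current stage
```

For example, if the object slips out of the gripper during the second stage, `holding` becomes positive and the rule returns to the grasp stage.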

6. Advantages, Limitations, and Systemic Implications

Advantages:

  • Versatility: Single and multi-arm, multi-stage, and reactive behaviors across a variety of tasks.
  • Automation: No requirement for manual constraint coding due to integration of vision and language foundation models.
  • Optimizable Formalism: Constraints are compatible with generic numerical solvers and lend themselves to rapid, closed-loop optimization.
  • Expressive Semantics: The arithmetic over keypoints enables straightforward specification of distances, orientations, and even rotational relationships without recourse to more complex geometric representations.

Limitations:

  • Forward Model Simplifications: The optimization assumes a forward model in which grasped objects are rigidly attached to the manipulator, which holds only locally for short horizons.
  • Keypoint Tracking Dependence: Accurate and robust 3D tracking is critical; occlusion and noise may cause constraint violations or control errors.
  • Fixed Sequence Skeleton: The approach assumes a predefined stage sequence; fully dynamic, context-dependent restructuring of stages (i.e., adaptive sequence graphs) is not yet tractable.
  • Computational Overhead: While hierarchical optimization amortizes much of the complexity, integrating large-scale foundation models (especially VLMs for constraint synthesis) is resource-intensive.
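The rigid-attachment assumption in the first limitation can be made concrete with a short sketch: keypoints on a grasped object are moved by the same rigid transform as the end-effector. The function name and the 4x4 homogeneous-pose convention are assumptions made for illustration.

```python
import numpy as np


def propagate_grasped_keypoints(keypoints, T_old, T_new):
    """Move world-frame keypoints rigidly with the end-effector.

    keypoints: (N, 3) array; T_old, T_new: 4x4 homogeneous end-effector poses.
    """
    delta = T_new @ np.linalg.inv(T_old)  # relative motion in the world frame
    homog = np.hstack([keypoints, np.ones((len(keypoints), 1))])
    return (delta @ homog.T).T[:, :3]


# Example: rotate the end-effector 90 degrees about z and shift 10 cm along x.
T_old = np.eye(4)
T_new = np.array([
    [0.0, -1.0, 0.0, 0.1],
    [1.0,  0.0, 0.0, 0.0],
    [0.0,  0.0, 1.0, 0.0],
    [0.0,  0.0, 0.0, 1.0],
])
object_keypoints = np.array([[1.0, 0.0, 0.0], [1.0, 0.2, 0.0]])
predicted = propagate_grasped_keypoints(object_keypoints, T_old, T_new)
```

The prediction preserves all distances between grasped keypoints, which is exactly why it breaks down when the object slips or deforms in the hand.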

7. Context within Broader Relational Constraint Theory

The ReKep design is situated within a lineage of formal treatments of relational constraints. The fundamental idea—viewing system outputs as candidate (possibly ambiguous) keypoints and representing relations among them as formal constraints—parallels the Galois connection theory for generalized functions and relational constraints (Couceiro, 2015). The crucial distinction in practice is the move from highly abstract, algebraic closure and substitution properties to direct, visually-grounded, and automatable constraint specification over observed landmarks. This enables concrete system instantiations, such as the optimization-driven manipulation pipelines described above, while remaining compatible with the broader mathematical framework of multivalued function-to-constraint satisfiability.


Relational Keypoint Constraints anchor task representations in visually observable structure, providing a highly expressive, optimizable, and automatable formalism for visuo-motor reasoning across a diverse array of manipulation tasks in robotics and related domains.
