Terrain-Aware Foothold Placement Reward

Updated 17 June 2026

This article surveys reward formulations that integrate elevation maps and geometric features to guide safe and precise foothold placement.
These formulations use metrics such as edge proximity, height variance, and candidate matching to penalize unstable contacts and encourage robust locomotion.
Integration strategies include double-critic methods and curriculum learning, which improve policy stability and real-world transfer.

A terrain-aware foothold placement reward is a reinforcement learning (RL) reward formulation that explicitly uses geometric characteristics of the environment, often obtained from elevation maps, heightmaps, or point clouds, to encourage legged robots to select safe, stable, and robust contact points for their feet during locomotion. Such rewards are crucial for traversing challenging terrain such as stepping stones, beams, and gaps, where precise foot placement under significant terrain uncertainty is paramount. Recent research has developed several instantiations that combine geometric perception with dense proprioceptive feedback and tailored RL training schemes, yielding policies with high contact precision, stability, and transfer performance.

1. Mathematical Formulations and Geometric Principles

Terrain-aware foothold placement rewards typically penalize or shape agent behavior with respect to terrain topology at or near potential footfall locations, using explicit metrics computed from exteroceptive data. Prominent variants include:

Edge-proximity penalties: Penalize footholds near the margin of a safe region, calculated as a function of the distance between the foot and the nearest terrain edge in a local elevation or heightmap ("feet-edge penalty" (Yu et al., 15 Dec 2025)).
Sample-based terrain contact penalties: Sample points inside the foot’s contact polygon, map them onto the terrain via a 2D/2.5D elevation map, and count or aggregate the number of "unsafe" samples (i.e., falling into gaps or edges); see BeamDojo’s continuous sampling-based reward (Wang et al., 14 Feb 2025).
Variance or roughness heuristics: Penalize contacts where local terrain under the foot is excessively non-planar or has high height variance, encouraging the robot to avoid placing feet on unstable or isolated regions ("foot terrain reward" (Shi et al., 2023)).
Point-cloud candidate matching: Extract local planar “foothold candidates” via covariance analysis of near-foot point clouds and reward each touchdown as a decreasing function of its minimum distance to these candidates, providing a graded shaping signal even on highly discontinuous terrain (Hao et al., 31 Mar 2026).
Stencil or support consistency: Evaluate elevation at multiple stencil points around the nominal contact to assess proximity to edges or gaps and penalize deviation from centric, stable placement (MARG (Dong et al., 24 Sep 2025)).
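To make the variance heuristic listed above concrete, the sketch below scores each foot in contact by the population variance of heightmap cells in a square window around it. It is not the formula from Shi et al. (2023): the window radius, the weight, the nearest-cell map layout (rows along y, columns along x, origin at (0, 0)), and the helper names `foot_height_variance_penalty` and `roughness_reward` are assumptions for illustration.

```python
def foot_height_variance_penalty(heightmap, res, foot_xy, radius):
    """Population variance of terrain heights in a square window around one foot.

    heightmap: 2D list indexed [row][col] with row ~ y and col ~ x (assumed layout).
    res: cell size in metres; foot_xy: (x, y) in metres; radius: window half-width.
    """
    rows, cols = len(heightmap), len(heightmap[0])
    cx, cy = int(round(foot_xy[0] / res)), int(round(foot_xy[1] / res))
    r = max(1, int(round(radius / res)))
    samples = [
        heightmap[j][i]
        for j in range(max(0, cy - r), min(rows, cy + r + 1))
        for i in range(max(0, cx - r), min(cols, cx + r + 1))
    ]
    if not samples:  # foot entirely outside the map: no terrain evidence
        return 0.0
    mean = sum(samples) / len(samples)
    return sum((h - mean) ** 2 for h in samples) / len(samples)


def roughness_reward(heightmap, res, feet_xy, in_contact, radius=0.05, weight=-1.0):
    """Weighted sum of per-foot height variance, counted only for feet in contact."""
    return weight * sum(
        foot_height_variance_penalty(heightmap, res, xy, radius)
        for xy, contact in zip(feet_xy, in_contact)
        if contact
    )
```

A foot planted on a flat patch contributes nothing, while a foot straddling a step sees a non-zero variance and is penalized.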

Fundamentally, these approaches encode geometric priors on what constitutes a "safe" foothold by structuring the reward to reflect precise, terrain-dependent criteria rather than relying solely on contact force, proprioception, or downstream task success.

2. Reward Term Implementations and Integration

Implementation details vary depending on the robot’s morphology, sensor suite, and the terrain representation. Representative instantiations include:

START ("feet-edge penalty"): At each step, for every foot in contact, a Boolean mask derived from the local ground-truth heightmap encodes whether the foot lies within a fixed radius (2.5 cm or 5.0 cm) of a terrain edge. The penalty sums these indicators over radii and feet with tuned weights, directly penalizing edge-adjacent contacts (Yu et al., 15 Dec 2025).
BeamDojo (sampling-based foothold reward):
- Uniformly sample $n$ points in the foot’s contact polygon.
- For each, compute world coordinates and query the current elevation map.
- Penalize sample points whose underlying terrain height lies more than ε below the foot's contact height (i.e., the foot overhangs a gap or edge).
- Aggregate the penalty per foot in contact and incorporate it (with a dedicated critic) into the RL update (Wang et al., 14 Feb 2025).
CReF (foot-contact candidate reward):
- During swing, accumulate a local point cloud of possible contact zones.
- Use covariance analysis to select planar, approximately horizontal, and non-recessed regions as candidates.
- At touchdown, the reward is an exponential kernel of the minimum distance from the actual contact to any candidate. The signal is sparse but smooth, encouraging anticipatory foot placement toward supportable regions (Hao et al., 31 Mar 2026).
MARG (stencil edge/center reward):
- Around each contact, sample a stencil of points and compare their elevations with the center contact.
- Penalize if any neighboring value is substantially lower (detecting steps too close to an edge).
- This term is combined with air-time and stumble rewards to provide comprehensive terrain-aware contact shaping (Dong et al., 24 Sep 2025).
Hiking in the Wild (volumetric edge penalty):
- Given a precomputed terrain edge grid, penalize the signed penetration distance, and the velocity, of preassigned “foot volume points” entering any sharp-edge zone, discouraging high-speed collisions and scraping (Zhu et al., 12 Jan 2026).
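The BeamDojo steps listed above can be sketched as follows, assuming a rectangular sole sampled on a uniform grid, a nearest-cell heightmap lookup in which off-map cells count as gaps, and equal weights for all feet. The foot dimensions, ε = 0.1 m, the tuple layout for `feet`, and every function name are hypothetical, and the dedicated-critic integration is omitted. This is an approximation of the idea, not BeamDojo's code.

```python
import math


def sample_foot_rect(length, width, nx, ny):
    """Uniform grid of sample points covering a rectangular sole, in the foot frame."""
    return [
        ((i + 0.5) / nx * length - length / 2, (j + 0.5) / ny * width - width / 2)
        for i in range(nx)
        for j in range(ny)
    ]


def query_height(heightmap, res, x, y, gap_height=-1.0):
    """Nearest-cell lookup in a heightmap indexed [row=y][col=x]; off-map counts as a gap."""
    i, j = int(round(x / res)), int(round(y / res))
    if 0 <= j < len(heightmap) and 0 <= i < len(heightmap[0]):
        return heightmap[j][i]
    return gap_height


def foothold_penalty(heightmap, res, feet, eps=0.1,
                     foot_len=0.22, foot_w=0.10, nx=5, ny=3):
    """Negative count of sole samples overhanging a gap, over all feet in contact.

    feet: list of (x, y, yaw, sole_z, in_contact) tuples (hypothetical layout).
    A sample is unsafe if the terrain below it is more than eps lower than the sole.
    """
    local = sample_foot_rect(foot_len, foot_w, nx, ny)
    unsafe = 0
    for x, y, yaw, sole_z, in_contact in feet:
        if not in_contact:
            continue  # the term is evaluated only for feet in contact
        c, s = math.cos(yaw), math.sin(yaw)
        for px, py in local:
            # rotate the foot-frame sample by yaw, then translate to the foot position
            wx, wy = x + c * px - s * py, y + s * px + c * py
            if query_height(heightmap, res, wx, wy) < sole_z - eps:
                unsafe += 1
    return -float(unsafe)
```

Because the penalty counts samples rather than testing a single contact point, a foot half on a beam edge receives a partial penalty proportional to how much of the sole overhangs.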

Reward weights are set empirically (e.g., $w_{\mathrm{feetedge}}=-1.0$ in START and $w_{\mathrm{fh}}=2.0$ in CReF; see the respective papers) and adjusted experimentally so that the terrain terms interact stably with the other reward components (locomotion regularizers, body pose, smoothness, etc.).
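To show where a term weight such as START's feet-edge weight enters, the sketch below implements a START-style edge-proximity penalty: edge cells are found by a height-jump test between 4-neighbours, each foot in contact is checked against the 2.5 cm and 5.0 cm radii, and the weighted flags are scaled by `w_feetedge`. The 5 cm jump threshold, the per-radius weights (1.0, 0.5), the cell-centre convention, and the function names are assumptions, not values taken from START.

```python
import math


def edge_cells(heightmap, jump=0.05):
    """Cells whose height differs from a 4-neighbour by more than `jump` (assumed edge test)."""
    rows, cols = len(heightmap), len(heightmap[0])
    edges = set()
    for j in range(rows):
        for i in range(cols):
            for dj, di in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nj, ni = j + dj, i + di
                if (0 <= nj < rows and 0 <= ni < cols
                        and abs(heightmap[j][i] - heightmap[nj][ni]) > jump):
                    edges.add((j, i))
                    break
    return edges


def near_edge(edges, res, foot_xy, radius):
    """True if any edge-cell centre (col * res, row * res) lies within `radius` of the foot."""
    return any(
        math.hypot(i * res - foot_xy[0], j * res - foot_xy[1]) <= radius
        for j, i in edges
    )


def feet_edge_reward(heightmap, res, feet_xy, in_contact,
                     radii=(0.025, 0.05), radius_weights=(1.0, 0.5), w_feetedge=-1.0):
    """w_feetedge times the weighted count of edge-proximity flags over feet and radii."""
    edges = edge_cells(heightmap)
    total = 0.0
    for xy, contact in zip(feet_xy, in_contact):
        if not contact:
            continue
        for r, k in zip(radii, radius_weights):
            total += k * near_edge(edges, res, xy, r)
    return w_feetedge * total
```

With a negative `w_feetedge`, a foot inside the inner radius is penalized by both rings, so the penalty grows as contacts approach the edge.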

3. RL Integration and Optimization Strategies

Integration of terrain-aware foothold rewards with broader RL objectives requires careful balance to prevent destabilizing learning or policy collapse:

Reward Grouping and Decoupling: BeamDojo's double-critic approach explicitly splits value estimation between dense locomotion rewards and sparse terrain-aware rewards, normalizes the two advantages separately, and combines them to compute policy gradients; the authors report that this outperforms monolithic (single-critic) advantage estimation in their sparse-reward setting (Wang et al., 14 Feb 2025).
Reward Aggregation: Total step reward is typically a weighted sum of velocity tracking, orientation, energy, action regularization, and terrain-aware foothold terms (see formulae in (Yu et al., 15 Dec 2025, Hao et al., 31 Mar 2026, Shi et al., 2023)).
Curriculum and Data Augmentation: Terrain-aware rewards are frequently paired with curriculum learning (e.g., START’s terrain-progressive curriculum (Yu et al., 15 Dec 2025)) and observation or map augmentation/noise to bridge sim-to-real gaps (BeamDojo (Wang et al., 14 Feb 2025)).
Evaluation at Behavioral Events: Some rewards are only computed on discrete events (e.g., foot touchdown in CReF (Hao et al., 31 Mar 2026)), while others provide dense feedback at every timestep.
Map Quality and Consistency: Effective RL integration requires that elevation maps, edge grids, or point clouds remain consistent and minimally noisy. Techniques like drift filtering, outlier removal, and map fusion are essential to maintain reward signal integrity (Dong et al., 24 Sep 2025, Zhu et al., 12 Jan 2026).
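The reward-grouping idea in the first item above can be sketched as follows: each reward group gets its own critic and its own generalized advantage estimate, each advantage is normalized over the batch on its own, and the results are combined as a weighted sum. The GAE hyperparameters, the group weights `w_dense` and `w_sparse`, and the function names are hypothetical; only the split-normalize-combine structure follows the description of BeamDojo.

```python
import statistics


def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one reward group and its own critic.

    values holds len(rewards) + 1 critic outputs (the last is the bootstrap value);
    dones[t] is True if the episode ended after step t.
    """
    adv, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        adv[t] = running
    return adv


def normalize(xs, eps=1e-8):
    """Zero-mean, unit-variance normalization over the batch."""
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / (sd + eps) for x in xs]


def combined_advantage(r_dense, v_dense, r_sparse, v_sparse, dones,
                       w_dense=1.0, w_sparse=0.25):
    """Per-group GAE from separate critics, normalized separately, then weighted and summed."""
    a_dense = normalize(gae(r_dense, v_dense, dones))
    a_sparse = normalize(gae(r_sparse, v_sparse, dones))
    return [w_dense * a + w_sparse * b for a, b in zip(a_dense, a_sparse)]
```

Normalizing each group separately keeps rare foothold penalties from being numerically swamped by the larger, denser locomotion terms before the two are mixed.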

4. Empirical Impact and Ablative Analyses

The introduction of terrain-aware foothold placement rewards materially improves contact precision and success rates:

Study/Paper	Key Empirical Result/Improvement	Ref.
CReF	Stair-ascent MAD: 3.0 cm → 1.5 cm; stair-descent success ≈71% → 98%	(Hao et al., 31 Mar 2026)
BeamDojo	Success rate on hard beams: 92% (double critic) vs. 0.3% (foothold only); foot error <10%	(Wang et al., 14 Feb 2025)
Hiking in the Wild	≈6 percentage-point improvement on discrete terrain from the volumetric edge penalty	(Zhu et al., 12 Jan 2026)
START	Robust, agile foot placement with zero-shot real-world transfer in sparse-foothold environments	(Yu et al., 15 Dec 2025)
MARG	Stability on risky gaps attributed to the combined air-time, stumble, and center rewards	(Dong et al., 24 Sep 2025)
Shi et al. (2023)	Only the policy with the terrain-aware reward traversed 4.95 of 5 m on the stepping-stone test (baselines ≤2 m)	(Shi et al., 2023)

The reported ablations show that removing explicit terrain-aware terms leads to higher foot-placement variance, more frequent missteps, and a marked drop in traversal success, especially on discontinuous or highly structured terrain.

5. Architectural Designs and Perception Modalities

Successful terrain-aware foothold placement rewards closely integrate robot perception (e.g., LiDAR, depth, IMU), geometric map processing, and joint-level proprioception:

Heightmap and Elevation Map Usage: Gridded height data is a predominant substrate for reward calculation—either via direct sample queries (Wang et al., 14 Feb 2025), edge Boolean masks (Yu et al., 15 Dec 2025), or localized support region analysis (Hwang et al., 3 Apr 2026).
Point Cloud Processing: CReF and related frameworks compute point-cloud statistics directly around the swinging foot, allowing fine-grained candidate extraction that is independent of map grid resolution (Hao et al., 31 Mar 2026).
Volumetric Foot Models: Edge-sensitive penalties are often computed based on geometric sampling inside the foot or “collision manifold,” not merely at a single contact point (Wang et al., 14 Feb 2025, Zhu et al., 12 Jan 2026).
Curriculum and Adaptive Scheduling: Terrain difficulty is frequently ramped up progressively (e.g., stepping stones → beams → gaps), and AdaSmpl switches between ground-truth and reconstructed maps depending on episode reward variability (Yu et al., 15 Dec 2025).
Perception Signal Robustness: Terrain estimates that feed these rewards are explicitly filtered (median filtering, statistical outlier removal, high-frequency map updates) so that inaccurate or stale data does not destabilize the policy (Dong et al., 24 Sep 2025, Zhu et al., 12 Jan 2026).
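The point-cloud route described above can be sketched roughly as below: near-foot points are bucketed into grid cells, each cell is tested for planarity and horizontality, and a touchdown is rewarded with an exponential kernel of its distance to the nearest accepted cell centroid. Note the substitution: instead of the covariance eigen-analysis described for CReF, this sketch uses a least-squares plane fit per cell, and it omits the non-recessed test. The cell size, tolerances, tilt limit, kernel width `sigma`, and all function names are assumptions; only `w_fh = 2.0` is the weight quoted earlier for CReF.

```python
import math
from collections import defaultdict


def fit_plane(pts):
    """Least-squares plane z = mz + a*(x - mx) + b*(y - my); returns centroid, slopes, RMS."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    mz = sum(p[2] for p in pts) / n
    dx = [p[0] - mx for p in pts]
    dy = [p[1] - my for p in pts]
    dz = [p[2] - mz for p in pts]
    sxx = sum(u * u for u in dx)
    syy = sum(v * v for v in dy)
    sxy = sum(u * v for u, v in zip(dx, dy))
    sxz = sum(u * w for u, w in zip(dx, dz))
    syz = sum(v * w for v, w in zip(dy, dz))
    det = sxx * syy - sxy * sxy
    if det < 1e-12:  # points (nearly) collinear: no well-defined plane
        return None
    a = (sxz * syy - sxy * syz) / det
    b = (sxx * syz - sxy * sxz) / det
    rms = math.sqrt(sum((w - a * u - b * v) ** 2 for u, v, w in zip(dx, dy, dz)) / n)
    return (mx, my, mz), a, b, rms


def extract_candidates(points, cell=0.05, min_pts=4, rms_tol=0.005, max_tilt_deg=20.0):
    """Centroids of grid cells whose points form a flat, near-horizontal patch."""
    buckets = defaultdict(list)
    for p in points:
        buckets[(math.floor(p[0] / cell), math.floor(p[1] / cell))].append(p)
    min_nz = math.cos(math.radians(max_tilt_deg))
    candidates = []
    for pts in buckets.values():
        if len(pts) < min_pts:
            continue
        fit = fit_plane(pts)
        if fit is None:
            continue
        centroid, a, b, rms = fit
        nz = 1.0 / math.sqrt(a * a + b * b + 1.0)  # z-component of the unit plane normal
        if rms <= rms_tol and nz >= min_nz:
            candidates.append(centroid)
    return candidates


def touchdown_reward(contact_xyz, candidates, w_fh=2.0, sigma=0.05):
    """Exponential kernel of the distance from the touchdown point to the nearest candidate."""
    if not candidates:
        return 0.0
    d_min = min(math.dist(contact_xyz, c) for c in candidates)
    return w_fh * math.exp(-(d_min ** 2) / sigma ** 2)
```

For near-horizontal patches the vertical-residual fit and the smallest-eigenvector normal agree closely; a covariance-based version would behave better on steep surfaces, which this sketch simply rejects by the tilt test.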

6. Practical and Theoretical Considerations

Terrain-aware foothold placement rewards present certain challenges and nuances:

Sparse vs. Dense Feedback: While traditional force thresholds or CoP-based rewards provide dense feedback, terrain-aware penalties often occur only at critical moments (foot touchdown, contact crossing), requiring tailored optimization (e.g., critic splitting) to maintain exploration and prevent reward starvation (Wang et al., 14 Feb 2025).
Sim-to-Real Transfer: Explicit handling of perceptual noise and map drift (domain randomization, privileged information in sim) is required for robust real-world deployment (Yu et al., 15 Dec 2025, Dong et al., 24 Sep 2025, Wang et al., 14 Feb 2025).
Footprint Geometry: The specific method for mapping foot geometry to terrain (e.g., polygonal sampling, volumetric points, dense Gaussian “footmaps”) is tightly coupled to both physical morphology and terrain representation fidelity (Hwang et al., 3 Apr 2026, Wang et al., 14 Feb 2025).
Combinatorial Reward Balancing: Fine-tuning weightings relative to velocity, energy, and stability terms is essential; over-weighting terrain-aware penalties can lead to conservative or overly rigid gaits, while under-weighting diminishes placement precision (Yu et al., 15 Dec 2025, Hao et al., 31 Mar 2026, Shi et al., 2023).
Generalization and OOD Performance: Explicit terrain-aware rewards, especially those built on geometric features rather than implicit visual cues, have been reported to improve out-of-distribution performance on unseen, cluttered, or discontinuous terrain (Hwang et al., 3 Apr 2026, Wang et al., 14 Feb 2025).
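The sparsity noted in the first item above comes from evaluating the terrain term only on touchdown events. A minimal sketch, assuming per-step Boolean contact flags per foot and a hypothetical `foot_score(step, foot_index)` callable standing in for any terrain-aware touchdown score, shows how most steps end up with zero terrain reward.

```python
def touchdown_events(prev_contact, contact):
    """Indices of feet whose contact flag switched from False to True."""
    return [i for i, (p, c) in enumerate(zip(prev_contact, contact)) if c and not p]


def event_triggered_rewards(contact_seq, foot_score):
    """One reward per step, non-zero only on steps with at least one touchdown.

    contact_seq: per-step tuples of Boolean contact flags, one per foot.
    foot_score: callable (step, foot_index) -> float (hypothetical interface).
    """
    rewards = []
    prev = contact_seq[0]  # treat the initial stance as established: no event at reset
    for t, contact in enumerate(contact_seq):
        events = touchdown_events(prev, contact)
        rewards.append(sum((foot_score(t, i) for i in events), 0.0))
        prev = contact
    return rewards
```

For example, the contact sequence `[(True, True), (True, False), (True, False), (True, True), (False, True), (True, True)]` with a constant score of 1.0 yields non-zero rewards only at steps 3 and 5, when a swing foot lands.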

7. Comparative Summary: Approaches and Impact

The evolution of terrain-aware foothold placement rewards has progressed toward greater geometric specificity, tighter perception–policy coupling, and greater robustness across domains. The main methodologies in recent literature are summarized below:

Reward Type	Terrain Input	Foothold Evaluation	Primary Metric	Example Work
Edge-proximity penalty	Heightmap	Boolean mask, distance-to-edge	Step zone centrality	(Yu et al., 15 Dec 2025)
Sample-based penalty	Elevation map	Foot polygon samples, penetration	“Bad” sample count	(Wang et al., 14 Feb 2025)
Covariance planarity	Point cloud	Planarity, normal/horizontal test	Distance to candidate	(Hao et al., 31 Mar 2026)
Stencil-based center test	Elevation map	Near-vs-center elevation drop	Edge adjacency	(Dong et al., 24 Sep 2025)
Volumetric edge risk	Mesh/sharpness grid	Distance, velocity penalties	Edge collision risk	(Zhu et al., 12 Jan 2026)

These approaches have demonstrated clear empirical improvement in foot placement accuracy, stability, and traversal success across both simulated and real-world environments, especially on sparse or risk-critical terrains.
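Of the approaches in the table, the stencil-based center test is among the simplest to state in code. The sketch below flags a contact when any point on a ring around the foot lies substantially below the center elevation. The callable elevation lookup `height_at(x, y)`, the eight-point ring, the 4 cm radius, the 5 cm drop threshold, the weight, and the function names are all assumptions for illustration, not MARG's parameters.

```python
import math


def stencil_offsets(radius, n=8):
    """n points evenly spaced on a circle of the given radius around the contact."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def stencil_edge_penalty(height_at, foot_xy, radius=0.04, drop=0.05, n=8):
    """1.0 if any stencil point lies more than `drop` below the centre elevation, else 0.0."""
    cx, cy = foot_xy
    h0 = height_at(cx, cy)
    for dx, dy in stencil_offsets(radius, n):
        if h0 - height_at(cx + dx, cy + dy) > drop:
            return 1.0
    return 0.0


def stencil_reward(height_at, feet_xy, in_contact, weight=-1.0, **kwargs):
    """Weighted count of edge-adjacent contacts over the feet currently in contact."""
    return weight * sum(
        stencil_edge_penalty(height_at, xy, **kwargs)
        for xy, contact in zip(feet_xy, in_contact)
        if contact
    )
```

Passing `height_at` as a callable keeps the test independent of how the elevation map is stored; it could wrap a grid lookup like the nearest-cell queries used for heightmaps.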

References:

(Yu et al., 15 Dec 2025) "START: Traversing Sparse Footholds with Terrain Reconstruction"
(Wang et al., 14 Feb 2025) "BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds"
(Hao et al., 31 Mar 2026) "CReF: Cross-modal and Recurrent Fusion for Depth-conditioned Humanoid Locomotion"
(Dong et al., 24 Sep 2025) "MARG: MAstering Risky Gap Terrains for Legged Robots with Elevation Mapping"
(Zhu et al., 12 Jan 2026) "Hiking in the Wild: A Scalable Perceptive Parkour Framework for Humanoids"
(Shi et al., 2023) "Terrain-Aware Quadrupedal Locomotion via Reinforcement Learning"
(Hwang et al., 3 Apr 2026) "Learning Locomotion on Complex Terrain for Quadrupedal Robots with Foot Position Maps and Stability Rewards"