DIPP: Discriminative Impact Point Predictor for Catching Diverse In-Flight Objects

Published 18 Sep 2025 in cs.RO | (2509.15254v1)

Abstract: In this study, we address the problem of in-flight object catching using a quadruped robot with a basket. Our objective is to accurately predict the impact point, defined as the object's landing position. This task poses two key challenges: the absence of public datasets capturing diverse objects under unsteady aerodynamics, which are essential for training reliable predictors; and the difficulty of accurate early-stage impact point prediction when trajectories appear similar across objects. To overcome these issues, we construct a real-world dataset of 8,000 trajectories from 20 objects, providing a foundation for advancing in-flight object catching under complex aerodynamics. We then propose the Discriminative Impact Point Predictor (DIPP), consisting of two modules: (i) a Discriminative Feature Embedding (DFE) that separates trajectories by dynamics to enable early-stage discrimination and generalization, and (ii) an Impact Point Predictor (IPP) that estimates the impact point from these features. Two IPP variants are implemented: a Neural Acceleration Estimator (NAE)-based method that predicts trajectories and derives the impact point, and a Direct Point Estimator (DPE)-based method that directly outputs it. Experimental results show that our dataset is more diverse and complex than existing datasets, and that our method outperforms baselines on both 15 seen and 5 unseen objects. Furthermore, we show that improved early-stage prediction enhances catching success in simulation and demonstrate the effectiveness of our approach through real-world experiments. The demonstration is available at https://sites.google.com/view/robot-catching-2025.

Authors (3)

Summary

The paper presents a novel DIPP framework that improves early-stage impact point prediction using LSTM-based discriminative feature embedding.
It constructs a real-world dataset of 8,000 trajectories from 20 diverse objects and evaluates generalization on 5 previously unseen objects.
The approach, with both NAE and DPE variants, achieves higher catching success rates in simulation and real-world robotic experiments.

Discriminative Impact Point Prediction for Robotic Catching of Diverse In-Flight Objects

Introduction

The paper introduces DIPP (Discriminative Impact Point Predictor), a framework for predicting the landing position of diverse in-flight objects under complex aerodynamics, specifically for robotic catching using a quadruped robot equipped with a basket. The work addresses two major challenges: (1) the lack of public datasets capturing the aerodynamics of a wide range of objects, and (2) the difficulty of early-stage impact point prediction when object trajectories are initially similar. To overcome these, the authors construct a real-world dataset of 8,000 trajectories from 20 objects and propose a two-module architecture: Discriminative Feature Embedding (DFE) and Impact Point Predictor (IPP), with two IPP variants (NAE-based and DPE-based). The framework is evaluated on both seen and unseen objects, demonstrating improved prediction accuracy and catching success rates.

Figure 1: Catching diverse in-flight objects with complex aerodynamics using a quadruped robot.

Dataset Construction and Analysis

A key contribution is the creation of a comprehensive dataset comprising 8,000 trajectories from 20 hand-thrown objects, each recorded at 120 Hz using a motion capture system. The dataset is augmented via translation and rotation, and the objects span a range of shapes, sizes, and aerodynamic properties. The authors introduce the Parabola Deviation Score (PDS) to quantify the deviation of real-world trajectories from ideal parabolic motion, revealing that their dataset contains significantly more complex and non-parabolic trajectories than prior datasets (e.g., NAE [NAE-iros2021]).
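This summary does not state the formula for PDS. As a rough illustration of the idea only, the sketch below assumes one plausible form: fit a quadratic in time to each coordinate by least squares and report the RMS distance between the recorded points and the fit. The function names `fit_quadratic` and `parabola_deviation` are hypothetical, and the paper's actual definition may differ.

```python
import math


def fit_quadratic(ts, xs):
    """Least-squares fit x(t) ~ c0 + c1*t + c2*t^2; returns (c0, c1, c2)."""
    # Normal equations A c = b with A[i][j] = sum t^(i+j), b[i] = sum x * t^i.
    s = [sum(t ** k for t in ts) for k in range(5)]
    b = [sum(x * t ** i for t, x in zip(ts, xs)) for i in range(3)]
    m = [[s[i + j] for j in range(3)] + [b[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - tail) / m[i][i]
    return tuple(coef)


def parabola_deviation(ts, points):
    """RMS distance between 3D points and per-axis quadratic fits.

    This is an assumed stand-in for PDS, not the paper's definition.
    """
    fits = [fit_quadratic(ts, [p[k] for p in points]) for k in range(3)]
    sq = 0.0
    for t, p in zip(ts, points):
        for k in range(3):
            c0, c1, c2 = fits[k]
            sq += (p[k] - (c0 + c1 * t + c2 * t * t)) ** 2
    return math.sqrt(sq / len(ts))
```

Under this assumed definition, a drag-free throw scores close to zero, while a trajectory with lateral wobble (as a spinning or gliding object might show) scores higher.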

Figure 2: 20 objects used for experiment.

Figure 3: Dataset analysis for our dataset and the NAE dataset.

DIPP Framework: Architecture and Training

The DIPP framework consists of two main modules:

Discriminative Feature Embedding (DFE): Utilizes an LSTM encoder to map historical states (position, velocity, acceleration) into a feature space where trajectories with similar dynamics are clustered. This enables early-stage discrimination and generalization to unseen objects.
Impact Point Predictor (IPP): Two variants are implemented:
- NAE-based: Autoregressively predicts the future trajectory using an LSTM and derives the impact point as the intersection with the catching plane.
- DPE-based: Directly estimates the impact point from the encoded features, offering computational efficiency but limited to fixed-height catching.
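The NAE-style step of rolling a trajectory forward and intersecting it with the catching plane can be sketched as follows. This is a minimal illustration, not the authors' implementation: `accel_fn` is a placeholder for the learned acceleration estimator, the two example acceleration functions (`gravity_only`, `gravity_with_drag`) and the drag coefficient are assumptions, and the catching plane is assumed to be horizontal at height `catch_z`.

```python
import math


def gravity_only(pos, vel):
    """Placeholder dynamics: constant gravity, no aerodynamics."""
    return (0.0, 0.0, -9.81)


def gravity_with_drag(pos, vel, k=0.3):
    """Placeholder dynamics: gravity plus quadratic drag (k is an assumed value)."""
    speed = math.sqrt(sum(v * v for v in vel))
    return (-k * speed * vel[0], -k * speed * vel[1], -9.81 - k * speed * vel[2])


def rollout_impact_point(pos, vel, accel_fn, catch_z, dt=1.0 / 120.0, max_steps=2000):
    """Integrate forward until the trajectory descends through z = catch_z.

    Returns (x, y, time_to_impact), or None if the plane is never crossed.
    """
    x, y, z = pos
    vx, vy, vz = vel
    for step in range(max_steps):
        ax, ay, az = accel_fn((x, y, z), (vx, vy, vz))
        # Semi-implicit Euler: update velocity first, then position.
        nvx, nvy, nvz = vx + ax * dt, vy + ay * dt, vz + az * dt
        nx, ny, nz = x + nvx * dt, y + nvy * dt, z + nvz * dt
        if z > catch_z >= nz:
            # Linearly interpolate between the two samples straddling the plane.
            frac = (z - catch_z) / (z - nz)
            return (x + frac * (nx - x), y + frac * (ny - y), (step + frac) * dt)
        x, y, z, vx, vy, vz = nx, ny, nz, nvx, nvy, nvz
    return None
```

Because the rollout produces a full trajectory, the same routine works for any `catch_z`. A direct estimator, in contrast, is trained for one fixed catching height, which matches the trade-off described for the DPE variant.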

The training objective incorporates a novel Impact Point Enhanced (IPE) loss, which explicitly penalizes errors at the predicted impact point, in addition to standard teacher-forcing and reconstruction losses. This direct supervision at the impact point is shown to improve prediction accuracy.
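One way to picture how the three terms combine is the sketch below. The exact form of each term and the weights are not given in this summary, so the squared-error form, the weight names (`w_tf`, `w_rec`, `w_ipe`), and their default values are all assumptions for illustration.

```python
def mse(pred, target):
    """Mean squared error over two equally sized sequences of 3D points."""
    total = sum((p - q) ** 2 for a, b in zip(pred, target) for p, q in zip(a, b))
    return total / (len(pred) * 3)


def total_loss(tf_pred, rollout_pred, target, impact_pred, impact_true,
               w_tf=1.0, w_rec=1.0, w_ipe=1.0):
    """Weighted sum of teacher-forcing, reconstruction, and impact-point terms.

    tf_pred:      one-step predictions made from ground-truth inputs
    rollout_pred: predictions made by feeding the model its own outputs
    impact_pred / impact_true: (x, y) landing positions on the catching plane
    All weights are hypothetical; the paper's values are not stated here.
    """
    ipe = sum((p - q) ** 2 for p, q in zip(impact_pred, impact_true))
    return (w_tf * mse(tf_pred, target)
            + w_rec * mse(rollout_pred, target)
            + w_ipe * ipe)
```

The point of the extra term is that a trajectory can have low average error while still missing at the moment of impact, and only the impact-point term penalizes that miss directly.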

Figure 4: Overview of the DIPP framework for catching diverse in-flight objects.

Experimental Evaluation

Early-Stage Prediction and Feature Discrimination

The models are trained on 15 objects and evaluated on both these seen objects and 5 unseen objects. Impact Point Error (IE) and Success Rate (SR) are used as evaluation metrics. DIPP-NAE and DIPP-DPE consistently outperform the baselines (Newtonian, SVR, NAE) in both early-stage and late-stage prediction for seen and unseen objects. The advantage is most pronounced when more time remains before impact, indicating superior early-stage prediction.
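The two metrics can be illustrated with a short sketch. It assumes IE is the Euclidean distance between predicted and true impact points on the catching plane, and that a trial counts as a success when IE is within a threshold such as the basket radius. The summary does not state the exact success criterion, so the `radius` threshold is an assumption.

```python
import math


def impact_error(pred, true):
    """Euclidean distance between predicted and true impact points."""
    return math.dist(pred, true)


def success_rate(preds, trues, radius):
    """Fraction of trials whose impact error is within `radius` (assumed criterion)."""
    hits = sum(1 for p, t in zip(preds, trues) if impact_error(p, t) <= radius)
    return hits / len(preds)
```

Evaluating both metrics at several times-to-impact, rather than only at the end of the flight, is what exposes differences in early-stage prediction.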

t-SNE visualizations of the embedded features show that DIPP variants achieve clearer separation of trajectory clusters, mapping dynamically similar objects (e.g., pinwheel and boomerang) close together, which facilitates generalization to unseen objects.

Figure 5: Visualization of embedded features using t-SNE.

Trajectory Prediction Examples

Comparisons of predicted trajectories for representative objects (e.g., big plane, fan) demonstrate that DIPP-NAE yields predictions closer to ground truth than baselines, especially for objects with complex aerodynamics.

Figure 6: Big plane (seen).

Catching Performance in Simulation

Simulation experiments with a quadruped robot controlled by a PID controller show that DIPP-NAE achieves the highest catching success rates across various basket radii, for both seen and unseen objects. The improvement is attributed to the model's ability to associate unseen object trajectories with similar patterns from the training set.
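To show why early prediction matters for catching, the sketch below drives a one-dimensional basket position toward a predicted impact point with a PID controller. Everything here is a simplifying assumption rather than the paper's setup: the gains, the speed limit `max_speed`, the velocity-command model, and the names `PID` and `track_target` are hypothetical.

```python
class PID:
    """Textbook PID controller acting on a scalar error."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        self.integral += error * self.dt
        # No derivative term on the first call, since there is no previous error.
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv


def track_target(start, target, steps, dt=0.01, max_speed=3.0):
    """Move a 1D basket toward `target` using speed-limited velocity commands."""
    pid = PID(kp=6.0, ki=0.0, kd=0.1, dt=dt)  # assumed gains
    pos = start
    for _ in range(steps):
        cmd = pid.step(target - pos)
        cmd = max(-max_speed, min(max_speed, cmd))  # actuator speed limit
        pos += cmd * dt
    return pos
```

With a speed limit, the distance the basket can cover is bounded by the time remaining, so a predictor that settles on the right impact point earlier leaves the controller more time to get there.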

Real-World Robotic Demonstration

Real-world experiments validate the practical applicability of DIPP-NAE. The robot, equipped with a basket, successfully catches both seen and unseen objects, whereas the baseline NAE fails under identical conditions.

Ablation and Architectural Trade-offs

An ablation study compares LSTM, FC, and Transformer encoders for DFE. LSTM encoders yield the lowest impact point errors, especially for unseen objects, while Transformer-based encoders do not outperform LSTM, likely due to limited dataset size. The DPE-based IPP offers computational efficiency but is restricted to fixed-height catching, whereas NAE-based IPP is more general but computationally intensive.

Limitations and Future Directions

The current implementation relies on motion capture for object and robot state estimation, limiting deployment to indoor environments. Future work will focus on onboard perception using RGB-D cameras. The fixed-height catching constraint can be relaxed by integrating pose quality networks for manipulators. Incorporating human motion data for object identification is also proposed to further improve prediction accuracy.

Conclusion

The DIPP framework, supported by a novel real-world dataset, advances the state of the art in impact point prediction for robotic catching of diverse in-flight objects under complex aerodynamics. The discriminative feature embedding and the Impact Point Enhanced (IPE) loss enable accurate early-stage prediction and generalization to unseen objects, resulting in improved catching performance in both simulation and real-world settings. Future work will extend the approach to different robot platforms and catching mechanisms, with a focus on onboard perception and dynamic pose estimation.
