Correct metrics for evaluating learning embodied agents

Develop evaluation metrics for learning embodied agents that balance task-specific performance against generalization across tasks and to unforeseen scenarios, and determine whether single-task performance or broader generalization measures should take priority.

Background

Benchmarks in robotics often drive point solutions optimized for specific tasks, risking overfitting and limited generalization. Embodied agents need to generalize across tasks and changing environments.

The authors ask how best to measure performance, whether on individual tasks or through generalization metrics, so that the measurement meaningfully reflects progress in embodied intelligence.
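To make the contrast between the two kinds of measurement concrete, here is a minimal sketch in Python. It is an illustration only, not a metric proposed by the source: the task names, the episode outcomes, the split into seen and held-out tasks, and the `summarize` helper are all hypothetical. It reports per-task success rates alongside two simple generalization-style summaries (the gap between seen and held-out tasks, and the worst-case task).

```python
from statistics import mean


def success_rate(outcomes):
    """Fraction of successful episodes; `outcomes` is a list of booleans."""
    return sum(outcomes) / len(outcomes)


def summarize(results, held_out):
    """Summarize evaluation results.

    results:  dict mapping task name -> list of boolean episode outcomes
    held_out: set of task names assumed to be unseen during training
    (Both the data layout and the seen/held-out split are assumptions
    made for this illustration.)
    """
    per_task = {task: success_rate(o) for task, o in results.items()}
    seen = [rate for task, rate in per_task.items() if task not in held_out]
    unseen = [rate for task, rate in per_task.items() if task in held_out]
    return {
        "per_task": per_task,                  # single-task view
        "mean_seen": mean(seen),               # average over training tasks
        "mean_unseen": mean(unseen),           # average over held-out tasks
        "generalization_gap": mean(seen) - mean(unseen),
        "worst_task": min(per_task.values()),  # worst-case view
    }


# Hypothetical evaluation data: four episodes per task.
results = {
    "pick": [True, True, True, True],
    "place": [True, True, True, False],
    "open_drawer": [True, False, False, False],
}
report = summarize(results, held_out={"open_drawer"})
print(report)
```

In this made-up data the agent looks strong on each training task taken alone, while the held-out task and the gap summary tell a different story. Which of these views should count as progress is exactly the open question.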

References

Therefore the question arises of what exactly are the correct metrics for a learning embodied agent --- it is an open question if the best metrics are performance on any given task, or if the metrics should characterize generalization over many tasks or to unexpected situations.

From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence (2110.15245 - Roy et al., 2021) in Section 6.2 (Assessing Robot Learning: Performance Evaluation)