Correct metrics for evaluating learning embodied agents
Develop evaluation metrics for learning embodied agents that appropriately balance task-specific performance against generalization across tasks and to unforeseen scenarios, and determine whether single-task performance or broader generalization measures should take priority.
References
Therefore the question arises of what exactly are the correct metrics for a learning embodied agent --- it is an open question if the best metrics are performance on any given task, or if the metrics should characterize generalization over many tasks or to unexpected situations.
— From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence
(2110.15245 - Roy et al., 2021) in Section 6.2 (Assessing Robot Learning: Performance Evaluation)