Principled evolution of evaluation metrics to avoid overfitting and point solutions

Devise principled procedures to systematically update and diversify evaluation metrics over time so as to prevent overfitting and discourage point solutions in robotics benchmarks and challenges.

Background

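Goodhart’s law states that when a measure becomes a target, it ceases to be a good measure: a fixed metric that participants optimize directly stops tracking the capability it was meant to capture. Robotics challenges sometimes incentivize such narrow solutions, which score well under the challenge’s metric but do not generalize beyond it.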

The authors call for principled ways to evolve metrics to sustain progress on generalizable embodied intelligence rather than point solutions.

References

Another open question is how to systematically change evaluation metrics in a principled way to avoid overfitting and point solutions.

From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence (2110.15245 - Roy et al., 2021) in Section 6.2 (Assessing Robot Learning: Performance Evaluation)