Unknown normalization details for Gato’s Procgen results
Determine the score of the data collection policy that Gato used for Procgen, and establish how Gato's reported performance maps onto the standard Procgen normalization scores, so that a direct, fair comparison becomes possible.
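A minimal Python sketch can make the gap concrete. It assumes the standard Procgen min-max normalization, (R - R_min) / (R_max - R_min), and assumes, purely for illustration, that an expert-relative score is the raw return divided by the return of the data collection policy. The function names, the R_MIN and R_MAX placeholders, the reported fraction, and the candidate expert scores are all hypothetical; none are taken from Gato or from the Procgen benchmark.

```python
def procgen_normalize(raw_return, r_min, r_max):
    """Standard min-max normalization: 0 at r_min, 1 at r_max."""
    return (raw_return - r_min) / (r_max - r_min)


def expert_relative_to_raw(fraction_of_expert, expert_return):
    """Invert an expert-relative score.

    Assumption (not confirmed by any source): the reported number is
    raw_return / expert_return, so the raw return is their product.
    """
    return fraction_of_expert * expert_return


# Placeholder per-game constants, chosen only for illustration.
R_MIN, R_MAX = 5.0, 10.0

# Hypothetical reported score: 80% of the data collection policy's return.
reported_fraction = 0.8

# Two equally plausible guesses for the unreleased expert return.
candidate_expert_returns = (8.0, 10.0)

for expert_return in candidate_expert_returns:
    raw = expert_relative_to_raw(reported_fraction, expert_return)
    normalized = procgen_normalize(raw, R_MIN, R_MAX)
    print(f"expert={expert_return:.1f} raw={raw:.2f} normalized={normalized:.2f}")
```

The same reported fraction maps to different normalized scores depending on the guessed expert return, so without the released score of the data collection policy the conversion is underdetermined.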
References
While Gato also reports numbers in Procgen, we are not able to compare to these numbers because Gato reports performance relative to unknown score of the data collection policy. To the best of our knowledge the score of the data collection policy is not released. Thus, it is unclear how the Gato Procgen performance is normalized according to the standard Procgen normalization scores rendering a direct comparison impossible.
— From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
(2412.08442 - Szot et al., 11 Dec 2024) in Appendix: Further Experimental Details, Additional Baseline Details (Procgen)