Sample accounting and value of multi-agent self-play samples
Determine whether samples collected by controlling all agents in a scene via self-play should count the same as per-time-step samples when measuring and comparing the sample efficiency of reinforcement learning methods in driving simulators.
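The ambiguity comes down to which unit is counted. The sketch below contrasts two accounting conventions under a simplifying assumption: each episode is represented only by the number of agents the self-play policy controls at each simulator time step. The function names and the episode representation are hypothetical and are not taken from the cited paper or from any simulator API.

```python
from typing import Sequence

# Hypothetical representation: an episode is a sequence giving the number of
# agents controlled by the self-play policy at each simulator time step.
Episode = Sequence[int]


def count_env_steps(episodes: Sequence[Episode]) -> int:
    """Convention A: one sample per simulator time step, however many agents act."""
    return sum(len(ep) for ep in episodes)


def count_agent_steps(episodes: Sequence[Episode]) -> int:
    """Convention B: each controlled agent's transition at each step is one sample."""
    return sum(sum(ep) for ep in episodes)


if __name__ == "__main__":
    # Two illustrative episodes: three steps with 50, 50 and 48 agents,
    # then two steps with 12 agents each.
    episodes = [[50, 50, 48], [12, 12]]
    env_steps = count_env_steps(episodes)      # 5
    agent_steps = count_agent_steps(episodes)  # 172
    print(env_steps, agent_steps)
    # The ratio is the average number of controlled agents per time step.
    print(agent_steps / env_steps)
```

The two counts differ by the average number of controlled agents per step, so the same training run can look roughly thirty times more or less sample-efficient depending on the convention. If multi-agent transitions are closer to augmented copies of one time step, Convention A is the fairer unit; if they are genuinely independent experience, Convention B is. Deciding between them is the open question.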
References
Whether multi-agent samples have the same value as time step samples is unclear, since they could also be viewed as augmented versions of the same sample.
— CaRL: Learning Scalable Planning Policies with Simple Rewards
(2504.17838 - Jaeger et al., 24 Apr 2025) in Appendix, Section 'Related work', discussion of GPUDrive and GIGAFLOW