Sample accounting and value of multi-agent self-play samples

Determine whether multi-agent samples, collected by controlling all agents via self-play, should be valued the same as per-time-step samples when measuring and comparing sample efficiency in reinforcement learning for driving simulators.

Background

The paper reviews concurrent simulators and training setups that control all vehicles via self-play, so that reported sample counts scale with the number of agents. The authors point out that it is ambiguous how such multi-agent samples should be counted and compared with standard time-step samples.

They explicitly state that the equivalence of these samples is unclear, suggesting a need for principled guidelines or empirical studies to standardize sample accounting and comparison.
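To make the ambiguity concrete, the sketch below counts the same training run under two conventions: one sample per simulator time step, and one sample per controlled agent per time step. The function name, the dictionary keys, and the numbers are hypothetical and chosen only for illustration; they are not taken from the paper or from any simulator.

```python
def count_samples(env_steps: int, num_agents: int) -> dict:
    """Count one run's samples under two hypothetical accounting conventions."""
    if env_steps < 0 or num_agents < 1:
        raise ValueError("env_steps must be >= 0 and num_agents must be >= 1")
    return {
        # Convention A: each simulator time step is one sample.
        "time_step_samples": env_steps,
        # Convention B: each controlled agent contributes one sample per step.
        "agent_step_samples": env_steps * num_agents,
    }


# Hypothetical run: 1,000,000 simulator steps with 50 self-play agents.
counts = count_samples(env_steps=1_000_000, num_agents=50)
ratio = counts["agent_step_samples"] // counts["time_step_samples"]
print(counts, ratio)
```

Under these assumed numbers the two conventions differ by a factor equal to the number of agents, so a sample-efficiency comparison between a single-agent and a self-play setup depends on which convention each report uses.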

References

Whether multi-agent samples have the same value as time step samples is unclear, since they could also be viewed as augmented versions of the same sample.

— CaRL: Learning Scalable Planning Policies with Simple Rewards (2504.17838 - Jaeger et al., 24 Apr 2025) in Appendix, Section 'Related work', discussion of GPUDrive and GIGAFLOW
