Determine the required scale of synthetic data

Determine how much synthetic egocentric hand–object interaction data is needed to reach strong performance, and the point beyond which additional data yields diminishing returns, when training and adapting hand–object interaction detectors.

Background

Generating synthetic data is comparatively cheap, but training on it and storing it are not. Knowing how much synthetic data is needed before performance saturates helps allocate compute and storage where they matter.

The paper explores performance as a function of synthetic dataset size to locate a practical operating point.
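One simple way to read an operating point off a size-versus-performance curve is sketched below. This is an illustrative heuristic, not the paper's procedure: the function name `saturation_point`, the `min_gain` threshold, and the rule "stop at the first size whose next step improves the score by less than `min_gain`" are all assumptions made for this example.

```python
from typing import Sequence


def saturation_point(
    sizes: Sequence[int],
    scores: Sequence[float],
    min_gain: float = 0.5,
) -> int:
    """Return the smallest measured dataset size after which the next
    (larger) size improves the score by less than `min_gain`.

    `sizes` are synthetic dataset sizes and `scores` the matching metric
    values (for example, a detection AP). Both are supplied by the caller;
    `min_gain` is a hypothetical threshold in the same units as `scores`.
    """
    if len(sizes) != len(scores) or len(sizes) < 2:
        raise ValueError("need at least two (size, score) pairs of equal length")

    # Sort by dataset size so consecutive pairs are adjacent points on the curve.
    pairs = sorted(zip(sizes, scores))

    for (size, score), (_, next_score) in zip(pairs, pairs[1:]):
        if next_score - score < min_gain:
            return size

    # No step fell below the threshold: saturation was not observed
    # within the measured range, so return the largest size tried.
    return pairs[-1][0]
```

The threshold makes the cost trade-off explicit: a larger `min_gain` selects a smaller dataset, which suits tight training or storage budgets, while a smaller one tolerates more data in exchange for marginal improvements.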

References

As a result, several key open questions still need to be addressed: 1) How large is the gap between synthetic and real data? 2) What are its main causes? 3) How can it be minimized? 4) Can synthetic data fully replace real-world data? 5) Is it possible to leverage synthetic data when real-world data is unlabeled? 6) Can it improve performance when only a small amount of real-world labeled data is available? 7) What scale of synthetic data is required?

Leveraging Synthetic Data for Enhancing Egocentric Hand-Object Interaction Detection (2603.29733 - Leonardi et al., 31 Mar 2026) in Section 1 (Introduction)