Sim-to-Real Transfer for File-System Behavioral Personalization

Determine how to achieve robust sim-to-real transfer for memory-centric personalized file-system agents evaluated on FileGramBench, so that methods maintain high performance when moving from simulated behavioral trajectories to the human screen recordings in the benchmark’s Real-World setting.

Background

FileGramBench includes a Real-World setting that evaluates methods on human screen recordings, complementing its simulated file-system trajectories. In the main results, the accuracy of every evaluated system dropped sharply to single digits on these recordings, revealing a significant gap between structured trace analysis and video-level behavioral understanding.

Within the ethical and limitations discussion, the authors explicitly identify sim-to-real transfer as an unresolved issue. They position this gap as a key research challenge for deploying memory-centric personalized agents outside controlled simulation, where noise, pacing variability, and unstructured visual inputs complicate trace-based reasoning.

References

With 20 profiles and 640 trajectories, the benchmark operates at moderate scale; the sharp accuracy drop in the Real-World setting confirms that sim-to-real transfer remains an open challenge.

FileGram: Grounding Agent Personalization in File-System Behavioral Traces (2604.04901 - Liu et al., 6 Apr 2026) in Appendix: Discussion and Resources, Ethical Considerations