Precise low-level keyboard-and-mouse action execution in 3D virtual environments

Develop methods that enable embodied agents operating through a keyboard-and-mouse interface in 3D virtual environments to execute precise, low-level actions with high reliability and control, so they can robustly perform fine-motor tasks across diverse games and scenarios.

Background

SIMA 2 acts through the same human-computer interface as players (RGB frames as input, keyboard-and-mouse actions as output), without privileged access to game state. While it approaches human-level performance on many tasks, the paper notes persistent difficulties with fine-motor control (e.g., combat) and with the precise action execution that complex interfaces and fast dynamics require.
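
To make "low-level keyboard-and-mouse actions" concrete, the sketch below shows one possible shape for a per-step action and how a single intended mouse movement decomposes into several bounded steps. This is an illustration only, not SIMA 2's actual action space: the names `KeyboardMouseAction` and `split_mouse_move`, the fields, and the per-step pixel limit `max_step` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class KeyboardMouseAction:
    """One step of a hypothetical keyboard-and-mouse action space."""
    keys: FrozenSet[str] = frozenset()  # keys held down during this step
    mouse_dx: int = 0                   # relative horizontal mouse motion, in pixels
    mouse_dy: int = 0                   # relative vertical mouse motion, in pixels
    left_click: bool = False


def split_mouse_move(dx: int, dy: int, max_step: int) -> List[KeyboardMouseAction]:
    """Split one intended mouse displacement into per-step moves.

    Each step moves at most `max_step` pixels per axis (an assumed limit),
    and the steps sum exactly to (dx, dy).
    """
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    # Ceiling division on the larger axis; always take at least one step.
    n_steps = max(1, -(-max(abs(dx), abs(dy)) // max_step))
    actions = []
    prev_x = prev_y = 0
    for i in range(1, n_steps + 1):
        # Cumulative target after step i, in integer arithmetic (no float drift).
        cur_x = dx * i // n_steps
        cur_y = dy * i // n_steps
        actions.append(
            KeyboardMouseAction(mouse_dx=cur_x - prev_x, mouse_dy=cur_y - prev_y)
        )
        prev_x, prev_y = cur_x, cur_y
    return actions
```

Under these assumptions, `split_mouse_move(25, -7, 10)` yields three actions whose deltas add up to exactly (25, -7). The point of the sketch is that even a simple aim adjustment becomes a sequence of small actions, each of which has to be correct for the overall movement to land where intended.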

In the Discussion, the authors explicitly identify precise, low-level action execution via keyboard-and-mouse control as an open challenge for the broader field, underscoring the need for techniques that deliver robust, fine-grained control across varied environments, menus, and interaction patterns.

References

"Finally, executing precise, low-level actions via the keyboard-and-mouse interface and achieving robust visual understanding of complex 3D scenes remain open challenges that the entire field continues to work to address."

SIMA 2: A Generalist Embodied Agent for Virtual Worlds (2512.04797 - Team et al., 4 Dec 2025) in Discussion