Presenting and Interacting with AI-Derived Scene Understandings in Mixed Reality

Determine the most effective user interface designs and interaction techniques for presenting AI-generated understandings of physical scenes within mixed reality systems and for letting users interact with them, so that users can meaningfully engage with and manipulate these AI-derived structures during MR tasks.

Background

The paper introduces Reality Proxy, a proxy-based interaction paradigm that leverages computer vision and multimodal LLMs to extract spatial hierarchies and semantic attributes of real-world objects, enabling fluid interaction with these AI-derived scene structures in mixed reality. The system reifies such structures as manipulable proxies to support tasks like multi-selection, filtering, grouping, and zooming.
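The core idea is easier to picture with a concrete data shape. The sketch below is illustrative only and not the paper's implementation: the names SceneNode, Proxy, filter, and group_by are hypothetical, and it simply shows one way an AI-derived spatial hierarchy with semantic attributes might be reified as a manipulable proxy supporting multi-selection, filtering, and grouping (Python 3.9+ assumed).

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SceneNode:
    """A node in the AI-inferred spatial hierarchy (e.g., room > shelf > book)."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)  # semantic attributes, e.g. from a multimodal LLM
    children: list["SceneNode"] = field(default_factory=list)

    def descendants(self):
        """Yield this node and every node below it."""
        yield self
        for child in self.children:
            yield from child.descendants()

@dataclass
class Proxy:
    """A manipulable stand-in for a set of real-world objects."""
    members: list[SceneNode]

    def filter(self, predicate: Callable[[SceneNode], bool]) -> "Proxy":
        """Narrow the selection, e.g., keep only red objects."""
        return Proxy([n for n in self.members if predicate(n)])

    def group_by(self, attribute: str) -> dict[str, "Proxy"]:
        """Split the selection into sub-proxies by a semantic attribute."""
        groups: dict[str, list[SceneNode]] = {}
        for n in self.members:
            groups.setdefault(n.attributes.get(attribute, "unknown"), []).append(n)
        return {key: Proxy(nodes) for key, nodes in groups.items()}

# Usage: multi-select everything under a shelf node, then filter and group by color.
shelf = SceneNode("shelf", children=[
    SceneNode("book", {"color": "red"}),
    SceneNode("book", {"color": "blue"}),
    SceneNode("mug", {"color": "red"}),
])
selection = Proxy(list(shelf.descendants())[1:])   # multi-selection via the hierarchy
red_items = selection.filter(lambda n: n.attributes.get("color") == "red")
by_color = selection.group_by("color")             # grouping by semantic attribute
print(len(red_items.members), sorted(by_color))    # -> 2 ['blue', 'red']
```

The point of the sketch is only that once the scene understanding is represented as explicit, queryable structure rather than passive annotations, selection, filtering, and grouping become ordinary operations over that structure.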

In the Discussion, the authors highlight a broader challenge at the intersection of MR and AI: prior MR+AI work often presents annotations passively, whereas Reality Proxy makes them interactable. Despite this progress, the authors explicitly note that the best way to present or interact with AI’s scene understandings is still unknown, underscoring a need to identify effective design patterns for exposing and manipulating these AI-derived representations in MR.

References

"Yet the best way to present or interact with AI's scene understandings remains an open question."

Reality Proxy: Fluid Interactions with Real-World Objects in MR via Abstract Representations (arXiv:2507.17248, Liu et al., 23 Jul 2025), Discussion, "Fostering Human–AI Collaboration through Interactions"