Human-Centered AI & XR Innovation

Updated 2 October 2025
  • Human-Centered AI + XR Innovation is defined by the integration of AI into immersive XR systems with a focus on user needs, usability, and ethical design.
  • It employs modular frameworks and rapid prototyping techniques to seamlessly merge AI algorithms with XR toolkits, fostering agile human–machine collaboration.
  • The approach prioritizes adaptive design and responsible innovation, ensuring real-time interactivity and personalization across diverse application domains.

AI and Extended Reality (XR) are converging to produce fundamentally new modes of interactive computing, with a special emphasis on human-centered design, rapid innovation cycles, and the expansion of agency within immersive environments. Human-Centered AI + XR Innovation defines the rigorous integration of AI algorithms, models, or agents into immersive XR systems, with an explicit focus on aligning with user needs, organizational dynamics, and broad principles of usability, explainability, and trust. This confluence seeks to reduce technical and creative friction, enable rapid prototyping and deployment, foster agile collaboration between stakeholders (human and machine), and address the nuanced perceptual, cognitive, and social dimensions of human–AI interaction in XR.

1. Core Principles and Unifying Frameworks

Human-centered AI + XR innovation is structured around several unifying frameworks and methodological innovations. XR Blocks introduces a cross-platform, script-driven architecture that explicitly bridges the ecosystem gap between mature AI frameworks (e.g., TensorFlow, Gemini) and XR toolkits (e.g., WebXR, three.js) through modular “plug-and-play” components for user, world, peers, interface, context, and agents (Li et al., 29 Sep 2025). The framework emphasizes reducing friction from concept to interactive prototype by enabling creators to describe high-level intent without bespoke integration of perception, rendering, and interaction subsystems.

The theoretical backdrop often involves continuum or multi-dimensional models. The “XR-AI continuum” delineates two poles: XR as a tool ‘for AI’ (systematic investigation of AI interface/embodiment) and AI as an enabler ‘for XR’ (AI-driven enhancement of multimodal interaction) (Wienrich et al., 2021). Another model, the Socio-Cyber-Physical Taxonomy (Mann et al., 30 Dec 2024), positions XR as the “physical spatial metaverse” at the intersection of physicality (atoms), virtuality (bits), and sociality (genes), providing the formalism $\textbf{XR} = \{\, s(\alpha),\, s(\beta),\, s(\gamma) \,\}$, where the variables parameterize the scales of physical, virtual, and social engagement. This structure informs both system design and evaluation, highlighting traditionally underexplored regions (e.g., Diminished Reality) and providing scaffolds for sustainable, people-centric application domains.
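
As a minimal sketch of how this formalism could be represented in code (the field names, the [0, 1] normalization, and the example values are assumptions for illustration, not definitions from the taxonomy paper):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class XRPoint:
    """Position of a system in the socio-cyber-physical space.

    Each field is an assumed scale value in [0, 1]: s(alpha) for physical,
    s(beta) for virtual, and s(gamma) for social engagement.
    """
    s_alpha: float  # physicality (atoms)
    s_beta: float   # virtuality (bits)
    s_gamma: float  # sociality (genes)

# Illustrative placements only:
diminished_reality = XRPoint(s_alpha=0.9, s_beta=0.2, s_gamma=0.3)
social_vr_meetup = XRPoint(s_alpha=0.1, s_beta=0.9, s_gamma=0.9)
```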

The Self++ framework, grounded in Self-Determination Theory, defines a staged model (competence, autonomy, relatedness) for progressive human–AI collaboration, deliberately eschewing technological determinism in favor of adaptive, staged empowerment, with dynamic agency balancing and mitigations for over-reliance and cognitive bias (Piumsomboon, 15 Jul 2025).

2. System Architecture, Modularity, and Toolkits

Modern frameworks prioritize modular, extensible architectures to decouple application logic from low-level device, rendering, and sensing dependencies. XR Blocks organizes primitives—user, world, context, interface, peer, agent—through a declarative scripting system. This allows creators to write scripts that operate over a unified “reality model” $R$ via a scripting function $S: R \rightarrow I$, where $I$ is the set of interactive outputs and $R$ is dynamically composed from sensory, contextual, and AI pipelines (Li et al., 29 Sep 2025).
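
A rough illustration of this pattern follows; all names are hypothetical and do not reproduce the XR Blocks API. The idea is simply that a script is a function from a composed reality model to interactive outputs:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class RealityModel:
    """Unified reality model R, composed from modular primitives.

    Field names mirror the primitives named in the text (user, world,
    context, interface, peers, agents); their contents are placeholders.
    """
    user: Dict[str, Any] = field(default_factory=dict)       # gaze, hands, pose
    world: Dict[str, Any] = field(default_factory=dict)      # depth, planes, anchors
    context: Dict[str, Any] = field(default_factory=dict)    # task, history
    interface: Dict[str, Any] = field(default_factory=dict)  # widgets, layouts
    peers: List[Dict[str, Any]] = field(default_factory=list)
    agents: List[Dict[str, Any]] = field(default_factory=list)

# S: R -> I, a script mapping the reality model to interactive outputs.
Script = Callable[[RealityModel], List[Dict[str, Any]]]

def label_gazed_object(r: RealityModel) -> List[Dict[str, Any]]:
    """Example script: attach a label to whatever the user is looking at."""
    target = r.user.get("gaze_target")
    if target is None:
        return []
    return [{"op": "show_label", "anchor": target, "text": "Selected"}]
```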

XARP Tools provides a server-side Python API and platform-agnostic XR clients, enabling both human and AI agents to interact with XR scenes via high-level JSON commands (over WebSockets) (Caetano et al., 6 Aug 2025). The interface is further extensible as a Model Context Protocol (MCP) server, enabling standardized AI–XR interoperation and context exchange for joint tool usage by humans and AI agents.
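The sketch below shows the general pattern of driving an XR client with high-level JSON commands over a WebSocket, using the Python `websockets` library; the URI, command name, and payload fields are placeholders and do not reproduce XARP's actual message schema.

```python
import asyncio
import json

import websockets  # pip install websockets

async def spawn_cube(uri: str) -> dict:
    """Send one high-level scene command and wait for the client's reply."""
    async with websockets.connect(uri) as ws:
        command = {
            "command": "spawn_object",  # hypothetical command name
            "params": {"shape": "cube", "position": [0.0, 1.2, -0.5], "scale": 0.1},
        }
        await ws.send(json.dumps(command))
        reply = await ws.recv()
        return json.loads(reply)

if __name__ == "__main__":
    result = asyncio.run(spawn_cube("ws://localhost:8765"))
    print(result)
```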

These modular approaches—along with contemporaries such as AtomXR, which streamlines creation via a high-level scripting language (AtomScript), natural language interfaces with LLMs, and immersive in-headset authoring (Cai et al., 2023)—enable rapid prototyping, reproducible research, and greater democratization of AI-driven XR innovation.
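
A minimal sketch of the natural-language-to-script pattern these tools rely on is given below; the prompt wording, the toy verb set, and the `call_llm` hook are hypothetical, and AtomScript's actual grammar is not reproduced here.

```python
from typing import Callable

SCRIPT_PROMPT = """You are an XR authoring assistant.
Translate the user's request into a short, high-level scene script
using only these verbs: spawn, attach, on_gesture, animate.

Request: {request}
Script:"""

def request_to_script(request: str, call_llm: Callable[[str], str]) -> str:
    """Turn a natural-language request into a high-level scene script.

    `call_llm` is an injected hook (e.g., a wrapper around any chat API),
    which keeps the sketch independent of a specific provider.
    """
    return call_llm(SCRIPT_PROMPT.format(request=request))

# Example with a stub in place of a real model:
# request_to_script("make a red ball that bounces when I pinch",
#                   call_llm=lambda p: "spawn ball color=red\non_gesture pinch: animate ball bounce")
```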

3. Human-Centered Design, Stakeholder Involvement, and Responsible Innovation

Human-centered design is integral to all referenced frameworks. The Responsible AI Implementation Framework (RAIIF) (Tjondronegoro et al., 2022) and the reflective practice documented in OpenXR educational design (Greenbaum et al., 28 Jun 2025) emphasize iterative, agile co-creation cycles involving business, technical, operational, regulatory, and end-user stakeholders. The RAIIF model, expressed as a staged pipeline (Plan & Design → Develop & Deploy → Operate, Monitor & Scale) with bidirectional co-creation at each stage, operationalizes continual trust-building, privacy-by-design, and real-world feedback in AI and XR solution lifecycles.

Educational and collaborative innovation further leverage human-centered methodologies. Problem-posing education and transformative learning (Greenbaum et al., 28 Jun 2025) position instructors and learners as co-creators, emphasizing critical reflection, iterative prototyping, and responsiveness to learner personas and social context.

Self++ (Piumsomboon, 15 Jul 2025) extends this to co-determined AI autonomy, allowing users to actively configure the degree of agentic AI, with transparent nudging, collaborative decision-making, and social facilitation embedded into immersive XR environments.

4. Technical Enablers: Perception, Interaction, and AI Integration

Successful human-centered AI–XR systems depend upon robust integration of perception (tracking, sensor fusion), naturalistic interaction, and dynamic AI agents:

  • XR Blocks and AtomXR use real-time depth maps, gesture, gaze, hand, and voice inputs—integrated with AI via pre-trained models or on-device inference (using TensorFlow.js, Gemini, or LLMs)—to enable multimodal, spatially context-aware interfaces (Li et al., 29 Sep 2025, Cai et al., 2023).
  • EmBARDiment (Bovo et al., 15 Aug 2024) reduces reliance on explicit prompts by tracking sustained eye-gaze fixations (thresholded at 120 ms), dynamically building a contextual memory buffer for immediate, context-aware LLM responses. This is formalized as: if $f(w) \ge \tau$, then $w \in \mathcal{M}$, where $f(w)$ is the fixation duration and $\mathcal{M}$ is the working memory buffer (see the sketch after this list).
  • AtomXR’s pipeline fuses voice, eye-tracking, and gesture to directly generate high-level scripting code, mitigating intent ambiguity and significantly reducing time-to-prototype for non-programmers (Cai et al., 2023).
  • In networked contexts, AI-assisted service provisioning for XR leverages predictive frame processing at the edge (Laha et al., 2023), formalized as $D_{DL}^{k,n} = T_{pred}^{n} + D_{UB}$ with $T_{pred}^{n} = p_d^{n} \cdot T_f^{n}$, effectively increasing the virtual delay budget while maintaining URLLC requirements.
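
A minimal sketch of the gaze-driven context buffer referenced above (the data structures are simplified and this is not the EmBARDiment implementation; only the 120 ms threshold is taken from the text):

```python
from collections import defaultdict
from typing import Dict, List, Set

FIXATION_THRESHOLD_MS = 120.0  # tau, per the 120 ms threshold cited above

class GazeContextBuffer:
    """Accumulate per-word fixation time; promote words to the memory buffer
    once their total fixation duration f(w) reaches tau."""

    def __init__(self, tau_ms: float = FIXATION_THRESHOLD_MS) -> None:
        self.tau_ms = tau_ms
        self.fixation_ms: Dict[str, float] = defaultdict(float)
        self.memory: List[str] = []        # M, ordered by first promotion
        self._in_memory: Set[str] = set()

    def on_gaze_sample(self, word: str, dt_ms: float) -> None:
        """Register dt_ms of gaze dwell on `word` (one eye-tracker sample)."""
        self.fixation_ms[word] += dt_ms
        if self.fixation_ms[word] >= self.tau_ms and word not in self._in_memory:
            self._in_memory.add(word)
            self.memory.append(word)

    def as_prompt_context(self, max_items: int = 50) -> str:
        """Serialize the most recently promoted words for an LLM prompt."""
        return ", ".join(self.memory[-max_items:])
```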

Innovations such as Heads Up eXperience (HUX) (K et al., 28 Jul 2024) and GeSa-XRF (Yang et al., 9 Apr 2024) further exemplify multi-modal fusion, context-aware attention, semantic-aware transmission, and generative refinement to optimize both user experience and technical efficiency in resource-constrained or perceptually demanding XR contexts.

5. Adaptivity, Trust, Ethics, and Evaluation

A central aim of human-centered AI + XR innovation is the adaptive optimization of system behavior for diverse users and contexts:

  • Frameworks such as XR Blocks and AtomXR support “plug-and-play” composition of interaction paradigms, enabling rapid, iterative model tuning and user-driven script modification.
  • Self++ (Piumsomboon, 15 Jul 2025) and the RAIIF model (Tjondronegoro et al., 2022) explicitly address adaptive agency and continuous trust-building, mandating transparency in AI-driven nudging, privacy protection, and the right to user override. Emphasis is placed on co-creation and adaptive hand-off strategies to prevent cognitive overload, overreliance, or loss of control.
  • Empirical findings from (Wienrich et al., 2021) show gender and presentation effects (e.g., the Eliza effect) in embodied AI evaluation; interfaces thus require tailoring for perceptual diversity and potentially dynamic adaptation to user gender, expertise, or context.
  • Evaluation strategies incorporate not only technical performance (such as Pareto-optimal trade-offs between speed, quality, stability, and resource constraints for on-device LLMs (Khan et al., 13 Feb 2025)), but also metrics of trust, usability, explainability, workload (NASA-TLX), and social quality (Godspeed, NARS), with an eye toward real-world, iterative validation.
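
To make the multi-objective framing concrete, a generic Pareto filter over candidate on-device configurations is sketched below; the objective names and example values are illustrative and are not the benchmark's actual metrics.

```python
from typing import Dict, List

def pareto_front(candidates: List[Dict[str, float]],
                 maximize: List[str],
                 minimize: List[str]) -> List[Dict[str, float]]:
    """Return the candidates not dominated on the given objectives."""
    def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
        no_worse = all(a[k] >= b[k] for k in maximize) and \
                   all(a[k] <= b[k] for k in minimize)
        strictly_better = any(a[k] > b[k] for k in maximize) or \
                          any(a[k] < b[k] for k in minimize)
        return no_worse and strictly_better

    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]

# Illustrative model/device configurations:
configs = [
    {"config_id": 0, "tokens_per_s": 30.0, "quality": 0.82, "memory_gb": 6.0},
    {"config_id": 1, "tokens_per_s": 55.0, "quality": 0.78, "memory_gb": 4.0},
    {"config_id": 2, "tokens_per_s": 28.0, "quality": 0.80, "memory_gb": 7.5},
]
front = pareto_front(configs, maximize=["tokens_per_s", "quality"], minimize=["memory_gb"])
```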

Responsible innovation is treated as a multi-faceted challenge: ethical guidelines and fail-safes (e.g., for bias mitigation, privacy, and agency (Li et al., 13 Mar 2025, Piumsomboon, 15 Jul 2025)) are integrated “by design,” and methodologies for continual governance and multidisciplinary oversight are encouraged.

6. Applications, Case Studies, and Impact

Human-centered AI + XR frameworks have been deployed across a spectrum of domains:

  • In creative production, MS2Mesh-XR enables sketch-plus-voice-driven mesh generation in VR/MR, using diffusion and triplane-based models to democratize 3D asset creation (Tong et al., 12 Dec 2024).
  • In manufacturing, ISO 23247-based digital twins leverage XR as a locus for real-time human–machine collaboration, integrating AI agents for predictive analytics, remote monitoring, ergonomic feedback, and sustainability KPIs (Cao et al., 20 Aug 2025).
  • Education and training environments employ human-centered design, social co-creation, and adaptive XR platforms to shift from knowledge transfer to critical, reflective, and collaborative learning (Greenbaum et al., 28 Jun 2025).
  • In personalized computing and everyday environments, wearables (smart eyeglasses) act as multi-sensory, AI-powered extensions for sustainability, healthcare, and frontline work, maintaining the six core properties: unmonopolizing, unrestrictive, observable, controllable, attentive, communicative (Mann et al., 30 Dec 2024).

Notably, the rapid prototyping and democratization achieved by platforms like XR Blocks significantly lower the barrier for researchers and practitioners to test new agentic, perceptual, or social interaction paradigms—with an emphasis on reproducibility and community-driven extensibility.

7. Future Directions and Open Challenges

Current research foregrounds several persistent challenges and avenues:

  • Architectural scalability: Ensuring that modular frameworks maintain responsiveness and low latency as user, world, and model counts scale, especially for multi-user or federated environments (Li et al., 29 Sep 2025).
  • Domain adaptation: SLAM and perceptual algorithms remain brittle across environmental and behavioral diversity (e.g., dynamic indoor/outdoor transitions, unpredictable head/gaze trajectories) (Chandio et al., 11 Nov 2024). Hybrid and adaptive approaches leveraging input profiling and error-informed feedback are recommended.
  • Generalizable evaluation: Cross-platform benchmarks and multi-objective optimization (as in (Khan et al., 13 Feb 2025)) are needed to guide model-device deployment for real-time interactive performance.
  • Inclusive design: Responsible innovation must center on adaptability for cognitive diversity, bias mitigation, explainability, and transparent user control. Proactive governance, multidisciplinary standardization, and iterative co-development (as in Self++ and RAIIF) are necessary to ensure that XR and AI systems remain aligned with human flourishing and ethical principles (Tjondronegoro et al., 2022, Piumsomboon, 15 Jul 2025).
  • Semantic and generative communication: Ongoing work in semantic-aware networks and generative post-processing reduces bandwidth and enhances visual fidelity in constrained wireless environments, further strengthening the technical and experiential case for AI-powered XR (Yang et al., 9 Apr 2024).

In summary, human-centered AI + XR innovation represents a shift from bespoke, siloed development toward modular, adaptive, and ethically grounded frameworks capable of bridging abstract user intents and technical realization at scale. The maturation of such approaches is accelerating the transition to interactive computing paradigms where AI and XR are not merely coexistent, but symbiotically drive agency, creativity, and real-world impact.
