- The paper introduces XR-CareerAssist, a novel immersive platform that integrates extended reality and multimodal AI for personalized career guidance.
- The system employs a scalable five-layer architecture with real-time speech recognition (>95% accuracy) and rapid Sankey diagram generation (0.2 seconds), supporting dynamic, user-centered interaction.
- Empirical pilot evaluation demonstrated high user satisfaction (over 78%) and robust backend performance, validating its potential for widespread institutional adoption.
Immersive, Multimodal Career Guidance: Overview of XR-CareerAssist
Introduction and Theoretical Foundations
"XR-CareerAssist: An Immersive Platform for Personalised Career Guidance Leveraging Extended Reality and Multimodal AI" (2604.06901) addresses the enduring limitations of Computer-Assisted Career Guidance Systems (CACGS) by integrating Extended Reality (XR) and advanced multimodal Artificial Intelligence. Traditional CACGS approaches rely on static user interfaces and trait-factor matching paradigms, and lack both the engagement and the personalization that modern, dynamic career trajectories require. This work draws on Career Construction Theory (CCT), shifting toward narrative, experiential, and user-centered forms of career development, and operationalizes these principles through real-time, interactive, data-driven immersive technologies.
Multimodal System Architecture
XR-CareerAssist features a five-layer modular architecture designed for scalability, extensibility, and efficient multimodal data processing. The architecture interlinks: an XR user interface optimized for hands-free interaction via Meta Quest 3, an application layer orchestrating logic and session management, an integration layer for abstracting backend APIs and AI model calls, a services layer hosting all AI modules, and a data layer maintaining user profiles and profile-derived analytics.
Figure 1: XR-CareerAssist architecture with clearly delineated layers facilitating modular deployment and maintenance.
The system coordinates five principal AI components:
- Automatic Speech Recognition (ASR): Real-time multilingual speech input processing with >95% accuracy.
- Neural Machine Translation (NMT): Dynamic language support (English, Greek, French, Italian) for cross-lingual accessibility.
- Conversational Agent: A context-aware dialogue system built with LangChain, integrating profile data for personalized guidance.
- Vision-Language (VL) Model: A BLIP-based model, fine-tuned on domain data to interpret and answer user queries about Sankey diagram career visualizations.
- Text-to-Speech (TTS): High-fidelity speech synthesis delivered through a 3D avatar using AWS Polly, adopted after an initial evaluation of PIPER.
User Experience and Multimodal Pipeline
The core user journey chains voice-driven input, dynamic multilingual translation, data retrieval, visualization, and naturalistic dialogue output. The system hides this technical complexity from the user, enabling intuitive engagement with sophisticated analytics and dialogue.
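As a rough illustration of how one voice turn in this journey could be chained, the sketch below passes each stage in as a plain function. Everything in it is hypothetical: the names `run_turn` and `Turn`, the stub components, and the choice of English as a pivot language are assumptions made for the example, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    user_text: str   # transcript in the user's language
    reply_text: str  # reply in the user's language


def run_turn(
    audio: bytes,
    user_lang: str,
    asr: Callable[[bytes, str], str],
    translate: Callable[[str, str, str], str],
    agent: Callable[[str], str],
    tts: Callable[[str, str], bytes],
) -> tuple[Turn, bytes]:
    """Run one interaction: ASR -> NMT -> agent -> NMT -> TTS."""
    user_text = asr(audio, user_lang)
    # Assumption: the agent works in English, so other languages are
    # translated on the way in and on the way out.
    if user_lang == "en":
        query_en = user_text
    else:
        query_en = translate(user_text, user_lang, "en")
    reply_en = agent(query_en)
    if user_lang == "en":
        reply_text = reply_en
    else:
        reply_text = translate(reply_en, "en", user_lang)
    speech = tts(reply_text, user_lang)
    return Turn(user_text, reply_text), speech


# Stub components so the sketch runs without any external service.
def fake_asr(audio: bytes, lang: str) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"


def fake_agent(query: str) -> str:
    return f"Answer to: {query}"


def fake_tts(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


turn, speech = run_turn(
    b"bonjour", "fr", fake_asr, fake_translate, fake_agent, fake_tts
)
print(turn.reply_text)
```

Passing the stages in as callables keeps the orchestration independent of any particular ASR, translation, or speech service, which is one plausible reading of the modular design described above.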
Figure 2: The complete user-AI interaction loop, from multimodal input to voice output via avatar.
Speech recognition accuracy is maintained even under VR constraints, while rapid NMT makes the system accessible to non-English speakers. Fine-tuned VL models extend BLIP's generalist vision-language capacity to specialized career data queries and Sankey diagram interpretation, supporting deep, contextually relevant guidance.
Figure 3: High-accuracy ASR and real-time NMT translation enable robust, language-agnostic voice interfaces within XR.
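This summary does not say how the ASR accuracy figure was measured. One common convention, shown here purely as an assumption, is word accuracy defined as 1 minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the recognizer output divided by the reference length. The function name `word_error_rate` and the sample sentences are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "show me careers in data science",
    "show me careers in data signs",
)
print(f"WER = {wer:.3f}, word accuracy = {1 - wer:.3f}")
```

Here one of six reference words is substituted, so the WER is 1/6 and the word accuracy is about 0.833.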
Integration of the personalized Sankey diagram generator further strengthens the narrative and exploratory aspects of the platform by grounding career path recommendations in a curated dataset of 100,000+ anonymized professional CVs.
Figure 4: VL model successfully answers complicated user queries on Sankey-based career transition patterns, confirming fine-tuned model competence.
Extensive backend optimizations significantly reduce latency and support high concurrency. Sankey diagram generation time was reduced from 45 seconds to 0.2 seconds (99.56% reduction), meeting sub-second responsiveness criteria for interactive XR workloads. API endpoints are served using FastAPI with GPU-enabled AWS Elastic Beanstalk infrastructure, and employ DuckDB for fast in-memory analytics.
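The reported reduction can be checked directly from the two timings. The short sketch below recomputes it and also expresses the same change as a speed-up factor; the helper name `percent_reduction` is only illustrative.

```python
def percent_reduction(before_s: float, after_s: float) -> float:
    """Relative reduction in time, as a percentage of the original."""
    return (before_s - after_s) / before_s * 100


before_s, after_s = 45.0, 0.2  # timings quoted in the text
reduction = percent_reduction(before_s, after_s)
speedup = before_s / after_s
print(f"{reduction:.2f}% reduction, {speedup:.0f}x speed-up")
```

The result agrees with the 99.56% figure and corresponds to a roughly 225-fold speed-up.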
Figure 5: Sub-second median latency and zero failures with 10,000 concurrent users demonstrate production-grade scalability.
These characteristics are essential for real-world institutional deployment in high-demand academic or workforce development contexts.
Data-Driven Visualizations and Personalized Pathways
Career progression visualizations are dynamically generated from user questionnaire input and analyzed against the CVCOSMOS profile database, yielding empirically grounded career and industry transition Sankey diagrams.
Figure 6: Sankey diagram for a high-experience profile, visualizing empirical transition probabilities across a 10-year horizon.
Figure 7: Role evolution map illustrates the probabilistic progression from employee to director roles within the cohort dataset.
Figure 8: Industry shift diagram quantifies inter-sector mobility trends indicated by career data analytics.
These outputs support granular, evidence-based career scenario exploration.
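To make the idea of empirical transition probabilities concrete, the sketch below counts role-to-role moves across a handful of invented career paths and normalizes them per source role, which gives the link weights a Sankey diagram would draw. The sample paths, role names, and function names are made up for illustration; the paper's actual pipeline and the CVCOSMOS schema are not described in this summary.

```python
from collections import Counter, defaultdict


def sankey_links(paths):
    """Count each consecutive (source, target) move across all paths."""
    counts = Counter()
    for path in paths:
        for src, dst in zip(path, path[1:]):
            counts[(src, dst)] += 1
    return counts


def transition_probabilities(counts):
    """Normalize link counts by the total outflow of each source."""
    totals = defaultdict(int)
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}


# Invented sample data: one list of roles per anonymized profile.
paths = [
    ["Employee", "Manager", "Director"],
    ["Employee", "Manager"],
    ["Employee", "Specialist"],
    ["Employee", "Manager", "Director"],
]

counts = sankey_links(paths)
probs = transition_probabilities(counts)
for (src, dst), p in sorted(probs.items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

This version pools moves regardless of when they happen in a career. A diagram spanning a fixed time horizon would more likely key each link by stage or year as well as by role.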
XR User Interface and Pilot Evaluation
Unity-based development delivers a VR-forward user interface, including responsive questionnaire flows and a high-visibility 3D avatar assistant. Iterative UI refinement addresses VR-specific usability, motion sickness, and accessibility constraints.
Figure 9: Dynamic, VR-optimized questionnaire component for collecting relevant profile data.
Figure 10: Immersive VR environment providing simultaneous voice assistant interaction and live career mapping.
A 23-participant pilot at the University of Exeter validated overall system usability, responsiveness, and perceived career guidance value. Notably, the ASR module achieved a 95.6% accuracy rating, and overall user satisfaction exceeded 78%. No failures were observed during the pilot or under load testing.
Figure 11: Pilot hardware and environment schematic at University of Exeter.
Critical UX improvements post-pilot addressed comfort, clarity, safety, and text readability, demonstrating agile responsiveness to real-user feedback.
Implications, Limitations, and Future Work
XR-CareerAssist advances the state of the art in CACGS through:
- Deep multimodal AI integration within an immersive, empirically grounded interactive environment;
- Demonstrated backend scalability and real-world pilot validation;
- Facilitation of narrative, longitudinal career reflection consistent with CCT.
The system remains bounded by several constraints: limited language coverage, single-session dialogue memory, dependence on standalone VR hardware, and a demographically skewed pilot sample. Ongoing work should prioritize expanded language and model support, persistent conversation histories, mobile and AR accessibility, and more diverse, longitudinal effectiveness studies.
Conclusion
XR-CareerAssist delivers a production-scale framework for immersive, personalized career guidance, leveraging fine-tuned multimodal AI and rigorous data analytics in a VR environment. Performance metrics highlight significant advances in interactivity and responsiveness, while empirical user evaluation confirms system readiness for institutional adoption. The architecture sets a precedent for future XR-AI deployments in guidance, education, and workforce analytics, with the potential to extend this multimodal approach across diverse competency development and support systems.