
LabOS: The AI-XR Co-Scientist That Sees and Works With Humans (2510.14861v1)

Published 16 Oct 2025 in cs.AI

Abstract: Modern science advances fastest when thought meets action. LabOS represents the first AI co-scientist that unites computational reasoning with physical experimentation through multimodal perception, self-evolving agents, and Extended-Reality (XR)-enabled human-AI collaboration. By connecting multimodal AI agents, smart glasses, and human-AI collaboration, LabOS allows AI to see what scientists see, understand experimental context, and assist in real-time execution. Across applications, from cancer immunotherapy target discovery to stem-cell engineering, LabOS shows that AI can move beyond computational design to participation, turning the laboratory into an intelligent, collaborative environment where human and machine discovery evolve together.

Summary

  • The paper presents LabOS, a fully integrated platform that combines agentic AI reasoning with XR interfaces for dynamic, real-time lab collaboration.
  • It employs a multi-agent system and specialized vision-language models to achieve over 90% error detection accuracy and superior protocol generation.
  • LabOS is validated across cancer immunotherapy, cell fusion, and stem cell engineering, demonstrating enhanced reproducibility and skill transfer.

LabOS: An Integrated AI-XR Co-Scientist for Human-Machine Collaboration in Biomedical Research

System Architecture and Design

LabOS introduces a unified platform for human-AI collaboration in scientific laboratories, integrating agentic AI reasoning with extended reality (XR) interfaces for real-time, multimodal interaction. The architecture comprises a multi-agent system for dry-lab tasks—planning, coding, critique, and tool creation—augmented by a Tool Ocean that autonomously expands analytical capabilities through web and literature mining. The wet-lab module leverages XR smart glasses, enabling the AI to perceive laboratory environments via egocentric video streams and provide adaptive, context-aware guidance, error detection, and documentation.
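The Tool Ocean can be pictured as a registry that agents query as newly mined analyses are added. The sketch below is a minimal Python illustration under that assumption; the names (ToolOcean, Tool, register, find) are placeholders for exposition, not LabOS's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[..., object]                     # callable that executes the analysis
    tags: List[str] = field(default_factory=list)


class ToolOcean:
    """Registry that grows as new analytical tools are discovered and added."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def find(self, keyword: str) -> List[Tool]:
        kw = keyword.lower()
        return [t for t in self._tools.values()
                if kw in t.description.lower()
                or any(kw in tag.lower() for tag in t.tags)]


# Example: register a differential-expression tool and look it up by keyword.
ocean = ToolOcean()
ocean.register(Tool(name="deseq_like",
                    description="differential expression analysis",
                    run=lambda counts: counts,
                    tags=["rna-seq", "statistics"]))
matches = ocean.find("expression")
```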

The dry-lab agentic core builds on the STELLA framework, with a Manager Agent decomposing scientific objectives, a Developer Agent executing bioinformatics analyses, and a Critic Agent iteratively refining workflows. The Tool Creation Agent autonomously identifies and integrates new analytical resources, supporting continuous self-evolution. This architecture enables dynamic scaling and adaptation to novel research tasks, with performance improving as the system accumulates experience.
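A minimal, self-contained sketch of the Manager/Developer/Critic refinement loop is shown below. The agent behaviors are stubbed for illustration; in LabOS each agent would be backed by an LLM and real bioinformatics tooling, and the class and method names here are assumptions rather than the STELLA implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Critique:
    approved: bool
    feedback: str


class ManagerAgent:
    def decompose(self, objective: str) -> List[str]:
        # In practice: LLM-driven decomposition of the scientific objective.
        return [f"step: gather data for '{objective}'", "step: run analysis", "step: summarize"]

    def revise(self, plan: List[str], critique: Critique) -> List[str]:
        return plan + [f"step: address '{critique.feedback}'"]


class DeveloperAgent:
    def execute(self, plan: List[str]) -> str:
        # In practice: generated code is executed against lab datasets.
        return "; ".join(f"done({s})" for s in plan)


class CriticAgent:
    def review(self, objective: str, result: str) -> Critique:
        ok = "summarize" in result
        return Critique(approved=ok, feedback="" if ok else "missing summary")


def run_dry_lab_task(objective: str, max_rounds: int = 3) -> str:
    manager, developer, critic = ManagerAgent(), DeveloperAgent(), CriticAgent()
    plan = manager.decompose(objective)
    result = developer.execute(plan)
    for _ in range(max_rounds):                      # iterative critique-and-refine loop
        critique = critic.review(objective, result)
        if critique.approved:
            break
        plan = manager.revise(plan, critique)
        result = developer.execute(plan)
    return result


if __name__ == "__main__":
    print(run_dry_lab_task("rank NK-resistance regulators from a CRISPRa screen"))
```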

The wet-lab module employs XR glasses running a Unity/Android application, streaming video and audio to a local or cloud GPU server for real-time inference. The server invokes a vision-language model (VLM) to interpret visual input, align actions with protocols, and provide structured feedback. The system supports 3D/4D spatial modeling using multiview camera feeds and Gaussian splatting for photorealistic, temporally consistent reconstruction, facilitating object-centric tracking and simulation-based training.
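The guidance loop can be pictured as sampling short clips from the egocentric stream, sending them with the current protocol step to the VLM server, and surfacing the structured feedback in the headset. The sketch below illustrates that flow; query_vlm, the clip length, and the feedback fields are assumptions, since the summary does not specify the serving API.

```python
import time
from typing import Iterable, List


def query_vlm(frames: List[bytes], protocol_step: str) -> dict:
    # Placeholder for a request to the GPU inference server (e.g., an HTTP call
    # carrying encoded frames plus the current protocol step as context).
    return {"aligned": True, "error": None, "guidance": f"Proceed with: {protocol_step}"}


def guidance_loop(frame_source: Iterable[bytes], protocol: List[str],
                  fps_sample: float = 1.0) -> None:
    step_idx = 0
    clip: List[bytes] = []
    for frame in frame_source:                       # egocentric video frames
        clip.append(frame)
        if len(clip) < int(4 * fps_sample):          # roughly a 4-second clip
            continue
        feedback = query_vlm(clip, protocol[step_idx])
        clip.clear()
        if feedback["error"]:
            print("WARNING:", feedback["error"])     # surfaced in the XR display
        elif feedback["aligned"]:
            print(feedback["guidance"])
            step_idx = min(step_idx + 1, len(protocol) - 1)
        time.sleep(1.0 / fps_sample)                 # pace requests to the server


# Example: a toy frame source and a two-step protocol.
guidance_loop(iter([b"frame"] * 8), ["add medium to wells", "centrifuge at 300 g"])
```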

Vision-Language Model Training and Benchmarking

To enable robust visual reasoning in laboratory settings, LabOS post-trains a VLM on a curated corpus of >200 egocentric lab videos (LabSuperVision, LSV), FineBio, and JoVE datasets. The training pipeline employs supervised fine-tuning (SFT) with LoRA on paired video-text examples, followed by reinforcement learning using Group Relative Policy Optimization (GRPO). The reward function is rule-based, emphasizing procedural accuracy, safety compliance, and expert-consistent reasoning.
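The rule-based reward can be illustrated with a simple scorer that checks procedural accuracy, safety compliance, and structured reasoning. The specific checks and weights below are assumptions for illustration; the paper's actual GRPO reward rules are not given in this summary.

```python
def procedural_reward(response: str, reference_steps: list[str],
                      unsafe_terms: tuple[str, ...] = ("mouth pipetting", "without gloves")) -> float:
    text = response.lower()
    # Procedural accuracy: fraction of reference protocol steps mentioned in order.
    hits, cursor = 0, 0
    for step in reference_steps:
        pos = text.find(step.lower(), cursor)
        if pos != -1:
            hits += 1
            cursor = pos + len(step)
    accuracy = hits / max(len(reference_steps), 1)
    # Safety compliance: penalize any mention of unsafe practice.
    safety = 0.0 if any(term in text for term in unsafe_terms) else 1.0
    # Proxy for expert-consistent reasoning: reward explicit stepwise structure.
    structure = 1.0 if any(tok in text for tok in ("step 1", "first,", "then")) else 0.5
    return 0.6 * accuracy + 0.3 * safety + 0.1 * structure


# Example scoring of a model response against two reference steps.
score = procedural_reward(
    "First, thaw the cells at 37 C. Then centrifuge at 300 g for 5 minutes.",
    ["thaw the cells", "centrifuge at 300 g"],
)
```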

Benchmarking on the LSV dataset reveals that leading commercial models (Gemini 2.5 Pro, GPT-4o, Qwen 2.5-VL-7B, Cosmos-Reason-1) underperform in protocol alignment and error detection, with top scores of 2.86/5 and ~2/5, respectively. In contrast, LabOS-VLM (235B) achieves >90% error detection accuracy and superior protocol generation quality, outperforming all baselines. The model demonstrates fine-grained step recognition, context-aware guidance, and real-time error correction in authentic wet-lab scenarios.

Biomedical Applications and Empirical Results

LabOS is validated in three biomedical research domains: cancer immunotherapy, mechanistic gene investigation, and stem cell engineering.

Cancer Immunotherapy Target Discovery:

LabOS autonomously analyzes CRISPRa functional screens in A375 melanoma cells co-cultured with NK cells, identifying and re-ranking candidate regulators of NK-mediated cytotoxic resistance. The system nominates CEACAM6 as a top target, confirmed by wet-lab assays and survival analysis on TCGA datasets. This demonstrates closed-loop integration of computational reasoning and experimental validation.
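One plausible way to picture the re-ranking step is to combine screen enrichment (e.g., log2 fold change) with prior evidence into a weighted score, as sketched below. The scoring scheme, weights, and example numbers are assumptions, not LabOS's actual method or data.

```python
import math
from typing import Dict, List, Tuple


def rerank_hits(log2fc: Dict[str, float], prior_score: Dict[str, float],
                w_screen: float = 0.7, w_prior: float = 0.3) -> List[Tuple[str, float]]:
    """Combine z-scored screen enrichment with z-scored prior evidence."""
    def z(values: Dict[str, float]) -> Dict[str, float]:
        mean = sum(values.values()) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values.values()) / len(values)) or 1.0
        return {g: (v - mean) / sd for g, v in values.items()}

    zfc, zprior = z(log2fc), z(prior_score)
    scores = {g: w_screen * zfc[g] + w_prior * zprior.get(g, 0.0) for g in log2fc}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Example with made-up numbers (not from the paper):
ranked = rerank_hits({"CEACAM6": 2.1, "GENE_A": 1.8, "GENE_B": 0.4},
                     {"CEACAM6": 0.9, "GENE_A": 0.2, "GENE_B": 0.5})
```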

Mechanistic Investigation of Cell Fusion:

LabOS generates and ranks hypotheses for genes regulating cell-cell fusion, prioritizing ITSN1 via pathway enrichment and interaction priors. Experimental knockdown of ITSN1 in U2OS cells validates its role in fusion, confirming the AI's mechanistic reasoning.
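Pathway-enrichment prioritization of this kind is often based on an over-representation test; the sketch below shows a hypergeometric tail probability as one plausible ingredient, with placeholder gene counts rather than the paper's data.

```python
from math import comb


def hypergeom_pval(hits_in_pathway: int, pathway_size: int,
                   candidates: int, genome: int) -> float:
    """P(X >= hits) for X ~ Hypergeometric(genome, pathway_size, candidates)."""
    total = comb(genome, candidates)
    p = 0.0
    for k in range(hits_in_pathway, min(pathway_size, candidates) + 1):
        p += comb(pathway_size, k) * comb(genome - pathway_size, candidates - k) / total
    return p


# Example with placeholder counts: 6 of 40 candidate genes fall in a
# 200-gene endocytosis pathway, out of roughly 20,000 genes.
p = hypergeom_pval(hits_in_pathway=6, pathway_size=200, candidates=40, genome=20000)
```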

Stem Cell Engineering and Skill Transfer:

LabOS copilots researchers through complex gene-editing workflows in human iPSCs, providing real-time guidance, error detection, and automated documentation. The system records expert practice and coaches junior scientists, enabling rapid skill transfer and reproducible training without extended side-by-side mentorship.

Performance, Scaling, and Deployment Considerations

LabOS establishes new state-of-the-art results on biomedical reasoning benchmarks:

  • Humanity's Last Exam (Biomedicine): 32% accuracy
  • LAB-Bench DBQA: 61% accuracy
  • LAB-Bench LitQA: 65% accuracy

These results represent up to 8% improvement over next-best models. Performance scales with inference-time compute, reflecting the self-evolving design.

Deployment leverages lightweight AR/XR glasses (<85 g, >2 h battery life, 1200+ nit display, 6DoF gesture support), suitable for laboratory environments. Real-time streaming and inference require local GPU servers or cloud infrastructure, with data privacy and security considerations for sensitive experimental records. The system supports modular integration with laboratory information management systems (LIMS) and can be extended to other scientific domains.
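As an illustration of LIMS integration, the sketch below posts an AI-generated experiment record to a REST endpoint; the endpoint path, payload schema, and token handling are assumptions, since LIMS APIs vary by vendor and the summary does not specify one.

```python
import json
import urllib.request


def post_experiment_record(base_url: str, token: str, record: dict) -> int:
    """Send a structured experiment record to a hypothetical LIMS REST endpoint."""
    req = urllib.request.Request(
        url=f"{base_url}/api/experiments",             # hypothetical endpoint path
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


record = {
    "protocol": "iPSC gene editing",
    "step": "nucleofection",
    "observed_deviation": None,
    "video_segment": "s3://bucket/run42/clip_017.mp4",  # placeholder URI
}
# status = post_experiment_record("https://lims.example.org", token="<token>", record=record)
```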

Implications and Future Directions

LabOS demonstrates that multimodal, agentic AI systems can transcend digital-only reasoning to participate in physical experimentation, closing the loop from hypothesis generation to wet-lab validation. The integration of XR interfaces and specialized VLMs enables AI to perceive, understand, and act within dynamic laboratory environments, supporting reproducibility, error mitigation, and skill transfer.

Theoretical implications include the emergence of self-evolving, context-aware AI agents capable of adaptive reasoning and tool creation, with potential to generalize across scientific disciplines. Practically, LabOS offers a blueprint for intelligent laboratories, where human intuition and machine rigor co-evolve, accelerating discovery and democratizing expertise.

Future developments may focus on expanding the Tool Ocean with domain-specific modules, enhancing 3D/4D spatial modeling for complex workflows, and integrating with robotic automation for fully autonomous experimentation. Scaling to larger model sizes and broader datasets will further improve reasoning and perception capabilities. Addressing challenges in data privacy, interoperability, and human-AI trust will be critical for widespread adoption.

Conclusion

LabOS represents a comprehensive AI-XR co-scientist platform, unifying agentic reasoning and multimodal human-AI collaboration in laboratory research. Through self-evolving agents, specialized VLMs, and XR interfaces, LabOS achieves state-of-the-art performance in biomedical reasoning, real-time experiment guidance, and skill transfer. The system exemplifies the potential of AI to participate in and enhance scientific discovery, setting the stage for future intelligent laboratories where human and machine collaborate seamlessly.
