- The paper introduces a unified physical AI architecture that integrates world understanding, generation, and prediction for robust long-horizon modeling.
- It employs a hybrid linear attention mechanism combining sliding-window attention (SWA), dilated sliding-window attention (DSWA), and gated linear attention (GLA) to achieve efficient inference and maintain high fidelity over extended sequences.
- It leverages a cross-embodiment data curriculum that progressively aligns open-world observations, human demonstrations, and robotic traces for self-evolutionary learning.
Kairos: A Native World Model Stack for Physical AI
Motivation and Overview
Kairos introduces a unified world-model stack purpose-built for Physical AI, advancing from traditional generative models toward infrastructure capable of self-evolutionary learning and real-world deployment. The system is architected to address four entrenched bottlenecks in world modeling: (1) integrating heterogeneous data for broad physical intelligence; (2) persistent, long-horizon state maintenance; (3) cross-domain perception-action grounding for embodied control; and (4) deployment under strict hardware constraints. The core objective is to synthesize robust, transfer-ready physical understanding directly into the backbone of a natively scalable world-action model.
Figure 1: Motivation for Kairos as an operational infrastructure for future self-evolving Physical AI, beyond traditional generative paradigms.
Unified Architecture: Understanding, Generation, Prediction
Kairos departs from disjointed modular systems and implements a natively unified backbone that integrates three interacting modules: understanding, generation, and prediction.
Hybrid Linear Attention and Theoretical Justification
High-fidelity, long-horizon generation necessitates scalable temporal modeling. Kairos implements a hybrid linear attention regime, where:
- SWA (sliding-window attention) encodes short-range motion patterns with efficient locality.
- DSWA (the dilated variant of SWA) extends the receptive field to cover mid-horizon dependencies.
- GLA (gated linear attention), based on Gated Delta Networks (GDN), provides linear-complexity global memory, controlling state propagation and suppressing error drift via a contractive, gated delta rule.
Crucially, a rigorous theoretical framework demonstrates the necessity (information-theoretic lower bound) of persistent latent states for long-horizon prediction and proves the sufficiency (explicit risk bound) of the hybrid temporal decomposition, given contractive global memory.
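The role of contraction in this argument can be illustrated with a standard error-recursion bound. The following is a generic sketch and not the paper's risk bound: it assumes a hypothetical memory update $F$ that is $\gamma$-contractive in the state with $\gamma < 1$, and a per-step approximation error of at most $\varepsilon$. The symbols $F$, $\gamma$, $\varepsilon$, $\delta_t$, $h_t$, and $\hat h_t$ are introduced here purely for illustration.

```latex
% Illustrative assumptions (not taken from the paper):
%   true state:   h_t = F(h_{t-1}, x_t)
%   model state:  \hat h_t = F(\hat h_{t-1}, x_t) + \delta_t,  \|\delta_t\| \le \varepsilon
%   contraction:  \|F(h, x) - F(h', x)\| \le \gamma \|h - h'\|,  0 \le \gamma < 1
\begin{align}
e_t &:= \|\hat h_t - h_t\| \;\le\; \gamma\, e_{t-1} + \varepsilon, \\
e_T &\le \gamma^{T} e_0 + \varepsilon \sum_{s=0}^{T-1} \gamma^{s}
     \;=\; \gamma^{T} e_0 + \varepsilon\, \frac{1 - \gamma^{T}}{1 - \gamma}
     \;\le\; \gamma^{T} e_0 + \frac{\varepsilon}{1 - \gamma}.
\end{align}
```

Under these assumptions the accumulated error is bounded uniformly in the horizon $T$. With $\gamma \ge 1$ the same recursion only yields a bound that grows with $T$, which is one way to read the emphasis on contractive global memory.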
Figure 4: DiT block architecture with hybrid linear attention: integrating SWA, DSWA, and GLA for multi-scale temporal modeling.
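To make the difference between the local and dilated windows concrete, the sketch below builds boolean attention masks over temporal positions. It is a minimal illustration under stated assumptions, not the Kairos implementation: the function name `sliding_window_mask`, the causal masking, and the window and dilation values are all hypothetical choices made for this example.

```python
import numpy as np

def sliding_window_mask(seq_len, window, dilation=1):
    """Boolean mask where entry [i, j] is True if query i may attend to key j.

    Hypothetical helper: each query sees at most `window` past positions,
    spaced `dilation` steps apart. dilation=1 gives a plain sliding window;
    dilation>1 widens the receptive field at the same per-query cost.
    Causal masking is an assumption of this sketch.
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    offset = i - j                          # how far key j lies behind query i
    causal = offset >= 0                    # never attend to future positions
    in_span = offset < window * dilation    # stay inside the (dilated) window
    on_grid = offset % dilation == 0        # keep only every `dilation`-th key
    return causal & in_span & on_grid

# Same budget of 3 keys per query, different temporal reach.
swa = sliding_window_mask(8, window=3)               # reaches 2 steps back
dswa = sliding_window_mask(8, window=3, dilation=2)  # reaches 4 steps back
print(swa.astype(int))
print(dswa.astype(int))
```

Because each row contains at most `window` active entries, the attention cost grows linearly with sequence length rather than quadratically, which is the property the hybrid design relies on for the short- and mid-range branches.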
Figure 6: Gated Delta Network: implements the gated linear attention module with delta-rule memory update, supporting efficient, bounded-length global memory.
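The gated delta-rule update can be sketched as a simple recurrence over a fixed-size state matrix. This follows the commonly stated form of the gated delta rule, S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T, and is an assumption about the general technique rather than the exact parameterization used in Kairos. The function name `gated_delta_scan`, the key normalization, and the scalar gates are hypothetical choices for this example.

```python
import numpy as np

def gated_delta_scan(q, k, v, alpha, beta):
    """Sequential gated delta-rule memory (illustrative sketch).

    q, k: (T, d_k) queries and keys; v: (T, d_v) values.
    alpha: (T,) decay gates in [0, 1]; beta: (T,) write strengths in [0, 1].
    Returns per-step outputs (T, d_v) and the final state (d_v, d_k).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))          # memory size is independent of T
    outputs = np.empty((T, d_v))
    for t in range(T):
        k_t = k[t] / (np.linalg.norm(k[t]) + 1e-8)   # (near-)unit-norm key
        # Decay the memory and erase what is currently stored under k_t.
        S = alpha[t] * (S - beta[t] * np.outer(S @ k_t, k_t))
        # Write the new value under k_t.
        S = S + beta[t] * np.outer(v[t], k_t)
        outputs[t] = S @ q[t]                        # read with the query
    return outputs, S

# Writing twice under the same key overwrites the old value instead of summing.
k = np.array([[1.0, 0.0], [1.0, 0.0]])
v = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
out, _ = gated_delta_scan(k.copy(), k, v, alpha=np.ones(2), beta=np.ones(2))
print(out)
```

With keys of norm at most one and gates in [0, 1], the transition `alpha * (I - beta * k k^T)` cannot increase the norm of the state, so each step is non-expansive and becomes strictly contractive when `alpha < 1`. The per-step cost depends only on `d_k` and `d_v`, not on the sequence length.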
Data Curriculum and Native Pre-training
Rather than post-hoc fine-tuning generic video models, Kairos implements a Cross-Embodiment Data Curriculum (CEDC) for scalable, native physical intelligence. The pre-training trajectory is strictly hierarchical:
- Physical Knowledge: Massive-scale open-world video observation imparts physics priors and universal regularities.
- Human-centric Behavior: Human demonstration data enables structured task understanding and causal intervention modeling.
- Robotic Action (Embodiment): Scarce but critical robot traces inject perception-action alignment, mechanically grounding the world model for real execution.
This curriculum is realized via multi-stage pre-training, progressive fine-tuning (domain-specific SFT, model merging), and RL-based preference alignment.
Figure 3: Cross-Embodiment Data Curriculum: progressively aligning open-world, human demonstration, and robot embodiment data for robust native pre-training.
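One way to picture a staged curriculum of this kind is as a schedule of data-mixing weights that shifts from open-world video toward robot traces. The sketch below is purely illustrative: the stage names, the mixing ratios, the replay of earlier data sources in later stages, and the functions `sample_source` and `curriculum_batches` are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical stage schedule: (stage name, sampling weight per data source).
# All ratios are made up for illustration.
STAGES = [
    ("physical_knowledge", {"open_world": 1.0, "human": 0.0, "robot": 0.0}),
    ("human_behavior",     {"open_world": 0.5, "human": 0.5, "robot": 0.0}),
    ("robot_action",       {"open_world": 0.2, "human": 0.3, "robot": 0.5}),
]

def sample_source(stage_weights, rng):
    """Pick one data source according to the current stage's weights."""
    sources = list(stage_weights)
    weights = [stage_weights[s] for s in sources]
    return rng.choices(sources, weights=weights, k=1)[0]

def curriculum_batches(steps_per_stage, batch_size, seed=0):
    """Yield (stage_name, list_of_sources) for every training step, in stage order."""
    rng = random.Random(seed)
    for name, weights in STAGES:
        for _ in range(steps_per_stage):
            yield name, [sample_source(weights, rng) for _ in range(batch_size)]

for stage, batch in curriculum_batches(steps_per_stage=2, batch_size=4):
    print(stage, batch)
```

Keeping a share of earlier sources in later stages is a common way to limit forgetting of general priors while scarce robot data is introduced. Whether Kairos mixes sources this way is not stated in the text above.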
Efficiency, Inference, and Deployment
Kairos incorporates deployment-aware optimization throughout:
- Scalable inference: Linear-complexity attention and DiT-cache optimizations ensure sub-millisecond per-frame generation on NVIDIA A800 and RTX 5090 hardware.
- Hardware-awareness: Mixed precision, adaptive quantization (FP8, INT4), and tile-based streaming facilitate real-time inference even on consumer GPUs.
- Timestep distillation: Leveraging flow-matching and distribution-matching distillation compresses multi-step diffusion into efficient 4-step generators with negligible quality loss.
Against large-scale competitors (Cosmos, Lingbot, Wan), Kairos achieves superior compute and memory efficiency and the lowest latency across resolutions and generation durations.
Figure 5: Performance comparison: Kairos achieves SOTA results on action-model and embodied world-model benchmarks while scaling linearly with generation duration and outperforming larger baselines in efficiency.
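The few-step generation mentioned under timestep distillation can be pictured as integrating a learned velocity field with a handful of Euler steps. The sketch below shows only the sampling side and does not show the distillation procedure itself. The function `euler_flow_sample`, the `velocity_fn` argument (a stand-in for a distilled network), the time convention (noise at t=0, data at t=1), and the toy velocity field are all assumptions made for this example.

```python
import numpy as np

def euler_flow_sample(velocity_fn, x_noise, num_steps=4):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = np.array(x_noise, dtype=float)
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * velocity_fn(x, t0)
    return x

def point_target_velocity(target):
    """Toy velocity field whose straight-line paths all end at `target`."""
    target = np.asarray(target, dtype=float)
    return lambda x, t: (target - x) / (1.0 - t)

target = np.array([1.0, -2.0, 0.5])
start = np.array([0.3, 0.1, -0.7])
sample = euler_flow_sample(point_target_velocity(target), start, num_steps=4)
print(sample)
```

When the transport paths are straight, as in this toy field, a few Euler steps land on the endpoint without discretization error. Few-step distillation aims to make the learned field behave well enough that a small step count such as four gives acceptable samples.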
Self-Evolutionary Learning and Closed-Loop Operation
Kairos is architected for self-improving learning cycles: understanding, generation, and prediction are mutually accessible in a closed loop. During deployment, the model natively supports rollout-evaluation-refinement cycles for self-evolution, including prompt optimization agents and task-centric policy refinement. Internal reward and selection mechanisms, driven by the understanding module, enable automated, continual optimization without human intervention.
Figure 7: Self-evolution framework: closed-loop rollout-evaluation-refinement cycle enables continuous self-improvement in deployed environments.
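The rollout-evaluation-refinement cycle can be sketched as a generic propose, score, and select loop. This is a schematic under stated assumptions and not the Kairos mechanism: the function `self_evolve` and its `rollout`, `score`, and `propose` callables are hypothetical placeholders for the world-model rollout, the understanding-driven reward, and the refinement step.

```python
import random

def self_evolve(initial, rollout, score, propose, rounds=20, candidates=8, seed=0):
    """Greedy closed-loop refinement (illustrative sketch).

    rollout(param) -> predicted outcome (stand-in for world-model generation)
    score(outcome) -> float reward (stand-in for an internal evaluator)
    propose(param, rng) -> new candidate (stand-in for a refinement step)
    Returns the best parameter found and the best score after each round.
    """
    rng = random.Random(seed)
    best, best_score = initial, score(rollout(initial))
    history = [best_score]
    for _ in range(rounds):
        proposals = [propose(best, rng) for _ in range(candidates)]
        scored = [(score(rollout(p)), p) for p in proposals]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score > best_score:          # keep a candidate only if it improves
            best, best_score = top, top_score
        history.append(best_score)
    return best, history

# Toy usage: tune a scalar so that the "rollout" outcome approaches 3.0.
best, history = self_evolve(
    initial=0.0,
    rollout=lambda p: p,
    score=lambda outcome: -(outcome - 3.0) ** 2,
    propose=lambda p, rng: p + rng.gauss(0.0, 0.5),
)
print(best, history[-1])
```

Because a candidate replaces the incumbent only when its score is higher, the recorded best score never decreases. A real system would also need safeguards against exploiting flaws in the evaluator, which this sketch does not address.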
Benchmarking and Empirical Results
Kairos-4B delivers SOTA or high-ranking results across a diverse set of rigorous benchmarks.
Practical and Theoretical Implications
Practical Impact: Kairos is the most complete demonstration to date of a native, scalable world–action infrastructure for Physical AI, simultaneously addressing data heterogeneity, long-horizon consistency, efficient deployment, and continual adaptation. The ability to operationalize on consumer hardware and the closed-loop self-evolution pathway make it deployable for both research and real-world robotics.
Theoretical Impact: The formal results on information-theoretic necessity and architectural sufficiency define foundational limits for long-horizon world modeling. The hybrid attention design and curriculum-based pretraining strategy set a new standard for future world-action architectures.
Future Directions: Ongoing work focuses on fully autonomous self-evolution (recursive imagination and policy update from real-world closed loops), scaling to universal action spaces for heterogeneous embodiments, and extending the world model substrate to more diverse physical domains and sensor modalities.
Conclusion
Kairos establishes a new paradigm for world models in Physical AI: an endogenously unified, curriculum-driven, and theoretically grounded system that learns, maintains, and deploys robust world knowledge across long horizons and embodiment axes. The principled architecture, together with strong empirical results and system scalability, positions Kairos as a foundational substrate for next-generation, self-evolving physical intelligence.
Reference: "Kairos: A Native World Model Stack for Physical AI" (2606.16533)