ColorAgent: Adaptive Mobile OS Agent
- ColorAgent is a personalized OS agent for mobile devices that integrates multi-agent design and reinforcement learning for adaptive, long-horizon interactions.
- Its architecture couples a Core Execution Module with Task Orchestration, Knowledge Retrieval, and Hierarchical Reflection modules to ensure robust operation.
- The system employs a progressive two-stage learning pipeline with self-evolving training, achieving state-of-the-art success rates in mobile benchmark tasks.
ColorAgent is a robust, personalized, and interactive operating system agent designed for long-horizon interactions on mobile platforms. Motivated by advances in hardware, software, and LLMs, ColorAgent moves OS interaction beyond command-line interfaces and traditional GUI automation toward agent-based paradigms capable of executing high-level instructions and adapting to individual users. Its architecture is distinguished by a multi-agent framework, progressive reinforcement learning, self-evolving training, and comprehensive personalization and proactive engagement capabilities (Li et al., 22 Oct 2025).
1. System Architecture and Multi-Agent Design
ColorAgent’s architecture consists of a central Core Execution Module and several augmenting agents within a tailored multi-agent framework. The Core Execution Module interfaces with the mobile OS, interprets GUI states (e.g., via screenshots), and executes candidate actions in response to user instructions. Surrounding this core are three key modules:
- Task Orchestration Module: Decomposes high-level instructions into atomic tasks, manages memory transfer across subtasks, and maintains contextual continuity throughout long-horizon operations.
- Knowledge Retrieval Module: Dynamically accesses external databases (including web content and historical trajectories) to supplement execution, especially for novel or app-specific tasks.
- Hierarchical Reflection Module: Implements multilevel error detection and recovery through Action Reflector (step-level), Trajectory Reflector (sequence-level), and Global Reflector (task-level) supervision.
This composite design ensures generalization, consistency, and robustness, enabling ColorAgent to adapt and operate efficiently in diverse and dynamic mobile environments.
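A minimal sketch of how these modules might compose around the core executor is given below. All class, method, and module names are illustrative assumptions for this article; they do not mirror the released codebase.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Running context shared across subtasks."""
    instruction: str
    history: list = field(default_factory=list)  # (action, screenshot) pairs

class ColorAgentLoop:
    """Illustrative skeleton of the multi-agent loop; collaborators are
    duck-typed and every name here is hypothetical."""

    def __init__(self, executor, orchestrator, retriever,
                 action_reflector, trajectory_reflector, global_reflector):
        self.executor = executor                          # Core Execution Module
        self.orchestrator = orchestrator                  # Task Orchestration Module
        self.retriever = retriever                        # Knowledge Retrieval Module
        self.action_reflector = action_reflector          # step-level reflection
        self.trajectory_reflector = trajectory_reflector  # sequence-level reflection
        self.global_reflector = global_reflector          # task-level reflection

    def run(self, instruction, env):
        state = AgentState(instruction)
        pending = list(self.orchestrator.decompose(instruction))
        while pending:
            subtask = pending.pop(0)
            knowledge = self.retriever.lookup(subtask)    # external knowledge, if any
            while not self.global_reflector.subtask_done(state, subtask):
                before = env.screenshot()
                action = self.executor.act(subtask, before, state.history, knowledge)
                after = env.execute(action)
                # Step-level check: did the screen change as the action intended?
                if not self.action_reflector.valid(before, after, action):
                    self.trajectory_reflector.repair(state)  # sequence-level recovery
                    continue
                state.history.append((action, after))
            # Memory transfer: rewrite remaining subtasks with accumulated context.
            pending = [self.orchestrator.transfer_memory(t, state.history)
                       for t in pending]
        return self.global_reflector.task_complete(state)
```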
2. Reinforcement Learning and Training Paradigms
ColorAgent leverages a progressive two-stage learning pipeline:
- Stage I – Step-Wise Reinforcement Learning (RL): Individual GUI interaction trajectories are split into samples comprising the instruction, step-wise history, current screenshot, and action. Training optimizes single-step decision-making with Group Relative Policy Optimization (GRPO), where relative advantages are computed across a group of $G$ candidate actions via $\hat{A}_i = \big(r_i - \mathrm{mean}(\{r_j\}_{j=1}^{G})\big) / \mathrm{std}(\{r_j\}_{j=1}^{G})$, and the clipped surrogate objective is regularized by a KL-divergence penalty against the reference policy. Reward signals combine action format validation and execution accuracy, $r_i = r_i^{\mathrm{format}} + r_i^{\mathrm{acc}}$ for $i = 1, \dots, G$ (see the first sketch below).
- Stage II – Self-Evolving Training: To address data scarcity and enhance agent proficiency, ColorAgent iteratively generates new high-quality trajectories using its current model. High-quality query seeds are expanded using LLMs; trajectories are rolled out, filtered by specialized discriminators (task completion, action validity, path relevance, reasoning coherence), and used for supervised fine-tuning (see the second sketch below). This reduces reliance on manual annotation and drives continual improvement.
These stages collectively confer strong generalization, adaptability, and long-term reasoning capabilities.
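As a concrete illustration of Stage I, the following sketch computes the group-relative advantages, the composite step reward, and the clipped, KL-regularized surrogate loss. The binary reward terms and the hyperparameter values (`clip_eps`, `beta`) are illustrative assumptions, not values reported for ColorAgent.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize rewards within a group of G candidate actions sampled for
    the same (instruction, history, screenshot) state."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def step_reward(is_well_formed: bool, is_correct: bool) -> float:
    """Composite reward r = r_format + r_acc; binary terms are an assumption."""
    return float(is_well_formed) + float(is_correct)

def grpo_loss(logp_new, logp_old, advantages, kl, clip_eps=0.2, beta=0.04):
    """Clipped surrogate objective with a KL penalty toward the reference policy."""
    ratio = (logp_new - logp_old).exp()
    clipped = torch.minimum(ratio * advantages,
                            ratio.clamp(1 - clip_eps, 1 + clip_eps) * advantages)
    return -clipped.mean() + beta * kl.mean()
```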
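Stage II can be pictured as a generate-filter-train loop. The sketch below follows the four discriminator criteria listed above; every function signature is hypothetical.

```python
def self_evolve(model, seed_queries, expander, discriminators, sft_trainer, rounds=3):
    """Generate-filter-train loop: expand high-quality query seeds with an LLM,
    roll out trajectories with the current model, keep only trajectories that
    pass every discriminator, then fine-tune on the survivors."""
    for _ in range(rounds):
        queries = expander.expand(seed_queries)             # LLM-based query expansion
        trajectories = [model.rollout(q) for q in queries]  # on-policy rollouts
        # Discriminators: task completion, action validity,
        # path relevance, reasoning coherence.
        kept = [t for t in trajectories
                if all(d.accept(t) for d in discriminators)]
        model = sft_trainer.finetune(model, kept)           # supervised fine-tuning
    return model
```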
3. Task Orchestration, Knowledge Retrieval, and Reflection Modules
The multi-agent framework enables sophisticated problem decomposition and recovery:
- Task Orchestration: The agent classifies each user instruction and, when suitable, decomposes it into atomic subtasks $\{\tau_1, \dots, \tau_n\}$. Subtasks are dynamically updated through memory transfer, formalized as $\tau_{i+1} \leftarrow \mathcal{R}(\tau_{i+1}, \mathcal{E}(h_i))$, where $\mathcal{R}$ (task rewriter) and $\mathcal{E}$ (task extractor) integrate the history $h_i$ of completed steps into subsequent actions (see the sketch after this list).
- Knowledge Retrieval: Queries $q$ generated by orchestration trigger retrieval $k = \mathrm{Retrieve}(q, \mathcal{D})$ from a database $\mathcal{D}$, integrating external knowledge to resolve ambiguous or previously unseen instructions.
- Hierarchical Reflection: The agent continuously self-monitors at three levels:
  - Action Reflector: validates each action by comparing before/after screenshots;
  - Trajectory Reflector: checks the coherence of short action sequences;
  - Global Reflector: assesses overall task completion and drives error recovery.
These components support robust performance on nontrivial, multi-step workflows.
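To make the memory-transfer update concrete, here is a minimal sketch of the rewriter/extractor composition $\tau_{i+1} \leftarrow \mathcal{R}(\tau_{i+1}, \mathcal{E}(h_i))$, with both functions realized as LLM calls. The prompts and the `llm.complete` interface are assumptions for illustration.

```python
def transfer_memory(llm, next_subtask: str, history: list) -> str:
    """Update the next atomic subtask with context from completed steps:
    tau' = R(tau, E(h)). Prompts and the llm.complete API are illustrative."""
    context = "\n".join(str(step) for step in history)
    extracted = llm.complete(                 # task extractor E
        "Extract the facts from this interaction history that the next "
        "step depends on:\n" + context
    )
    return llm.complete(                      # task rewriter R
        f"Rewrite the subtask '{next_subtask}' so it is self-contained, "
        "using this context:\n" + extracted
    )
```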
4. Personalization and Proactive User Interaction
ColorAgent is explicitly designed for personalized user intent recognition and proactive engagement:
- Personalized Intent Recognition: By analyzing historical user interactions, the agent builds a profile of explicit standard operating procedures (SOPs) and implicit behavioral routines. For a new instruction, a retrieval-augmented generation (RAG) step locates the closest prior SOP and rewrites the user's query for context-specific execution, dynamically adjusting action sequencing and query phrasing (see the sketch following this list).
- Proactive Engagement (without memory): For cold-start scenarios with no usable history, a two-stage learner is trained first to decide, based on the trustworthiness of the current context, when to proactively “ask” the user for clarification, and then to select the correct action once clarification is received.
These personalization layers enable the agent to function not merely as an automaton but as an adaptive, collaborative partner.
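A minimal sketch of the RAG step over a user's SOP store follows. The cosine-similarity matching, the 0.75 threshold, and the `llm.complete` interface are illustrative assumptions, not details from the paper.

```python
import numpy as np

def rewrite_with_sop(query_vec: np.ndarray, sop_store: list,
                     llm, user_query: str, min_sim: float = 0.75) -> str:
    """RAG step: find the nearest stored SOP by cosine similarity and rewrite
    the user's query against it; fall back to the raw query when nothing is
    similar enough. sop_store holds (embedding, sop_text) pairs."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    sim, sop = max(((cos(query_vec, v), text) for v, text in sop_store),
                   key=lambda x: x[0], default=(0.0, None))
    if sop is None or sim < min_sim:
        # Cold start: no matching routine; the proactive "ask" learner
        # described above would decide whether to request clarification.
        return user_query
    return llm.complete(
        "Rewrite this request to follow the user's usual procedure.\n"
        f"Request: {user_query}\nProcedure: {sop}"
    )
```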
5. Benchmarking and Performance Evaluation
Benchmark assessments demonstrate state-of-the-art task success and superior interaction metrics:
| Benchmark | Score | Evaluation Scope |
|---|---|---|
| AndroidWorld | 77.2% success rate | 116 tasks / 20 apps |
| AndroidLab | 50.7% success rate | 138 tasks / 9 apps |
| MobileIAR | 58.66% (intent eval) | personalized interaction |
| VeriOS-Bench | 68.98% (trust eval) | user trustworthiness |
Ablation studies show incremental gains from each architectural enhancement (RL training, self-evolving training, and the multi-agent modules). Together, these results place ColorAgent at the forefront of current OS agent systems in both task execution and interactive quality.
6. Implementation Details and Future Directions
- Implementation: The public codebase is available at https://github.com/MadeAgents/mobile-use. Core models Qwen2.5-VL-72B and GUI-Owl-32B are trained using the described RL and self-evolving paradigms on A800 GPU clusters. The GRPO method and the multi-agent modules are fully integrated into the released framework.
- Future Directions: Critical research avenues include development of holistic benchmarks for real-world agent interaction (beyond simple task success rates), optimization of collaborative agent architectures (centralized vs. distributed), and enhanced security mechanisms (sandboxing, error handling, permission control) to ensure long-term safety and adaptability.
A plausible implication is that further progress in these areas will be essential for realizing fully trusted, intelligent OS agents capable of seamless human-machine collaboration.
7. Significance and Research Context
ColorAgent establishes a new paradigm in OS agent development by integrating reinforcement learning, multi-agent coordination, self-evolving training, and user-centric interaction. Its modular architecture, rigorous evaluation, and commitment to personalization mark a substantial advance toward reliable, adaptive, and collaborative mobile assistants. While current benchmarks do not fully capture the complexities of agent-based OS interaction, the release of ColorAgent—and its thorough technical design—provides a robust foundation for both practical deployment and continued research into agent-based automation, intent modeling, and long-horizon user interaction (Li et al., 22 Oct 2025).