OL-CAIS: Online Collaborative AI System
- OL-CAIS is a family of systems that integrate AI agents with human operators to enable real-time learning and coordinated task management across diverse domains.
- It employs modular architectures with specialized roles (Researcher, Synthesizer, Reviewer) orchestrated via structured workflows and meeting protocols.
- Empirical evaluations demonstrate benefits such as improved cost-effectiveness, lower latency, and enhanced coherence in collaborative tasks.
An Online Collaborative AI System (OL-CAIS) is, in one formalization, a cyber-physical system in which an AI-driven agent and a human operator jointly perform tasks in a shared environment; unlike purely offline-trained systems, it continuously refines its AI model at run-time by incorporating human feedback (Rimawi et al., 2024). In the broader literature, the term also denotes web-based and microservice-based platforms in which multiple AI agents, users, and knowledge services coordinate online through shared state, explicit workflows, or common knowledge bases. The ThinkTank framework gives a particularly explicit generalization of this paradigm by transforming specialized AI agent systems into versatile collaborative intelligence platforms through role abstraction, generalized meeting types for iterative collaboration, Retrieval-Augmented Generation, and local deployment with frameworks like Ollama and models such as Llama3.1 (Surabhi et al., 3 Jun 2025).
1. Definition, scope, and conceptual lineage
The OL-CAIS concept sits at the intersection of collaborative AI, online learning, and workflow orchestration. In the cyber-physical formulation, the system alternates between a Learning State and an Operational State. The transition criterion is model confidence: when confidence falls below a threshold, the system solicits human labels and updates its model; otherwise, it executes tasks autonomously, subject to occasional human override (Rimawi et al., 2024). A related dissertation extends this view by modeling three major states (steady, disruptive, and final) and by explicitly treating disruptive events, recovery, and catastrophic forgetting as first-class concerns of OL-CAIS operation (Rimawi, 20 Nov 2025).
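The confidence-gated switch between the Learning State and the Operational State can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited system's implementation: the class name `OnlineCollaborator`, the `ask_human` callback, and the example threshold of 0.8 are all hypothetical, and the "model update" is reduced to storing the labelled sample.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    LEARNING = "learning"
    OPERATIONAL = "operational"


@dataclass
class OnlineCollaborator:
    threshold: float  # hypothetical confidence threshold
    labelled: list = field(default_factory=list)

    def step(self, sample, confidence, predicted, ask_human):
        """Process one sample and return (state, label used for the task)."""
        if confidence < self.threshold:
            # Learning State: solicit a human label and record it.
            # Storing the pair is a stand-in for an actual model update.
            label = ask_human(sample)
            self.labelled.append((sample, label))
            return State.LEARNING, label
        # Operational State: act autonomously on the model's prediction.
        return State.OPERATIONAL, predicted


agent = OnlineCollaborator(threshold=0.8)
low = agent.step("item-1", 0.55, "red", ask_human=lambda s: "blue")
high = agent.step("item-2", 0.93, "green", ask_human=lambda s: "blue")
```

In this sketch the human override mentioned above is not modelled; it would be an additional branch in the operational path.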
In parallel, the term has broadened to cover multi-user online systems that are not primarily robotic. Guided Sensemaking is described as an AI-augmented multiagent discourse platform for collaborative deliberation; CollaClassroom embeds LLMs into both individual and group study panels; CoDesignAI combines multiple users and multiple AI agents in participatory urban design; DeepShovel supports collaborative scientific data extraction from PDFs; and ThinkTank frames domain-specific agent systems as candidates for transformation into universal collaborative intelligence platforms (Bhatia et al., 1 Jun 2026, Sayeed et al., 14 Nov 2025, Zhang et al., 16 Mar 2026, Zhang et al., 2022, Surabhi et al., 3 Jun 2025). This suggests that OL-CAIS is best understood not as a single architecture, but as a family of systems characterized by online coordination between humans, AI agents, and shared computational memory.
ThinkTank is significant within this lineage because it systematically generalizes agent roles, meeting structures, and knowledge integration mechanisms by adapting proven scientific collaboration methodologies. Its specification treats OL-CAIS not merely as a chatbot ensemble, but as a modular blueprint for role-based, meeting-centric, knowledge-intensive collaboration (Surabhi et al., 3 Jun 2025).
2. Architectural patterns and agent organization
A canonical OL-CAIS architecture appears in the ThinkTank-derived design specification. The system begins with a User Interface (Web/CLI) and an API Gateway, then separates control into an Agent Orchestration Service, a Collaboration Workflow Service, a Knowledge Integration Service, a Local LLM Interface, and Security & Access Control. The orchestration service instantiates the Researcher-Agent, Synthesizer-Agent, and Reviewer-Agent; manages communication via a message bus such as Kafka; and reads or writes project metadata to Postgres. The collaboration layer houses the meeting engine and round manager and persists meeting logs to MongoDB. The knowledge layer combines a vector store such as FAISS or Weaviate, an embedding service using Ollama with Llama3.1, and a RAG controller. The Local LLM Interface exposes Dockerized Ollama/Llama3.1 through gRPC endpoints, while the security layer includes OAuth2, RBAC, and an audit log (Surabhi et al., 3 Jun 2025).
Within this architecture, agent roles are mapped explicitly to microservices. The role set is $R = \{\text{Researcher}, \text{Synthesizer}, \text{Reviewer}\}$, with the Researcher-Agent performing knowledge retrieval and RAG queries, the Synthesizer-Agent aggregating multiple inputs and generating summaries, and the Reviewer-Agent critiquing outputs and applying logical-consistency checks. The Orchestrator-Service implements the “Coordinator” logic from ThinkTank, the Collaboration-Service implements workflows such as stand-up, brainstorming, and review, and the Memory-Store (Agno Module) provides short-term and long-term memory shared among agents (Surabhi et al., 3 Jun 2025).
Other OL-CAIS implementations instantiate comparable divisions of labor with different role vocabularies. Guided Sensemaking uses a Socratic Guide Agent, a Reflector Agent, and a Curator Agent communicating through a lightweight message bus such as Kafka or Redis Streams; the UI consumes both personal and global graph updates in real time (Bhatia et al., 1 Jun 2026). AutoManager uses a Manager bot and a Service bot that never directly exchange free-form text, but instead communicate through a shared ASP knowledge base and collaborative rule set (Zeng et al., 9 May 2025). A multimodal sentiment-analysis OL-CAIS organizes the system into a Request Router, Task Scheduler, Model Pool, and Result Aggregator deployed in a hybrid Edge–Cloud infrastructure (Zhang et al., 2024). A plausible implication is that OL-CAIS architectures recurrently separate user interaction, orchestration, specialized reasoning roles, and persistent shared knowledge.
3. Collaboration protocols, meetings, and turn-taking
ThinkTank gives OL-CAIS one of its most explicit protocol formalisms. Let $R$ be the set of roles, $M$ the set of meeting types, and $W$ the set of workflows. The system defines a mapping
$$\phi : R \times M \to W$$
such that, for example,
$$\phi(\text{Researcher}, \text{Brainstorming}) = w_{\text{RB}},$$
where $w_{\text{RB}}$ prescribes that the Researcher retrieves top-$k$ documents via RAG and proposes ideas for the Synthesizer. The iterative update rule is
$$S_{t+1} = \mathrm{Update}(S_t, C_t),$$
where $S_t$ is the synthesized output and $C_t$ the reviewer critique (Surabhi et al., 3 Jun 2025).
This formalism is instantiated through three meeting templates. In Stand-Up, the User or Coordinator posts three prompts (“What was completed since last meeting?”, “What will you work on next?”, and “Any blockers?”) to each Researcher-Agent; the Reviewer-Agent briefly checks consistency, and the Synthesizer-Agent compiles a 3-bullet summary. In Brainstorming, the Coordinator broadcasts a central question, Researcher-Agents independently retrieve RAG results and propose $n$ ideas each, the Synthesizer clusters and merges idea lists, the Reviewer flags duplicates or low-quality ideas, and the process repeats for $T$ rounds while carrying forward a refined idea set. In Review, the Coordinator submits a draft summary, the Reviewer applies a critique checklist covering completeness, logic, and assumptions, Researcher-Agents fetch supporting evidence for each critique point, and the Synthesizer updates the document (Surabhi et al., 3 Jun 2025).
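The Brainstorming template can be read as a simple propose, merge, review loop. The sketch below illustrates only that control flow. The function names, the string-based idea representation, and the stand-in Synthesizer (`dedupe`) and Reviewer (`drop_short`) are hypothetical simplifications; in the described system each would be an LLM-backed agent.

```python
from typing import Callable, List

# A researcher maps (question, ideas so far) to new proposals.
Researcher = Callable[[str, List[str]], List[str]]


def dedupe(ideas: List[str]) -> List[str]:
    """Stand-in Synthesizer: merge idea lists, dropping case-insensitive repeats."""
    seen, merged = set(), []
    for idea in ideas:
        key = idea.strip().lower()
        if key not in seen:
            seen.add(key)
            merged.append(idea.strip())
    return merged


def drop_short(ideas: List[str], min_words: int = 2) -> List[str]:
    """Stand-in Reviewer: remove ideas too thin to evaluate."""
    return [idea for idea in ideas if len(idea.split()) >= min_words]


def brainstorm(question: str, researchers: List[Researcher], rounds: int,
               synthesize=dedupe, review=drop_short) -> List[str]:
    refined: List[str] = []
    for _ in range(rounds):
        # Each researcher proposes independently, seeing the refined set so far.
        proposals = [idea for r in researchers for idea in r(question, refined)]
        # Merge with the carried-forward set, then let the reviewer filter.
        refined = review(synthesize(refined + proposals))
    return refined


r1 = lambda q, seen: ["cache embeddings", "Batch queries"]
r2 = lambda q, seen: ["batch queries", "ok"]
ideas = brainstorm("How can latency be reduced?", [r1, r2], rounds=2)
```

Carrying `refined` into the next round is what distinguishes this from running the same prompt repeatedly: later proposals can build on, or be deduplicated against, earlier ones.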
Other OL-CAIS implementations formalize collaboration differently but pursue similar control over deliberative structure. CoDesignAI uses a round-based protocol in which a Firestore field pendingUsers is updated transactionally; once every participant has contributed in a round, the AI Facilitator summarizes discussion content, extracts and reconciles shared design intentions, and emits a concise design-prompt summary (Zhang et al., 16 Mar 2026). CollaClassroom computes per-user contribution counts $c_i$, defines the average number of turns $\bar{c}$, derives a priority weight $w_i$ for quieter users, and measures overall fairness through an index $F$, with $F = 1$ when all $c_i$ are equal (Sayeed et al., 14 Nov 2025). Guided Sensemaking uses a small state machine of prompt types (clarification, assumption, evidence, counterargument, and implication) and enforces a back-off such that no more than one Socratic question is presented per 150 words of change (Bhatia et al., 1 Jun 2026). MultiColleagues, by contrast, ranks candidate speakers with a role-aware scoring function and uses a Facilitator agent to regulate transitions between Explore and Focus modes (Quan et al., 27 Oct 2025). Taken together, these systems indicate that OL-CAIS collaboration is typically protocolized rather than left to unconstrained chat exchange.
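To make the participation-fairness idea concrete, the sketch below uses Jain's fairness index, a standard measure that equals 1 exactly when all counts are equal. This is an assumption for illustration: the text above does not confirm which fairness formula or priority-weight rule CollaClassroom uses, and `priority_weights` (shortfall below the mean) is a hypothetical stand-in.

```python
from typing import List


def jain_fairness(counts: List[int]) -> float:
    """Jain's index: (sum c)^2 / (n * sum c^2); 1.0 when all counts are equal."""
    total = sum(counts)
    if total == 0:
        return 1.0  # nobody has spoken yet, so participation is trivially equal
    return total * total / (len(counts) * sum(c * c for c in counts))


def priority_weights(counts: List[int]) -> List[float]:
    """Hypothetical rule: weight each user by how far they fall below the mean."""
    mean = sum(counts) / len(counts)  # requires a non-empty list
    return [max(0.0, mean - c) for c in counts]


balanced = jain_fairness([4, 4, 4])
skewed = jain_fairness([6, 0, 0])
weights = priority_weights([6, 3, 0])
```

With three users and one dominant speaker, the index falls to 1/3, its minimum for that group size, and only the silent user receives a positive priority weight.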
4. Knowledge integration, memory, and reasoning substrates
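ThinkTank places knowledge integration at the center of OL-CAIS. Its RAG pipeline is specified as
$$q \;\longrightarrow\; \mathrm{Embed}(q) \;\longrightarrow\; \mathrm{Retrieve} \;\longrightarrow\; \mathrm{Generate}.$$
Retrieval uses cosine similarity,
$$\mathrm{sim}(q, d) = \frac{\mathbf{e}_q \cdot \mathbf{e}_d}{\lVert \mathbf{e}_q \rVert \, \lVert \mathbf{e}_d \rVert},$$
where $\mathbf{e}_q$ and $\mathbf{e}_d$ are the query and document embeddings, optionally combined with a learned ranker $r_\theta(q, d)$. Given retrieved context $C$, generation is modeled as
$$y \sim p_{\mathrm{LLM}}(y \mid q, C).$$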
This pipeline is coupled to a shared memory module for short-term and long-term memory across agents (Surabhi et al., 3 Jun 2025).
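The retrieval step can be illustrated with a dependency-free sketch of cosine-similarity top-k search. It assumes embeddings are already available as plain lists of floats; the function names and the toy two-dimensional vectors are hypothetical, and a production system would delegate this to a vector store such as those named earlier rather than sorting in Python.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], docs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(docs.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = top_k([1.0, 0.1], docs, k=2)
```

The retrieved ids would then be resolved to text and passed as context to the generation step.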
Other OL-CAIS designs broaden the knowledge substrate beyond vector retrieval. AI Collaborator formalizes memory retrieval with a composite score of the form
$$\mathrm{score}(m) = w_{\mathrm{rec}}\,\mathrm{recency}(m) + w_{\mathrm{rel}}\,\mathrm{relevance}(m) + w_{\mathrm{imp}}\,\mathrm{importance}(m),$$
where recency decays exponentially, relevance is cosine similarity to the current prompt, and importance is average similarity to other messages in the same conversation (Samadi et al., 2024). Guided Sensemaking maintains both personal and collaborative discourse graphs; the Curator clusters semantically overlapping claims with spectral clustering on a similarity graph $G$ and uses modularity and edge betweenness to surface points of contention (Bhatia et al., 1 Jun 2026).
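A composite memory score built from the three ingredients named above (exponentially decaying recency, cosine relevance to the current prompt, and average similarity to other messages) might look like the following. The linear combination, the unit weights, the decay rate, and all function and parameter names are assumptions made for illustration; the source as extracted does not give the exact formula.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def memory_score(index: int,
                 embeddings: List[Sequence[float]],
                 ages: List[float],
                 prompt_vec: Sequence[float],
                 decay: float = 0.1,
                 weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Score message `index` by recency, relevance, and importance."""
    recency = math.exp(-decay * ages[index])           # exponential decay with age
    relevance = cosine(embeddings[index], prompt_vec)  # similarity to current prompt
    others = [cosine(embeddings[index], e)
              for j, e in enumerate(embeddings) if j != index]
    importance = sum(others) / len(others) if others else 0.0
    w_rec, w_rel, w_imp = weights
    return w_rec * recency + w_rel * relevance + w_imp * importance


embeddings = [[1.0, 0.0], [0.0, 1.0]]
ages = [0.0, 5.0]  # hypothetical age units, e.g. turns since the message
fresh_relevant = memory_score(0, embeddings, ages, prompt_vec=[1.0, 0.0])
old_unrelated = memory_score(1, embeddings, ages, prompt_vec=[1.0, 0.0])
```

Messages would be ranked by this score and the top few injected into the prompt.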
Reliability-oriented OL-CAIS designs often replace or supplement RAG with explicit symbolic reasoning. In the customer-support system, the AI agent is a passive listener with a DPR-style dual-encoder retriever, a sliding window of the last $k$ utterances, and a special “no-suggestion” class; if that class ranks highest or confidence falls below a threshold, suggestions are suppressed (Banerjee et al., 2023). In AutoManager, the shared ASP knowledge base stores menu facts, dynamic availability predicates, and collaborative rules; integrity constraints ensure, for example, that unavailable items cannot be ordered, while inter-agent communication occurs only through the shared predicate vocabulary (Zeng et al., 9 May 2025). Human-AI Collaborative Uncertainty Quantification introduces yet another substrate: a human proposes a prediction set $H(x)$, an AI computes a nonconformity score $s(x, y)$, and the collaborative prediction set takes an optimal two-threshold form over labels inside and outside $H(x)$, with both offline and online calibration algorithms providing distribution-free finite-sample guarantees (Noorani et al., 27 Oct 2025). This suggests that OL-CAIS knowledge integration encompasses retrieval, graph construction, symbolic constraint satisfaction, and calibrated uncertainty management.
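The two-threshold structure of the collaborative prediction set can be sketched as below: one threshold applies to labels the human proposed and another to labels the human left out. The sketch assumes that a lower nonconformity score means a more plausible label, and it takes both thresholds as given inputs; the calibration procedure that would choose them, and the names `tau_in` and `tau_out`, are not taken from the source.

```python
from typing import Dict, Set


def collaborative_set(scores: Dict[str, float],
                      human_set: Set[str],
                      tau_in: float,
                      tau_out: float) -> Set[str]:
    """Keep a label if its score is within the threshold for its group.

    Labels proposed by the human are tested against tau_in;
    all other labels are tested against tau_out.
    """
    return {
        label
        for label, score in scores.items()
        if score <= (tau_in if label in human_set else tau_out)
    }


scores = {"cat": 0.2, "dog": 0.6, "fox": 0.3, "owl": 0.9}
result = collaborative_set(scores, human_set={"cat", "dog"}, tau_in=0.7, tau_out=0.35)
```

Setting `tau_in` more permissively than `tau_out` encodes trust in the human's proposal while still letting the AI add a label the human missed.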
5. Deployment, security, transparency, and governance
ThinkTank treats local deployment and data control as core OL-CAIS design requirements. It specifies three deployment modes: Local-Only, in which a Dockerized Ollama mixer runs Llama3.1 on a private GPU server; Hybrid Cloud, in which secure cloud LLM endpoints serve as fallback when local capacity is exceeded; and Kubernetes orchestration with node affinity (private vs. public) (Surabhi et al., 3 Jun 2025). Its security model combines OAuth2 with RBAC through a permission function, adds deny-by-default trust assumptions, requires signed certificates for each agent instance and mTLS for service-to-service calls, encrypts uploaded documents at rest with AES-256, and encrypts vector-store indexes with customer-managed keys (Surabhi et al., 3 Jun 2025).
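A deny-by-default RBAC check reduces to a lookup that returns false unless a grant is explicitly present. The sketch below shows only that principle. The role names, resources, actions, and the `ROLE_GRANTS` table are hypothetical, and OAuth2 token validation, mTLS, and audit logging are out of scope.

```python
from typing import Dict, FrozenSet, Tuple

Permission = Tuple[str, str]  # (resource, action)

# Hypothetical role-to-permission table; anything absent is denied.
ROLE_GRANTS: Dict[str, FrozenSet[Permission]] = {
    "researcher": frozenset({("documents", "read"), ("vector_index", "query")}),
    "reviewer": frozenset({("documents", "read"), ("meeting_log", "annotate")}),
}


def is_allowed(role: str, resource: str, action: str,
               grants: Dict[str, FrozenSet[Permission]] = ROLE_GRANTS) -> bool:
    """Return True only for an explicitly granted (resource, action) pair."""
    return (resource, action) in grants.get(role, frozenset())


can_query = is_allowed("researcher", "vector_index", "query")
can_delete = is_allowed("researcher", "documents", "delete")
unknown_role = is_allowed("guest", "documents", "read")
```

Because unknown roles fall through to an empty grant set, adding a new agent type exposes nothing until permissions are assigned.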
Governance in OL-CAIS is not limited to authentication. CollaClassroom foregrounds transparency by prefixing each LLM suggestion with a badge, “AI (Personal)” or “AI (Group),” exposing whether private or shared context was used, and visually tagging contributions so participants can audit which ideas came from peers versus the LLM (Sayeed et al., 14 Nov 2025). Guided Sensemaking explicitly positions generative AI not as a shortcut to answers but as a research partner that externalizes reasoning, preserves user agency, and supports traceable sensemaking; all text remains authored by users (Bhatia et al., 1 Jun 2026). These mechanisms are user-facing forms of governance.
By contrast, AutoManager emphasizes protected internal coordination. Knowledge and information conveyed between agents are encapsulated and invisible to users, and because the LLM is used only to parse natural language into a predetermined predicate schema and to verbalize ASP outputs, adversarial utterances cannot create arbitrary new predicates or bypass integrity constraints (Zeng et al., 9 May 2025). A plausible implication is that OL-CAIS governance often combines two distinct principles: external transparency about AI participation and internal strictness about machine-to-machine state transitions.
6. Application domains and empirical evaluation
OL-CAIS has been instantiated across a notably wide range of domains. ThinkTank targets knowledge-intensive organizational problem solving (Surabhi et al., 3 Jun 2025). Guided Sensemaking is oriented toward educational and civic deliberation (Bhatia et al., 1 Jun 2026). CollaClassroom focuses on Bangladeshi university students working in individual and group study panels (Sayeed et al., 14 Nov 2025). Draw2Learn treats AI as a supportive teammate in drawing-based science learning (Hang, 2 Feb 2026). DeepShovel supports collaborative extraction of meta-information, text entities, tables, and maps from geoscience literature (Zhang et al., 2022). Other examples include online customer support (Banerjee et al., 2023), multimodal sentiment analysis in edge and cloud settings (Zhang et al., 2024), collaborative urban design (Zhang et al., 16 Mar 2026), browser-based multi-agent environment design (Charity et al., 8 Feb 2025), and co-creative shape design with humans and artificial agents (Serra et al., 2019). This breadth indicates that OL-CAIS is a cross-domain systems pattern rather than a single application class.
Empirical evaluation is correspondingly heterogeneous. In the ThinkTank demonstration, the proposed metrics include cost-effectiveness, latency, Accuracy@k, Diversity Index, and Consensus Score; the reported preliminary results are a 15% reduction in inference cost versus an AWS-based multi-agent baseline, 0.8× lower P95 latency in stand-up workflows, and improved answer coherence (+10% human-rated) in the Metahuman project demo (Surabhi et al., 3 Jun 2025). Guided Sensemaking reports a small classroom pilot showing a 35% increase in average claims per student compared to a control group using a plain chat interface, 2× deeper personal graph depth (longest support chain) after three iterations, and a high correlation between acted-upon prompts and post-test critical-thinking scores (Bhatia et al., 1 Jun 2026). CollaClassroom, also in a small-sample study, reports 91.7% agree/strongly agree on receptivity to LLM help, 66.7% “easy” on learnability, 83.3% “reliable” on reliability, 83.3% “not at all” on frustration, and a very strong correlation between equal participation and meaningful contribution (Sayeed et al., 14 Nov 2025).
Professional ideation and research-assistance settings produce a different evaluation profile. MultiColleagues, in a within-subjects study, reports significantly higher teammate-like feel, complementary strengths, engagement and flow, creative exploration, and outcome quality and novelty than a single-agent baseline (Quan et al., 27 Oct 2025). DeepShovel, evaluated with 14 researchers from 9 geoscience teams, reports that all core tasks were supported, and its post-study questionnaire covers intended future use, willingness to recommend the tool to peers, and whether DeepShovel improved data integration (Zhang et al., 2022). Draw2Learn presents formative usability sessions with six participants and mean ratings out of 7, including Attractiveness: 6.17, Overall Experience: 6.15, Ease of Use: 5.67, and Overall Mean: 5.47 (Hang, 2 Feb 2026). In customer support, 18 agents reported time savings, and the DPR model achieved no-suggestion MRR = 0.84 and faq MRR = 0.50 in the best reported configuration (Banerjee et al., 2023). The available evidence is therefore strongly application-dependent, but it repeatedly evaluates OL-CAIS in terms of collaboration quality, human workload, response quality, and structured participation.
7. Resilience, greenness, and unresolved research issues
A major strand of OL-CAIS research concerns resilience under disruption. In the cyber-physical model, performance is measured online with the Autonomous Classification Ratio
$$\mathrm{ACR} = \frac{1}{N} \sum_{i=1}^{N} a_i,$$
where $a_i = 1$ if the system operated autonomously at iteration $i$ and $a_i = 0$ if it required human input or correction. Disruptions are modeled through an instantaneous drop, exponential decay during disruption, and first-order recovery toward a nominal performance level $P_0$. The associated resilience metrics include maximum performance drop $\Delta P_{\max}$, recovery time $t_r$, area under the degraded curve, counts under and above threshold, and human interaction ratio (Rimawi et al., 2024). In a collaborative-robot case study with 208 iterations, the system reached a first steady state after 34 iterations, dropped in performance when the lights were switched off, and recovered in approximately 37 iterations, with HI_Avg = 0.54 and an approximate degraded-area sum of 8.3 (Rimawi et al., 2024).
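The Autonomous Classification Ratio and two of the resilience metrics named above (maximum performance drop and recovery time) can be computed from a sequence of per-iteration flags. The sketch assumes a flag of 1 for autonomous operation and 0 for human input, and it evaluates the ratio over a trailing window. The window length, the helper names, and the threshold used to declare recovery are all hypothetical choices rather than details taken from the cited work.

```python
from typing import List, Optional


def rolling_acr(flags: List[int], window: int) -> List[float]:
    """Share of autonomous iterations over a trailing window ending at each step."""
    series = []
    for i in range(len(flags)):
        recent = flags[max(0, i - window + 1): i + 1]
        series.append(sum(recent) / len(recent))
    return series


def max_drop(series: List[float], nominal: float) -> float:
    """Largest shortfall of the series below the nominal performance level."""
    return max(0.0, nominal - min(series))


def recovery_time(series: List[float], threshold: float, start: int) -> Optional[int]:
    """Iterations after `start` until the series is back at or above `threshold`."""
    for offset, value in enumerate(series[start:]):
        if value >= threshold:
            return offset
    return None  # never recovered within the observed horizon


# 1 = autonomous, 0 = human input needed; a disruption hits at index 4.
flags = [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1]
acr = rolling_acr(flags, window=4)
drop = max_drop(acr, nominal=1.0)
t_rec = recovery_time(acr, threshold=0.75, start=acr.index(min(acr)))
```

Measuring recovery from the point of lowest performance is one convention; measuring from disruption onset would give a longer figure for the same trace.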
The greenness-resilience formulation extends this by defining additional cost terms, including energy, CO₂ cost, and a combined greenness cost. Recovery is then handled by one-agent multi-objective optimization, two-agent game-theoretic decision-making, or RL-agent policies. In the reported experiments, RL-agent policies achieved the fastest average recovery (≈115 iterations), followed by two-agent (≈118), one-agent (≈122), and internal (≈138) policies; the performance fluctuation ratio dropped from ≈0.35 under internal to ≈0.04 under RL-agent; and containerized execution reduced energy and emissions from 0.6 kWh and 0.198 kg CO₂eq to 0.3 kWh and 0.099 kg CO₂eq, a 50% reduction (Rimawi, 20 Nov 2025). This line of work shows that OL-CAIS performance is not only a collaboration problem, but also a monitoring, control, and infrastructure problem.
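As a consistency check on the reported energy and emission figures, the arithmetic below assumes that CO₂ cost scales linearly with energy through a single emission factor. The symbol $\kappa$ is hypothetical and this linear relation is an assumption for illustration, not a formula confirmed by the text above.

```latex
\begin{align*}
\kappa_{\text{before}} &= \frac{0.198\ \text{kg CO}_2\text{eq}}{0.6\ \text{kWh}} = 0.33\ \text{kg CO}_2\text{eq/kWh}, \\
\kappa_{\text{after}}  &= \frac{0.099\ \text{kg CO}_2\text{eq}}{0.3\ \text{kWh}} = 0.33\ \text{kg CO}_2\text{eq/kWh}, \\
\text{reduction} &= 1 - \frac{0.3}{0.6} = 1 - \frac{0.099}{0.198} = 0.5 .
\end{align*}
```

Both configurations imply the same factor, so under this assumption the 50% emission reduction follows entirely from the halved energy use rather than from a cleaner energy source.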
Several unresolved issues recur across the literature. Guided Sensemaking states that full empirical results are planned for future work (Bhatia et al., 1 Jun 2026). Draw2Learn identifies a small sample (n=6), no direct measures of learning gains, limited input modalities, and low AI transparency (Hang, 2 Feb 2026). AF-Online explicitly notes no real-time co-editing or branching/merging beyond simple parent→child remixing in version 1 (Charity et al., 8 Feb 2025). AI Collaborator highlights GPT-4 latency at more than 1 second per message, memory overhead, and persona drift (Samadi et al., 2024). CoDesignAI remains a proof-of-concept without a formal user study and flags cost, socio-emotional intelligence, power imbalance mediation, and the need for real-time synchronous updates as future concerns (Zhang et al., 16 Mar 2026). A reasonable synthesis is that OL-CAIS research has moved beyond single-agent assistance, but still faces open problems in scalability, evaluation rigor, governance, robustness, and long-horizon human–AI adaptation.