
Collaborative Document Editing with Multiple Users and AI Agents (2509.11826v1)

Published 15 Sep 2025 in cs.HC and cs.CL

Abstract: Current AI writing support tools are largely designed for individuals, complicating collaboration when co-writers must leave the shared workspace to use AI and then communicate and reintegrate results. We propose integrating AI agents directly into collaborative writing environments. Our prototype makes AI use transparent and customisable through two new shared objects: agent profiles and tasks. Agent responses appear in the familiar comment feature. In a user study (N=30), 14 teams worked on writing projects during one week. Interaction logs and interviews show that teams incorporated agents into existing norms of authorship, control, and coordination, rather than treating them as team members. Agent profiles were viewed as personal territory, while created agents and outputs became shared resources. We discuss implications for team-based AI interaction, highlighting opportunities and boundaries for treating AI as a shared resource in collaborative work.

Summary

  • The paper introduces a real-time collaborative editor that integrates customizable AI agents into team-based writing workflows.
  • It employs a mixed-initiative design with manual triggers and comment-based outputs to ensure user control and transparency.
  • Empirical results reveal user territoriality, moderate usability scores, and critical insights for future socio-technical tool integration.

Collaborative Document Editing with Multiple Users and AI Agents: An Expert Analysis

Introduction and Motivation

The paper "Collaborative Document Editing with Multiple Users and AI Agents" (2509.11826) addresses a critical gap in the design of AI-assisted writing tools: the lack of support for multi-user, team-based collaboration with AI agents. While LLM-powered writing assistants have proliferated, their architectures and interfaces are almost exclusively tailored for individual users, creating friction when teams attempt to integrate AI into shared writing workflows. This work proposes and empirically investigates a system that embeds customizable, shared AI agents directly into a collaborative document editor, enabling transparent, team-oriented AI interaction.

System Design and Architecture

The prototype system is a real-time collaborative text editor with integrated AI agents, built on a React/Quill frontend and a FastAPI/Celery/MongoDB backend. Real-time synchronization is achieved via Yjs CRDTs, supporting concurrent multi-user editing. The AI integration leverages OpenAI's GPT-4o-mini via both direct prompt APIs and the autogen library for multi-turn, agent-based interactions.
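The paper does not include implementation code; the following is a minimal sketch of how such a stack could be wired together, assuming a FastAPI route that enqueues a Celery job calling GPT-4o-mini. The route path, broker URL, and function names are illustrative, not taken from the prototype.

```python
# Minimal sketch, not the authors' implementation: a FastAPI route enqueues a
# Celery job that calls gpt-4o-mini and packages the reply as a comment payload
# (the prototype surfaces agent output in comments, never as direct edits).
from celery import Celery
from fastapi import FastAPI
from openai import OpenAI

celery_app = Celery("agents", broker="redis://localhost:6379/0")  # broker URL is an assumption
app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

@celery_app.task
def run_agent_task(agent_prompt: str, instruction: str, excerpt: str) -> dict:
    """Run one agent task against a document excerpt and return a comment payload."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": agent_prompt},  # derived from the agent profile
            {"role": "user", "content": f"{instruction}\n\nDocument excerpt:\n{excerpt}"},
        ],
    )
    # A real backend would persist this to MongoDB and push it to editor clients.
    return {"type": "comment", "body": response.choices[0].message.content}

@app.post("/tasks/run")
def trigger_task(agent_prompt: str, instruction: str, excerpt: str) -> dict:
    job = run_agent_task.delay(agent_prompt, instruction, excerpt)
    return {"job_id": job.id}
```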

Key architectural features include:

  • Agent Profiles: Users can create multiple AI agents, each defined by structured (CV-like) and unstructured (notes) attributes. Profiles are editable by all but, in practice, are treated as personal territory.
  • Task List: Explicit, repeatable tasks can be assigned to agents, with manual or autonomous triggers (e.g., on save, after inactivity, or after collaborative edits).
  • Comment Integration: AI agent responses are surfaced as comments, never as direct edits, preserving established collaborative writing norms and providing granular user control.
  • Manual and Autonomous Initiative: Tasks can be triggered manually (by user action) or autonomously (by system-detected events), with a strong user preference for manual control; a schematic sketch of these shared objects follows Figure 1.

    Figure 1: The UI for agent profiles and creation, supporting both structured (CV) and unstructured (notes) agent definitions, with AI-powered suggestions and task history.
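
The profile and task objects described above can be pictured as simple schemas. The sketch below is an assumption about their shape (field names are hypothetical); it captures the structured/unstructured split in profiles and the manual/autonomous trigger options for tasks.

```python
# Illustrative schemas only; field names are assumptions, not the paper's data model.
from enum import Enum
from typing import Optional
from pydantic import BaseModel

class Trigger(str, Enum):
    MANUAL = "manual"                  # user explicitly runs the task
    ON_SAVE = "on_save"                # after the document is saved
    ON_INACTIVITY = "on_inactivity"    # after a period without edits
    ON_COLLAB_EDIT = "on_collab_edit"  # after collaborative edits are detected

class AgentProfile(BaseModel):
    name: str
    role: str                          # structured, CV-like attributes
    skills: list[str]
    notes: Optional[str] = None        # unstructured free-form description
    created_by: str                    # profiles were de facto personal territory in the study

class Task(BaseModel):
    instruction: str                   # e.g. "Summarize this paragraph"
    assignee: Optional[str] = None     # agent name; may be selected automatically
    trigger: Trigger = Trigger.MANUAL  # manual triggers were strongly preferred
```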

Empirical Study: Methodology and Results

A week-long field study with 30 participants in 14 teams evaluated the system in authentic collaborative writing tasks. Data sources included interaction logs, questionnaires (SUS, CSI), and semi-structured group interviews.

Usability and Adoption

  • SUS Score: Mean 66.92 (SD=12.96), slightly below the commonly cited benchmark of 68, indicating moderate usability.
  • CSI Score: Mean 67.21 (SD=15.60), with "Enjoyment" and "Collaboration" as the most valued factors.
  • General Feedback: Most users found the system natural and efficient, but 23% reported a reduced sense of control over the text.

    Figure 2: SUS Rating—distribution of usability perceptions across standard SUS items.

    Figure 3: General feedback ratings—participants' agreement with statements on naturalness, efficiency, and control.

Agent and Task Usage Patterns

  • Agent Creation: 39 custom agents (mean 2.79/group), predominantly defined via the structured CV interface. Edits to agent profiles were almost exclusively by the creator, indicating strong territoriality.
  • Task Creation: 67 tasks (mean 5.15/group), mostly created and triggered by the agent's creator. Manual triggers were heavily preferred over autonomous ones.
  • Comment Usage: 468 comments, with 376 generated by agents. Human-AI comment threads were typically short, with limited multi-turn negotiation.

Collaboration Dynamics

  • Territoriality: Agent profiles were treated as personal territory; other team members rarely edited them.
  • Resource Sharing: Once created, agents and their outputs were widely used as shared resources, with team members freely invoking others' agents in comments and tasks.
  • Control and Trust: Manual invocation of agents (via comments or task list) was preferred, reflecting a desire for control and transparency. Autonomous agent actions were viewed with suspicion, especially when they generated out-of-focus or verbose suggestions.

    Figure 4: Frequency ratings on collaboration—participants' reported use of various collaborative tools and features.

Theoretical and Practical Implications

Socio-Technical Integration

The paper reveals that integrating AI into collaborative writing is not merely a technical challenge but a socio-technical one. Teams appropriate AI agents according to established norms of authorship, control, and coordination. Agent profiles function as personal extensions, while agent outputs become common objects, aligning with theories of territoriality and material mediation in collaborative work.

Agent Quantity: Functional vs. Perspective Value

A key finding is the trade-off between the efficiency of a single, general-purpose agent and the perspective value of multiple, role-specific agents. Teams deliberated on this axis, with some preferring minimal setup and others investing in diverse agent personas to facilitate future prompting and feedback diversity. This deliberation is shaped by anticipated prompting needs and the desire for explicit, role-based agent behavior.

Control, Initiative, and Trust

Manual control over agent actions is paramount. Participants favored mechanisms that allowed them to specify when and where agents act, primarily via comment-based invocation. Autonomous agent actions, especially those that generated unsolicited or verbose suggestions, were less trusted and less frequently accepted. This underscores the importance of mixed-initiative design principles that foreground user agency and minimize cognitive overhead.

Design Recommendations

  • Agent Profile Ownership: Systems should support explicit ownership or visibility controls for agent profiles, reflecting users' territorial instincts (one possible sketch follows this list).
  • Comment-Centric AI Integration: Surfacing AI output as comments, rather than direct edits, aligns with collaborative writing norms and supports granular user control.
  • Manual over Autonomous Triggers: Prioritize manual task triggers and provide clear feedback for autonomous actions to maintain user trust.
  • Role-Specific Agents as Prompting Primitives: Allow teams to define and reuse agent personas as prompting shortcuts, reducing cognitive load in complex collaborative tasks.
  • Focus-Aware Suggestions: Develop mechanisms for agents to adapt their initiative based on user and team focus, minimizing out-of-context interventions.
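
To make the first recommendation concrete, here is one possible shape for profile ownership and visibility controls. This is a speculative sketch, not part of the paper's system; the visibility levels and permission rule are assumptions.

```python
# Speculative sketch of the "agent profile ownership" recommendation; the
# visibility levels and edit rule are assumptions, not the paper's design.
from enum import Enum
from pydantic import BaseModel

class Visibility(str, Enum):
    PRIVATE = "private"  # only the creator sees and edits the agent
    SHARED = "shared"    # anyone can invoke it, but only the creator edits
    TEAM = "team"        # the whole team can see, invoke, and edit

class OwnedAgent(BaseModel):
    name: str
    created_by: str
    visibility: Visibility = Visibility.SHARED

def can_edit(agent: OwnedAgent, user_id: str) -> bool:
    """Default to creator-only editing unless the team explicitly opens the profile."""
    return agent.visibility == Visibility.TEAM or user_id == agent.created_by
```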

Limitations and Future Directions

The study sample is skewed toward academic writers, limiting generalizability. The LLM prompting and agent behavior were not exhaustively optimized, and the one-week study duration may not capture long-term norm formation. Future work should explore:

  • Longitudinal Studies: To observe the evolution of team-AI conventions over extended periods.
  • Comparative Baselines: To quantify the impact of integrated AI agents versus traditional, individual-centric AI tools.
  • Dynamic Access Control: Building on recent work in multi-user agent memory and permissions (Rezazadeh et al., 23 May 2025), to support more nuanced sharing and privacy.
  • Integration with External Communication Tools: Bridging document-centric and chat-based collaboration for richer, context-aware AI support.

Conclusion

This work demonstrates that embedding customizable, shared AI agents into collaborative document editors is both feasible and impactful, but must be carefully attuned to the social dynamics of team writing. Agent profiles are appropriated as personal territory, while agent outputs become shared resources, and manual control is strongly preferred over autonomous agent initiative. These findings delineate both the opportunities and boundaries for treating AI as a shared resource in collaborative work, providing actionable guidance for the next generation of team-oriented AI writing tools.


Explain it Like I'm 14

Short overview

This paper is about making it easier for groups of people to write together with help from AI, without leaving their shared document. The authors built a prototype (a special version of a Google Docs–style editor) where “AI agents” live inside the document. These agents can be customized, given tasks, and they leave their suggestions as comments, so everyone can see and discuss them.

What questions did the researchers ask?

  • How do people in a team create and set up AI helpers (agents) when they’re writing together?
  • How do they actually work with those shared AI agents while co-writing?

How did they study it?

The tool they built

Think of AI agents as smart, customizable helpers inside a shared doc. The tool has three big ideas:

  • Agent profiles: Each agent has a profile (like a mini CV) with a name, role, skills, and notes. This helps teams create different helpers, like a “Brainstormer,” a “Reviewer,” or an “English coach.”
  • Tasks: There’s a shared to-do list for the AI. A task is a clear instruction like “Suggest examples for section 2” or “Summarize this paragraph.” Tasks can be run by a person (manually) or automatically at certain times (for example, after the document is saved or when everyone logs off).
  • Comments: Agents never change the text directly. Instead, they answer inside comment threads. People can then choose to “Append” (add) or “Replace” text and must approve the change. This keeps humans in control and makes AI help visible to everyone.

In everyday terms: profiles = who the helper is, tasks = what the helper should do, comments = where the helper speaks up.

The study with people

  • 30 participants worked in 14 teams for one week.
  • They used the prototype to work on a real or practice writing task.
  • The researchers collected:
    • Interaction logs (what features people used and when)
    • Surveys about usability and creativity support
    • Group interviews about what worked, what didn’t, and how teams felt using the agents

What did they find?

Here are the main takeaways, explained simply:

  • Teams saw agents as tools, not teammates. People treated AI like a shared resource they could use and direct—not like another person in the group.
  • Setting up agents felt personal; using them felt shared. The person who created an agent often edited its profile, like it was their “territory.” But once an agent existed, everyone used its output as a shared resource.
  • One helper vs. many helpers was a real choice. Some teams liked a single, general-purpose agent because it was faster and simpler. Others preferred multiple specialized agents (like “Grammar Guru,” “Structure Coach,” “Idea Generator”) to get diverse feedback—and to save prompting effort later.
  • Keeping humans in charge matters. Because AI only suggested changes in comments and never edited the document directly, teams stayed comfortable and in control. People could review, discuss, and approve changes.
  • The familiar comment system made AI help feel natural. Seeing AI replies in the same place as human comments fit well with how teams already work.
  • The tool was useful but still early. People generally felt it helped them work more efficiently and was enjoyable, but some still felt a bit out of control at times. Overall usability scores were near average for new tools.

A few light numbers to give you a sense of scale:

  • Teams created around 40 custom agents in total.
  • Most agent setups used the structured “CV” fields (role, skills) rather than free-form notes.
  • Teams created about 70 reusable tasks.
  • There were many comments overall; most were AI suggestions added by running tasks, and short human–AI exchanges were common.

Why does this matter?

This research shows a practical way to bring AI into the place where collaboration already happens: the shared document. It suggests some design rules for team writing with AI:

  • Make AI help public and reviewable. Put AI responses in comments, not direct edits, so teams can discuss and approve changes.
  • Let people customize helpers but keep them visible to everyone. Profiles feel personal, outputs are shared.
  • Support both quick help and deeper structure. Provide a default general agent plus ways to create specialized agents and repeatable tasks.
  • Offer both manual and automatic timing. Sometimes a human should trigger a task; other times it’s handy if AI runs after saves or when people are offline.

In the long run, tools like this could reduce the friction of switching between chatbots and documents, help teams coordinate who does what, and keep AI as a transparent, shared helper. It won’t replace teamwork—instead, it can make teamwork smoother by giving everyone clear, controllable AI support right where they work together.


Knowledge Gaps

Knowledge gaps, limitations, and open questions

The points below identify what remains missing, uncertain, or unexplored in the paper and prototype, framed to guide actionable future research.

  • External validity and sampling
    • Generalizability beyond small academic teams: participants were largely university-affiliated and frequent LLM users; effects in industry, non-academic, or low-LLM-literacy teams remain unknown.
    • Short deployment (one week): lacks evidence on long-term adoption, evolving practices, sustainability, and drift in team norms or agent usage.
    • Task heterogeneity: allowing “own task” vs. provided essay introduces confounds; controlled comparisons across task types and difficulty are missing.
  • Evaluation design and outcome measures
    • No baseline/control condition: lacks comparison to standard collaborative editors with external AI use (e.g., separate ChatGPT tab), making it unclear what the integrated design actually improves.
    • Absent quality metrics: no objective measures of writing quality, correctness, originality, readability, or audience impact; no expert blind ratings or rubric-based evaluation.
    • Unknown productivity effects: no time-on-task, throughput, or revision efficiency metrics (e.g., number of accepted suggestions per hour).
    • Acceptance and editing behavior: acceptance rates of AI suggestions, degree of human edits prior to approval, and predictors of acceptance (e.g., agent type, task type, location in text) were not analyzed.
    • Costs vs. benefits of autonomy: no quantitative assessment of comment/thread clutter, interruption cost, or cognitive load induced by autonomous tasks.
  • Feature-level uncertainties
    • Ownership and permissions for agents: while agent profiles were treated as personal territory, the system offers no explicit ownership, privacy, or permission model (e.g., private vs shared agents, locking, versioning). Effects of such controls on collaboration and conflict are unknown.
    • Agent multiplicity vs. single generalist: the trade-offs between one flexible agent and several role-specific agents are not systematically tested (e.g., setup cost, prompt reuse benefits, diversity of perspectives, error rates).
    • Trigger design for autonomous tasks: trigger choices (idle time, saves, “collaborative edits,” etc.) were elicited but not evaluated for appropriateness, false positives/negatives, or user trust. Optimal thresholds, personalization, and interruptibility policies remain open.
    • Comment-channel fit: agents only propose through comments, never direct edits. When (if ever) should direct edits be allowed, and which safeguards (e.g., gated auto-accept, constrained scopes) minimize social friction without adding overhead?
    • Ad-hoc vs. reusable tasks: relative value of comment-summoned ad-hoc tasks vs. curated task-list shortcuts is unclear; how to surface, organize, and retire task templates over time is unexplored.
    • Scrutability aids: the utility of agent task histories, rationales, and side-by-side diffs for trust and error detection was not measured; which transparency features truly help teams?
  • Model, safety, and robustness considerations
    • Model dependence: only one model (gpt-4o-mini) was used; effects of model quality, latency, cost, and provider policies on collaboration dynamics are unknown.
    • Prompt injection and adversarial text: multi-author documents are vulnerable to embedded instructions; no defenses, detection, or sanitization strategies were evaluated.
    • Hallucinations and bias: no measurement of factuality, citation integrity, bias amplification, or style homogenization; impact on group decision quality is unknown.
    • Auto-assignee reliability: the 85% confidence threshold for auto selection was not validated; misassignment rates and downstream effects are unreported.
    • Scalability and performance: behavior with large documents, many concurrent users, or many autonomous tasks (comment flooding, latency spikes, rate limits, cost control) was not tested.
  • Collaboration dynamics and social factors
    • Team roles and power dynamics: how agent-mediated suggestions interact with unequal expertise, seniority, or authorship credit remains unexamined.
    • Accountability and provenance: the system merges accepted AI text without persistent, fine-grained provenance; how to attribute contributions and ensure accountability over time is open.
    • Norms for AI involvement: guidelines for disclosure, crediting AI, and editorial responsibility in co-authorship are not addressed.
    • Cross-tool ecologies: many groups used external chat tools; how integrated agents should interoperate with chat, issue trackers, or reference managers remains unexplored.
  • Learnability and onboarding
    • Training needs: after early feedback, an intro session was added, suggesting nontrivial learning costs. Optimal onboarding flows and their impact on outcomes were not studied.
    • Novice vs. expert differences: the paper does not analyze how prior LLM experience modulates benefits, errors, or reliance patterns.
  • Domain and language coverage
    • Beyond academic prose: utility for journalism, policy writing, legal drafting, technical docs, grant proposals, or mixed media content (figures, tables) is unknown.
    • Multilingual and accessibility support: effectiveness for non-English text, code-switching, and accessibility needs (screen readers, keyboard-only workflows) is untested.
  • Governance, policy, and ethics
    • Data governance and privacy: risks of sending sensitive document content to external LLMs and remedies (on-prem models, redaction, DLP) are not addressed.
    • IP and licensing: implications for ownership of AI-generated text and compliance with publisher/organizational policies are not discussed.
    • Anthropomorphism and trust calibration: how agent persona cues influence over-trust, under-trust, or misuse remains an open design question.
  • Design space extensions and alternatives
    • Permissioned territories: mechanisms that align with perceived territoriality (e.g., private agents, team-owned agents, sub-team scoping) and their effects need study.
    • Agent learning from team feedback: whether agents should adapt to team norms via ratings, corrections, or RLHF-like signals, and how to avoid unwanted drift.
    • Multi-agent coordination: systematic exploration of agent–agent “conversations,” conflict resolution policies, and orchestration strategies is missing.
    • Richer integrations: applying the paradigm to other collaborative media (slides, spreadsheets, code), citations management, or structured editing workflows is open.
  • Methodological improvements for future work
    • Controlled manipulations: systematically vary number of agents, autonomy levels, trigger policies, and permission models to identify causal effects.
    • Mixed-methods rigor: complement qualitative insights with inferential statistics on behavior logs and pre-registered hypotheses.
    • Release artifacts: code, prompts, and datasets were not indicated as shared; lack of artifacts limits replication and extension.
