AwareLLM: A Proactive Multimodal Ecosystem for Personalized Human-AI Collaboration to Enhance Productivity

Published 10 May 2026 in cs.HC | (2605.09625v1)

Abstract: Information workers' productivity is significantly influenced by their cognitive states and physiological responses. AI assistants such as ChatGPT, Copilot, and others have become integral components of knowledge-intensive workplaces. These AI assistants utilize pre-defined user preferences and chat interaction histories, thus confining themselves to reactive exchanges, lacking sufficient adaptability. Consequently, they fail to cater to individual user preferences and are unable to adapt to their psychophysiological states, diminishing potential productivity gains. To bridge this gap, we introduce AwareLLM, a novel multimodal framework that integrates egocentric vision, pupillometry, eye-gaze tracking, posture detection, heart activity, and the inferencing capabilities of LLMs to create a proactive and context-aware ecosystem. AwareLLM dynamically adapts to users' psychophysiological states while analyzing temporal patterns and behavioral tendencies to provide personalized and timely interventions. We evaluated AwareLLM through a user study with 20 participants, comparing it to a standard LLM assistant across multiple tasks. Our results show statistically significant improvements in task performance, along with reductions in cognitive fatigue and mental demand. Participants described AwareLLM's personalized interventions as timely and relevant, helping them boost their confidence and deepen engagement with their work. AwareLLM opens new avenues for Human-AI collaboration where technology adapts to our needs rather than us adhering to technological constraints.

Summary

  • The paper presents AwareLLM, a system that integrates multimodal biosignals and contextual data to proactively tailor interventions and enhance productivity.
  • It employs a dual-loop architecture that fuses real-time sensor data with LLM-mediated reasoning to reduce cognitive load and improve task performance.
  • Experimental results demonstrate significant reductions in perceived workload, along with improvements in focus and output quality, across diverse task domains.

Motivation and Problem Formulation

AwareLLM addresses the limitations inherent in contemporary LLM-driven AI assistants, which primarily operate on static user preferences and chat histories, resulting in non-adaptive, reactive support. The paper identifies key gaps: insufficient adaptation to individual psychophysiological states, inability to respond to dynamic stress levels, and lack of proactive focus management. Survey data from 72 information workers reveal that productivity is closely associated with real-time physical and cognitive states, yet most AI tools fail to account for these states, especially among experienced users. This motivates a shift toward an embodied, multimodal ecosystem for workplace productivity that incorporates biosignals, contextual awareness, and proactive interventions.

System Architecture and Multimodal Sensing

AwareLLM operationalizes three interconnected awareness layers through a suite of sensors: an external webcam for posture, an eye tracker for gaze and pupillometry, an ECG belt for heart activity, and periodic desktop screenshots plus egocentric world views for environmental context. These streams are temporally aligned, fused, and fed into a data processing module that computes ergonomic posture scores, stress indicators from heart activity (HR and the HRV measures SDNN, RMSSD, pNN20/pNN50), cognitive load metrics (fixations, saccades, pupil diameter), and task context. A baseline is established in real time for each physiological signal, so state classification and adaptation are personalized to the individual user. Notably, all raw data is processed in memory and deleted immediately after inference, addressing privacy constraints.
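
The heart-activity measures named above have standard time-domain definitions, which the following minimal sketch illustrates. It is not the paper's implementation: the function names, the use of RR intervals in milliseconds as input, and the z-score comparison against a per-user baseline are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def hrv_metrics(rr_ms):
    """Time-domain HRV metrics from successive RR intervals (milliseconds)."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    # Differences between consecutive RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "hr_bpm": 60000.0 / mean(rr_ms),
        # SDNN: sample standard deviation of the RR intervals.
        "sdnn_ms": stdev(rr_ms),
        # RMSSD: root mean square of successive differences.
        "rmssd_ms": math.sqrt(mean([d * d for d in diffs])),
        # pNNx: share of successive differences larger than x ms.
        "pnn20_pct": 100.0 * sum(abs(d) > 20 for d in diffs) / len(diffs),
        "pnn50_pct": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }


def deviation_from_baseline(value, baseline_values):
    """Hypothetical personalization step: z-score against the user's baseline."""
    mu = mean(baseline_values)
    sd = stdev(baseline_values)
    return (value - mu) / sd if sd > 0 else 0.0


metrics = hrv_metrics([800, 810, 790, 850, 800])
z = deviation_from_baseline(metrics["rmssd_ms"], [40.0, 50.0, 60.0])
```

Expressing each value relative to the user's own baseline, rather than against a fixed population threshold, is one simple way to obtain the personalized state classification the section describes.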

Contextual Reasoning: Dual-Loop Architecture

The architecture employs a dual-loop pipeline:

  • High-Frequency Loop: Samples posture, screenshots, and egocentric vision every 15s, aggregates into minute-level summaries. Enables rapid responsiveness to short-term digital and physical state changes.
  • Low-Frequency Loop: Aggregates three HF summaries and physiological data over a 3-minute window, focusing on sustained stress, cognitive overload, and distraction patterns while smoothing transient spikes.
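
The two loops can be sketched as a pair of aggregation steps. This is a rough illustration under stated assumptions, not the system's code: the `HFSample` fields, the summary keys, and the choice of a median to smooth transient stress spikes are all hypothetical; only the 15-second sampling period, the minute-level summaries, and the 3-minute window come from the description above.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, median

HF_PERIOD_S = 15                         # sampling period from the description
SAMPLES_PER_MINUTE = 60 // HF_PERIOD_S   # four HF samples per minute summary
HF_SUMMARIES_PER_LF_WINDOW = 3           # three minute summaries per 3-minute window


@dataclass
class HFSample:
    posture_score: float  # hypothetical ergonomic score in [0, 1]
    activity: str         # hypothetical label derived from screenshot / egocentric view


def summarize_minute(samples):
    """High-frequency loop: collapse one minute of samples into a summary."""
    if len(samples) != SAMPLES_PER_MINUTE:
        raise ValueError("expected one minute of samples")
    activities = Counter(s.activity for s in samples)
    return {
        "posture_mean": mean([s.posture_score for s in samples]),
        "dominant_activity": activities.most_common(1)[0][0],
    }


def summarize_window(minute_summaries, stress_per_minute):
    """Low-frequency loop: combine three minute summaries with physiological data."""
    if len(minute_summaries) != HF_SUMMARIES_PER_LF_WINDOW:
        raise ValueError("expected three minute summaries")
    return {
        "posture_mean": mean([m["posture_mean"] for m in minute_summaries]),
        "activities": [m["dominant_activity"] for m in minute_summaries],
        # Median is an assumed smoothing choice: one spiky minute does not dominate.
        "stress_median": median(stress_per_minute),
    }
```

A window summary of this kind is what would then be serialized and handed to the reasoning step.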

Structured JSON summaries, user preferences, few-shot exemplars, and workplace guidelines are passed as input to a lightweight LLM (gpt-4o-mini), which generates interventions or task suggestions. This reasoning engine differentiates sources of stress (e.g., task vs. environmental), modulates tone, and enforces policy constraints (e.g., do-not-disturb, debounce logic). The dual-loop strategy exploits the distinct temporal characteristics of digital and physiological signals to produce robust, context-appropriate feedback.
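
The policy constraints mentioned above (do-not-disturb and debounce logic) could be enforced by a small gate placed in front of the notification channel. The sketch below is an assumption-laden illustration: the class name, the per-kind debounce, and the 300-second default gap are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class InterventionGate:
    """Hypothetical policy gate: blocks interventions during do-not-disturb
    and debounces repeated interventions of the same kind."""
    min_gap_s: float = 300.0          # assumed minimum gap between same-kind alerts
    do_not_disturb: bool = False
    _last_sent: dict = field(default_factory=dict)

    def allow(self, kind, now_s):
        """Return True if an intervention of this kind may be shown now."""
        if self.do_not_disturb:
            return False
        last = self._last_sent.get(kind)
        if last is not None and now_s - last < self.min_gap_s:
            return False  # too soon after the previous one of this kind
        self._last_sent[kind] = now_s
        return True


gate = InterventionGate()
first = gate.allow("poor_posture", now_s=0.0)      # allowed
repeat = gate.allow("poor_posture", now_s=120.0)   # suppressed by debounce
```

Tracking the last delivery time per intervention kind means a recent posture alert does not suppress an unrelated stress alert.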

Proactive Assistance: User-Focused and Task-Focused Interventions

AwareLLM’s proactive interventions are stratified based on urgency and channel:

  • User-Focused (System Notifications): Issued for critical well-being states (e.g., poor posture, sustained high stress, persistent distraction), delivered as native OS notifications ensuring immediate visibility.
  • Task-Focused (In-Chat Suggestions): Provided for situational, non-urgent guidance (e.g., code debugging, literature review structure), surfaced inside the chat interface for seamless integration.

Tone adaptation leverages psychophysiological state inference, dynamically calibrating messages to match the user’s affective state, enhancing engagement, receptivity, and trust. This separation minimizes disruption while maximizing the utility of interventions.
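
The channel split and tone adaptation can be sketched as a simple routing step. The trigger labels, the tone labels, and the stress threshold below are hypothetical names chosen for illustration; the paper describes the behaviour, not this interface.

```python
from enum import Enum


class Channel(Enum):
    SYSTEM_NOTIFICATION = "system_notification"  # user-focused, urgent
    IN_CHAT = "in_chat"                          # task-focused, non-urgent


# Well-being states that the description routes to OS notifications.
USER_FOCUSED_STATES = {"poor_posture", "sustained_high_stress", "persistent_distraction"}


def route(trigger):
    """Pick the delivery channel for a trigger label."""
    if trigger in USER_FOCUSED_STATES:
        return Channel.SYSTEM_NOTIFICATION
    return Channel.IN_CHAT


def pick_tone(stress_z):
    """Assumed tone rule: soften and shorten messages when stress is elevated
    relative to the user's baseline (threshold of 1.0 is illustrative)."""
    return "calm_and_brief" if stress_z >= 1.0 else "neutral"


channel = route("poor_posture")
tone = pick_tone(stress_z=1.4)
```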

Experimental Evaluation and Numerical Results

A controlled user study (N=20) spanning literature review, web development, and data science tasks was conducted using a within-subjects design (counterbalanced control and treatment phases). Assessment metrics included NASA-TLX workload ratings, post-system questionnaires, intervention feedback, and expert output evaluation.

Key quantitative findings:

  • NASA-TLX (aggregated):
    • Mental Demand: ↓22.1% (p=0.003)
    • Temporal Demand: ↓25.7% (p=0.003)
    • Performance: ↑15.3% (p=0.029)
    • Effort: ↓18.5% (p=0.006)
    • Frustration: ↓17.5% (p=0.041)
  • Post-system questionnaire:
    • Focus maintenance: ↑(p=0.009)
    • Work quality: ↑(p=0.005)
    • Workflow personalization: ↑58% relative (p=0.004)
    • Overall productivity: ↑33% (p=0.007)
  • Expert evaluation:
    • Literature Review (Structure/Quality): Control=73.41, AwareLLM=86.79 (p=0.0002)
    • Web Development (Sub-task Success): Control=54.51, AwareLLM=82.83 (p=0.0002)
    • Data Science (Sub-task Success): Control=74.51, AwareLLM=86.83 (p=0.0003)

The improvements in cognitive workload, task manageability, output quality, focus, and subjective satisfaction are statistically significant and consistent across all three task domains.
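
The expert-evaluation scores are reported as absolute means, so it can help to restate them as point differences and relative changes. The short sketch below does only that arithmetic on the values listed above; the variable and function names are illustrative.

```python
# (control mean, AwareLLM mean) as reported in the expert evaluation.
EXPERT_SCORES = {
    "Literature Review": (73.41, 86.79),
    "Web Development": (54.51, 82.83),
    "Data Science": (74.51, 86.83),
}


def relative_change_pct(control, treatment):
    """Relative change of the treatment mean over the control mean, in percent."""
    return 100.0 * (treatment - control) / control


for task, (control, treatment) in EXPERT_SCORES.items():
    gain = treatment - control
    rel = relative_change_pct(control, treatment)
    print(f"{task}: +{gain:.2f} points ({rel:+.1f}% relative)")
```

Web Development shows the largest gain on both measures, which follows from it having the lowest control mean of the three tasks.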

Implications and Design Considerations

The integration of biosignals and context into LLM-driven reasoning constitutes a foundational step toward embodied, adaptive AI ecosystems. The dual-mode intervention logic addresses the proactivity dilemma: balancing support with user agency while avoiding intrusive disruption. The architecture is modular, enabling future ablation studies on the contribution of each sensing modality. Privacy is foregrounded through real-time processing and immediate deletion, but deployment in real-world contexts requires additional safeguards (e.g., device-local computation, modular sensor selection).

Practically, AwareLLM represents a template for universal augmentation layers in digital workplaces, leveraging physio-adaptive interfaces to mitigate cognitive fatigue, minimize overload, and support individual work styles. Theoretically, this model advances the paradigm of collaborative AI, transitioning from static, prompt-driven interaction to dynamic, anticipatory partnerships mediated by rich, multimodal user state estimation.

Limitations and Prospective Research

Notable constraints include the laboratory setting and cross-sectional evaluation. Longitudinal studies are required to assess persistent adaptation, user trust, and skill acquisition. Sensor intrusiveness needs to be reduced and ecological validity improved, potentially by using consumer-grade wearables and integrated hardware. Detailed ablation analyses and customization for privacy-conscious environments are priorities. The ethical and privacy implications of continuous multimodal monitoring remain open challenges that demand rigorous governance and transparency.

Conclusion

AwareLLM demonstrates significant advantages of proactive, multimodal, context-sensitive assistance in productivity-centric human-AI collaboration. Through layered sensing, real-time baseline-driven adaptation, and LLM-mediated contextual reasoning, the system robustly enhances task performance, reduces cognitive and temporal load, fosters engagement, and improves output quality. The findings validate the importance of embodied physiological context in adaptive interface design and establish a framework for future developments in collaborative, human-centered AI (2605.09625).
