PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory

Published 9 Apr 2026 in cs.AI, cs.CL, cs.CV, cs.HC, and cs.MA | (2604.08000v1)

Abstract: Proactivity is a core expectation for AGI. Prior work remains largely confined to laboratory settings, leaving a clear gap for real-world proactive agents in depth, complexity, ambiguity, precision, and real-time constraints. We study this setting, where useful intervention requires inferring latent needs from ongoing context and grounding actions in evolving user memory under latency and long-horizon constraints. We first propose DD-MM-PAS (Demand Detection, Memory Modeling, Proactive Agent System) as a general paradigm for streaming proactive AI agents. We instantiate this paradigm in Pask, with a streaming IntentFlow model for DD, a hybrid memory (workspace, user, global) for long-term MM, and a PAS infrastructure framework, and describe how these components form a closed loop. We also introduce LatentNeeds-Bench, a real-world benchmark built from user-consented data and refined through thousands of rounds of human editing. Experiments show that IntentFlow matches leading Gemini-3-Flash models under latency constraints, while identifying deeper user intent.


Authors (13)

Summary

The paper presents a novel DD-MM-PAS framework that integrates demand detection, hierarchical long-term memory, and proactive execution to continuously infer latent user needs.
The methodology uses a dual-model streaming architecture (IntentFlow) trained with SFT and RL alignment, reaching a balanced average accuracy of 84.2% on LatentNeeds-Bench across work, learning, and daily domains.
Empirical results and user studies demonstrate robust multi-turn performance, low latency, and strong generalization in real-world scenarios requiring proactive intervention.

Proactive Intent-Aware Agents with Long-Term Memory: A Technical Analysis of "PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory"

Motivation and Problem Formulation

PASK addresses a core unsolved problem in agentic AI: real-world agents that can proactively reason about latent user needs, respond in real time, and adapt over long horizons using persistent memory. Existing LLM-driven assistants remain predominantly passive—reliant on explicit user interaction and limited prompt-based state without person-level history. This paradigm fails in scenarios requiring timely, context-sensitive intervention, or deep user modeling. PASK operationalizes proactivity as continuous demand inference, precise memory-based adaptation, and robust always-on deployment. The interaction objective is formalized as maximizing the expected utility of interventions, balancing helpfulness against intrusiveness under evolving uncertainty and latency constraints.
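
The expected-utility objective can be illustrated with a small sketch. This is not the paper's formulation: the `Action` fields, the linear latency penalty, and all numeric values are assumptions chosen only to show how helpfulness, intrusiveness, and latency might trade off.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Action:
    name: str
    benefit: float         # utility gained if the user really has the need (assumed)
    intrusion_cost: float  # utility lost if the user has no such need (assumed)
    latency_s: float       # expected response delay in seconds (assumed)


def expected_utility(action: Action, p_need: float,
                     latency_penalty: float = 0.1) -> float:
    """E[U] = p * benefit - (1 - p) * intrusion_cost - penalty * latency."""
    return (p_need * action.benefit
            - (1.0 - p_need) * action.intrusion_cost
            - latency_penalty * action.latency_s)


def choose_action(actions: Sequence[Action], p_need: float,
                  latency_penalty: float = 0.1) -> Action:
    """Pick the action with the highest expected utility."""
    return max(actions,
               key=lambda a: expected_utility(a, p_need, latency_penalty))


# Hypothetical action set: staying silent costs nothing and gains nothing.
ACTIONS = [
    Action("stay_silent", benefit=0.0, intrusion_cost=0.0, latency_s=0.0),
    Action("intervene", benefit=1.0, intrusion_cost=0.5, latency_s=0.5),
]
```

With these illustrative numbers, a low estimated need probability keeps the agent silent, and a high one triggers an intervention.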

The DD-MM-PAS Paradigm: Unifying Proactivity, Memory, and Execution

The core architectural contribution is the DD-MM-PAS framework, which integrates three essential modules:

Demand Detection (DD): Acts as a streaming engine continuously predicting latent user needs from multimodal context, enabling preemptive system-initiated assistance.
Memory Modeling (MM): Introduces a three-tier hierarchical memory system (user/workspace/global) that accumulates and retrieves a persistent, person-level user model at variable temporal scales.
Proactive Agent System (PAS): Orchestrates end-to-end closed-loop operation—fusing real-time perception, concurrent task execution, and memory management.

This design is instantiated in PASK, which shifts the agent paradigm from isolated, reactive dialogue processors to "active initiators" capable of multi-domain proactive personalization—spanning professional, academic, and daily scenarios.
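
The closed loop between the three modules can be pictured as follows. This is a toy sketch under stated assumptions: `ProactiveLoop`, `keyword_detector`, and `reminder` are hypothetical names, a keyword match stands in for the DD model, and a plain dictionary stands in for the memory system.

```python
from typing import Callable, Dict, List, Optional


class ProactiveLoop:
    """Toy DD -> MM -> PAS loop; all three components are placeholders."""

    def __init__(self,
                 detect: Callable[[str, List[str]], Optional[str]],
                 memory: Dict[str, str],
                 act: Callable[[str, str], str]) -> None:
        self.detect = detect
        self.memory = memory
        self.act = act
        self.history: List[str] = []

    def step(self, fragment: str) -> Optional[str]:
        need = self.detect(fragment, self.history)  # DD: infer a latent need
        self.history.append(fragment)
        if need is None:
            return None                             # no need detected: stay silent
        background = self.memory.get(need, "")      # MM: read prior context
        response = self.act(need, background)       # PAS: execute the action
        self.memory[need] = fragment                # MM: write back, closing the loop
        return response


def keyword_detector(fragment: str, history: List[str]) -> Optional[str]:
    """Stand-in detector: flags a need when the word 'deadline' appears."""
    return "deadline" if "deadline" in fragment.lower() else None


def reminder(need: str, background: str) -> str:
    suffix = f" (previously noted: {background})" if background else ""
    return f"Reminder about {need}{suffix}"


loop = ProactiveLoop(keyword_detector, {}, reminder)
first = loop.step("Nice weather today")
second = loop.step("The deadline is Friday")
third = loop.step("Did the deadline move?")
```

The third call shows the loop property: what the agent stored on the second turn shapes its response on a later turn.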

Figure 1: The DD-MM-PAS paradigm, showing its integrated demand detection, long-term memory, and always-on proactive agent execution loop.

IntentFlow: Streaming Demand Detection at Scale

IntentFlow, the specialized DD backbone, uses a dual-model streaming architecture—composed of a high-capacity Demand Detector (Qwen3-30B-A3B-Instruct for primary intent inference) and an auxiliary MemLoader (Qwen3-4B-Instruct for memory curation). At each interaction step, IntentFlow predicts one of three control signals: <silent>, <fast_intervention>, or <full_assistance>, thereby flexibly balancing non-intrusive silence, low-latency clarification, and context/memory-dependent deep intervention.

Figure 2: Architecture of IntentFlow, which processes streaming input fragments and decides per turn on whether and how to intervene, leveraging both workspace and external memory.

Control tokens determine the intervention path: <silent> suppresses unnecessary output; <fast_intervention> provides immediate local context assistance; <full_assistance> invokes the memory pipeline for deeper, multi-turn or contextually opaque inferences.
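
A minimal dispatch sketch for the three control tokens follows. It assumes the model emits the token as a prefix of its output and that `load_memory` is some callable returning retrieved context. Both are illustrative assumptions, not details confirmed by the paper.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Control(Enum):
    SILENT = "<silent>"
    FAST = "<fast_intervention>"
    FULL = "<full_assistance>"


def parse_control(model_output: str) -> Tuple[Control, str]:
    """Split a raw output into (control token, remaining payload)."""
    text = model_output.lstrip()
    for control in Control:
        if text.startswith(control.value):
            return control, text[len(control.value):].strip()
    raise ValueError(f"no control token found in: {model_output!r}")


def route(model_output: str,
          load_memory: Callable[[str], str]) -> Optional[str]:
    """Dispatch on the control token (hypothetical routing logic)."""
    control, payload = parse_control(model_output)
    if control is Control.SILENT:
        return None                    # suppress output entirely
    if control is Control.FAST:
        return payload                 # answer from local context only
    memory = load_memory(payload)      # full path: consult the memory pipeline
    return f"{payload} [memory: {memory}]"
```

Keeping the silent path free of any memory call reflects the idea that most turns should cost almost nothing.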

Figure 3: Intervention control by IntentFlow—examples of different system modes triggered by DD outcomes.

Data and Training Pipeline

IntentFlow is trained on LatentNeeds, a large-scale intent-annotated dataset (100k synthetic and curated, 2.1k real-world sessions). The training pipeline features SFT on LatentNeeds-100k and RL alignment on human-edited sessions, optimizing both immediate prediction and nuanced alignment with actual user demands. An LLM-as-a-judge protocol quantifies correctness along axes of reference congruence, contextual necessity, and plausibility.
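
One way to turn the three judge axes into a scalar training signal is a weighted mean. The sketch below is an assumption for illustration only: the `JudgeScores` container, the [0, 1] score range, and the weights are hypothetical, and the paper's actual reward design may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class JudgeScores:
    reference_congruence: float  # agreement with the human-edited reference
    contextual_necessity: float  # whether intervening was warranted at all
    plausibility: float          # whether the predicted need is believable


def reward(scores: JudgeScores,
           weights: Sequence[float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted mean of the three axes; the weights are illustrative."""
    values = (scores.reference_congruence,
              scores.contextual_necessity,
              scores.plausibility)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("judge scores are assumed to lie in [0, 1]")
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```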

Figure 4: LatentNeeds dataset pipeline and training workflow—unifying synthetic and real-world session data for robust intent annotation and model alignment.

Hierarchical, Self-Evolving Memory: Pask-MM

Pask-MM models memory as a bounded, hierarchical structure with distinct user (cache), workspace (main memory), and global (external tree-structured storage) components:

User Memory ($\mathcal{M}_{\text{user}}$): Compact, high-priority background profile for prompt-level personalization.
Workspace Memory ($\mathcal{M}_{\text{wsp}}$): Low-latency, session-local state maintained throughout an active interaction.
Global Memory ($\mathcal{M}_{\text{global}}$): Persistent semantic tree supporting scalable retrieval and knowledge consolidation via localized RAG queries.

Updates are performed asynchronously after each session to keep inference-time cost low. New data are merged lazily, conflicts are resolved and profiles are decayed using Bayesian updates, and tree growth is bounded by depth and child-count thresholds, which avoids the compute explosion typical of naive accumulation.
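
The depth and child-count bounds can be sketched with a small tree. This is a simplified illustration, not Pask-MM itself: `MemoryNode`, `insert`, and the fallback of storing a fact at the last permitted node are assumptions, and the Bayesian conflict resolution and decay are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class MemoryNode:
    topic: str
    facts: List[str] = field(default_factory=list)
    children: Dict[str, "MemoryNode"] = field(default_factory=dict)


def insert(root: MemoryNode, path: Sequence[str], fact: str,
           max_depth: int = 3, max_children: int = 4) -> MemoryNode:
    """Store a fact along `path`, never exceeding the depth or child limits."""
    node = root
    for topic in path[:max_depth]:          # depth bound: ignore deeper topics
        child = node.children.get(topic)
        if child is None:
            if len(node.children) >= max_children:
                break                        # child bound: keep the fact here
            child = MemoryNode(topic)
            node.children[topic] = child
        node = child
    node.facts.append(fact)
    return node


def depth(node: MemoryNode) -> int:
    """Number of edges on the longest root-to-leaf path."""
    if not node.children:
        return 0
    return 1 + max(depth(child) for child in node.children.values())
```

Because both limits are enforced at insertion time, the tree size stays bounded by a constant that depends only on `max_depth` and `max_children`, however many sessions are merged.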

Figure 5: Internal architecture of Pask-MM, emphasizing memory partition, access/update protocol, and asynchronous scalable maintenance.

System Layer: Robust, Real-Time Proactive Infrastructure

Pask-PAS implements an end-to-end, always-on system. The frontend layer captures continuous multimodal signals (vision, audio, text) from user devices. The backend ensures persistent hot-state and object-store management, scalable scheduling, and agent isolation. The AI backend pools state-of-the-art foundation models across modalities for perception, understanding, and tool-augmented proactive action.

Figure 6: PAS system architecture: integrating frontend sensing, data/AI backends, and feedback-driven orchestration for real-world deployment.

Empirical Results

LatentNeeds-Bench and Main Findings

Evaluation on LatentNeeds-Bench (3,936 annotated multi-turn segments across work, learning, and daily domains) highlights the systemic limitations of most LM baselines. Even with large, prompt-optimized models, demand prediction accuracy on latent intent remains bottlenecked (e.g., Gemini-2.5-Flash-Lite: 18.8%). Only advanced closed models (Gemini-3-Flash: 83.3%; GPT-5-Mini: 66.5%; GPT-5-Nano: 71.2%) approach useful performance. IntentFlow, using targeted SFT+RL, achieves the best balanced average accuracy (84.2%), edging out Gemini-3-Flash by 0.9 points on the figures quoted here, with highly competitive Demand (83.1%) and No-Demand (85.2%) splits.
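
The relation between the two splits and the balanced average can be checked with a short calculation. The sketch assumes that "balanced average accuracy" means the unweighted mean of the Demand and No-Demand accuracies, which is the standard definition of balanced accuracy for two classes. The function name and toy labels are illustrative.

```python
from typing import Sequence


def balanced_accuracy(labels: Sequence[int],
                      predictions: Sequence[int]) -> float:
    """Mean of per-class recall for binary labels (1 = demand, 0 = no demand)."""
    recalls = []
    for cls in (1, 0):
        indices = [i for i, y in enumerate(labels) if y == cls]
        if not indices:
            raise ValueError(f"no examples for class {cls}")
        hits = sum(1 for i in indices if predictions[i] == cls)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Toy data: 2 of 3 demand cases and 1 of 1 no-demand cases are correct.
toy_score = balanced_accuracy([1, 1, 1, 0], [1, 1, 0, 0])

# Reported splits from the text above, in percent.
mean_of_splits = (83.1 + 85.2) / 2
```

Here `mean_of_splits` evaluates to 84.15, consistent with the reported 84.2% balanced average.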

Cross-Domain and Demand-Type Analysis

Closed models still dominate knowledge-centric and high-value domains (work, learning), but IntentFlow demonstrates superior generalization in open-ended daily tasks and maintains strong parity in both requirement and insight demand types, indicating robust calibration across proactive behavior spectra.

Multi-Turn and Latency Robustness

In extended sessions (up to 60 turns, ~30 minutes), IntentFlow exhibits minimal performance drift (<5% degradation), in contrast to baseline models that show severe long-context regression (e.g., GPT-5-Mini: -19.0%; Gemini-3-Flash: -17.3%). IntentFlow's latency is consistently below 1.5s per turn, a critical advantage for real-time agentic deployment.
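
A per-turn latency budget of this kind can be enforced with a timeout around the detector call. The sketch below is an assumption about how such a guard might look, not a description of the PASK infrastructure: `detect_with_budget` and the two stand-in detectors are hypothetical, and falling back to `<silent>` on timeout is a design choice made here for illustration.

```python
import asyncio
from typing import Awaitable, Callable

TURN_BUDGET_S = 1.5  # mirrors the sub-1.5s per-turn figure quoted above


async def detect_with_budget(detector: Callable[[str], Awaitable[str]],
                             fragment: str,
                             budget_s: float = TURN_BUDGET_S) -> str:
    """Run the detector, but stay silent if it exceeds the latency budget."""
    try:
        return await asyncio.wait_for(detector(fragment), timeout=budget_s)
    except asyncio.TimeoutError:
        return "<silent>"


async def fast_detector(fragment: str) -> str:
    await asyncio.sleep(0.01)   # simulated quick inference
    return "<fast_intervention>"


async def slow_detector(fragment: str) -> str:
    await asyncio.sleep(0.2)    # simulated slow inference
    return "<full_assistance>"
```

For example, `asyncio.run(detect_with_budget(slow_detector, "hello", budget_s=0.05))` falls back to `<silent>` because the simulated detector takes longer than the budget.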

Figure 7: Multi-turn balanced accuracy: IntentFlow maintains >80% across all interaction depths, outperforming other baselines as session context accumulates.

Qualitative Case and User Study Analyses

Long-term memory impact is validated via user study and case-driven evaluation. Memory enables:

Persistent user background modeling (roles, routines).
Episodic fact recall for consistent assistance.
Preference tracking for personalization.

Learning scenarios elicit the highest user satisfaction. Daily tasks prove inherently more open-ended and challenging, which points to needed advances in context synthesis and character-level adaptation.

Figure 8: Three functional long-term memory case studies and user study outcomes, highlighting the benefits and current limitations of persistent agent-side memory.

Theoretical and Practical Implications

PASK's results suggest that robust proactivity emerges from the tight coupling of intent-aware demand detection, structured long-term memory, and real-time system infrastructure, rather than from raw model scaling alone. The observed domain gap implies the need for domain-adaptive, memory-influenced agent tuning. RL-based alignment with multi-criteria reward functions substantially narrows the performance gap between closed and open-source models. The DD-MM-PAS architecture is presented as a minimal yet extensible template for AGI-oriented systems that aim to adapt to a user over time.

Key outstanding directions include:

Richer multimodal signal fusion for non-verbal/contextual inference.
More granular memory abstraction for finer-grained latent-need detection.
Advanced conflict resolution and drift accommodation for lifelong deployment.

Conclusion

PASK constitutes a comprehensive, scalable recipe for deploying intent-aware, memory-augmented, proactive agentic AI under real-world latency and uncertainty constraints. By combining a unified framework, tailored streaming models, and long-term adaptive memory strategies, it pushes agent design beyond surface-level proactivity toward deep, contextually grounded, and persistent human-aligned assistance. While not closing the full gap to human predictive nuance, the framework, dataset, and empirical results set a new bar for research on agentic proactivity and long-horizon human–AI co-evolution.

