PlanPers: Personalized Plan-RAG Framework
- The paper introduces PlanPers, which integrates planning modules and personalized signals into RAG systems to achieve up to 15% performance gains on key benchmarks.
- PlanPers is a modular architecture that formalizes personalization as a control problem, using structured planning and contrastive retrieval to tailor responses based on user-specific features.
- The system employs a multi-stage pipeline—personalized query rewriting, adaptive retrieval, and action plan generation—resulting in enhanced metrics like ROUGE, BLEU, and improved stylistic fidelity.
Plan-RAG Personalization (PlanPers) is a class of architectures that extend retrieval-augmented generation (RAG) systems with explicit personalization through planning modules, fine-grained user or author features, and contrastive or adaptive retrieval protocols. PlanPers frameworks are characterized by their integration of structured, user- or context-tailored planning and retrieval strategies into the RAG pipeline. The goal is to enhance the semantic alignment, stylistic fidelity, and goal-orientation of generated responses or actions, leveraging diverse personalization signals and advanced pipeline control.
1. Formal Definitions and Mathematical Foundations
PlanPers formalizes personalization as a control problem over the RAG pipeline in which user-specific signals and an explicit planning module guide both retrieval and generation. Let $q$ denote the user query, $p$ the user (or author) profile, $\mathcal{D}$ an external corpus, and $M_u$ a dynamic user memory. Model parameters are grouped as $\theta_P$ (planner), $\theta_R$ (retriever), and $\theta_G$ (generator). The four core stages are:
- Pre-retrieval: Personalized query rewriting, $q' = f_{\mathrm{rw}}(q, p;\, \theta_P)$
- Retrieval: Conditional retrieval, $D = \mathrm{Retrieve}(q', \mathcal{D}, M_u;\, \theta_R)$
- Planning: Generation of an action plan, $\pi = (a_1, \dots, a_T) = \mathrm{Plan}(q', p, D;\, \theta_P)$
- Generation: Response synthesis, $\hat{y} = \mathrm{Gen}(q', \pi, D;\, \theta_G)$
The composite objective incorporates task success and personalization alignment, $\mathcal{L} = \mathcal{L}_{\mathrm{task}}(\hat{y}, y^{*}) + \lambda\, \mathcal{L}_{\mathrm{pers}}(\hat{y}, p)$, where $y^{*}$ is the ground-truth response and $\lambda$ balances the two terms. In reinforcement settings, rewards are used to update planning/generation policies by policy gradient (Li et al., 14 Apr 2025, Yazan et al., 24 Mar 2025).
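The four stages can be sketched as a minimal control flow; the class, function names, and signatures below are illustrative stand-ins, not an API defined in the cited papers:

```python
from dataclasses import dataclass

@dataclass
class PlanPersPipeline:
    """Illustrative four-stage PlanPers control flow (names are assumptions)."""
    rewriter: callable   # (query, profile) -> rewritten query q'
    retriever: callable  # (q', corpus, memory) -> evidence documents D
    planner: callable    # (q', profile, D) -> action plan [a_1, ..., a_T]
    generator: callable  # (q', plan, D) -> response

    def run(self, query, profile, corpus, memory):
        q_rw = self.rewriter(query, profile)         # pre-retrieval rewriting
        docs = self.retriever(q_rw, corpus, memory)  # conditional retrieval
        plan = self.planner(q_rw, profile, docs)     # action plan generation
        return self.generator(q_rw, plan, docs)      # response synthesis
```

Each callable can be backed by an LLM, a dense retriever, or a rule-based module; the point is that the planner's output explicitly conditions generation.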
2. Pipeline Architectures and Algorithmic Modules
PlanPers extends canonical RAG by sandwiching a planner between retrieval and generation. This planner outputs a structured plan (e.g., sub-queries, tool/API calls, memory operations) reflecting personal context.
Generic PlanPers Pipeline:
- Feature extraction and context summarization (user/author features, historical traits)
- Personalized or adaptive retrieval, including contrastive selection (e.g., hard negatives from out-of-profile sources)
- Action plan generation via neural planner or sequence generator
- Per-action retrieval/generation as dictated by the plan
- Final integration and output synthesis
A typical PlanPers pseudocode block in author/persona modeling (Yazan et al., 24 Mar 2025, Li et al., 14 Apr 2025) is:
```
Input:  query q, profile p, profile history P_p, global profile set P_all
Output: personalized generation ŷ

1. Extract user-specific features (sentiment, word frequency, syntax patterns)
2. Retrieve top-K documents from P_p based on q
3. Sample hard-negative documents from the globally least-similar profiles in P_all
4. Form an enriched prompt from the retrieved contexts and user feature vectors
5. Generate response ŷ with the LLM using the personalized context
```
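A minimal executable sketch of the retrieval steps above, using bag-of-words cosine similarity as a stand-in for the embedding model; all names are illustrative and the feature-extraction step is omitted:

```python
import math
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words vector for a text."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_personalized_prompt(q, profile_docs, global_docs, k=2, n_neg=1):
    """Steps 2-4: top-k in-profile retrieval, hard negatives, enriched prompt."""
    qv = bow(q)
    # Step 2: most similar documents from the user's own history
    pos = sorted(profile_docs, key=lambda d: cosine(qv, bow(d)), reverse=True)[:k]
    # Step 3: hard negatives = globally least-similar documents to the query
    neg = sorted(global_docs, key=lambda d: cosine(qv, bow(d)))[:n_neg]
    # Step 4: enriched prompt context (feature vectors omitted for brevity)
    return {"query": q, "positives": pos, "negatives": neg}
```

In practice the bag-of-words similarity would be replaced by a trained dense retriever, and the returned context would be serialized into the LLM prompt.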
In dialogue or multi-source settings, PlanPers can be realized as a unified sequence-to-sequence system with special tokens (“acting tokens” for source/tool selection, “evaluation tokens” for relevance scoring) (Wang et al., 2024). Each phase—planning, retrieval, generation—is cast as token prediction within a single autoregressive paradigm.
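The unified token-based control loop can be illustrated with a toy parser that interprets acting and evaluation tokens in a decoded sequence; the token spellings here are illustrative assumptions, not the exact vocabulary of UniMS-RAG:

```python
def parse_control_tokens(sequence):
    """Split a decoded token sequence into (source, score, text) segments.

    Acting tokens like [SOURCE=persona] select a knowledge source;
    evaluation tokens like [SCORE=0.9] attach a relevance score to the
    evidence that follows. Plain tokens accumulate into generated text.
    """
    actions, source, score, text = [], None, None, []
    for tok in sequence:
        if tok.startswith("[SOURCE="):
            if text:  # flush the previous segment before switching sources
                actions.append((source, score, " ".join(text)))
                text = []
            source, score = tok[8:-1], None
        elif tok.startswith("[SCORE="):
            score = float(tok[7:-1])
        else:
            text.append(tok)
    if text:
        actions.append((source, score, " ".join(text)))
    return actions
```

Casting source selection and evidence scoring as ordinary token prediction is what lets a single autoregressive model carry out planning, retrieval control, and generation in one pass.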
3. Personalization Mechanisms and Signal Integration
Personalization in PlanPers operates at several abstraction levels:
- Explicit Feature Injection: Features such as sentiment polarity, word-frequency vectors, and dependency-pattern counts are projected via learned matrices into the LLM’s embedding space, concatenated to the context, and made available to the generator (Yazan et al., 24 Mar 2025).
- Contrastive Examples: Negative or contrastive samples drawn from dissimilar profiles/authors refine the system’s discrimination between target and generic traits; scoring functions penalize redundancy with the sampled contrastive negatives. Adaptive retriever training can include a contrastive (InfoNCE-style) loss that pulls embeddings of personal content toward the query and pushes negatives apart, $\mathcal{L}_{\mathrm{con}} = -\log \frac{\exp(\mathrm{sim}(q, d^{+})/\tau)}{\exp(\mathrm{sim}(q, d^{+})/\tau) + \sum_{d^{-}} \exp(\mathrm{sim}(q, d^{-})/\tau)}$, where $d^{+}$ is an in-profile document, $d^{-}$ ranges over hard negatives, and $\tau$ is a temperature (Yazan et al., 24 Mar 2025).
- Planner-Guided Personalization: An LLM-based (or similar) planner proposes a high-level sequence of actions, including staged retrievals keyed to personal memory “slots,” API invocations, or evidence fusion steps (Li et al., 14 Apr 2025).
- Personalized Query Expansion: Systems such as PBR layer user style and semantic structure signals directly into the embedding used for retrieval. Components include Pseudo-Relevance Feedback (P-PRF) for style, and P-Anchor for graph-based structural alignment, culminating in a fused personalized query representation (Zhang et al., 10 Oct 2025).
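A minimal sketch of such a contrastive objective over precomputed similarity scores, using the log-sum-exp trick for numerical stability; the exact loss used in the cited work may differ:

```python
import math

def info_nce_loss(sim_pos, sim_negs, tau=0.1):
    """Contrastive (InfoNCE-style) loss for retriever training.

    sim_pos:  similarity between the query and an in-profile (positive) document
    sim_negs: similarities between the query and hard-negative documents
    tau:      temperature controlling the sharpness of the softmax
    """
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    m = max(logits)  # subtract the max before exponentiating for stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(sim_pos / tau - log_z)
```

The loss shrinks when the positive document is scored well above the negatives, which is exactly the behavior that separates target-author content from generic content in embedding space.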
4. Evaluation Protocols and Experimental Evidence
Benchmarks cover domains including news, scholarly abstracts, tweet paraphrasing (LaMP-4/5/7), synthetic dialogue (PersonaBench), medical planning (MedPlan), and e-commerce. Key evaluation metrics include:
| Metric | Definition (where applicable) |
|---|---|
| ROUGE-1, ROUGE-L | Lexical n-gram overlap (information preservation, fluency) |
| Personalization Accuracy | Task-specific measure; e.g., frequency of personalized entities/style in output |
| BLEU, METEOR, BERTScore | Lexical and semantic overlap (medical, dialogue) |
| Planning Success Rate (PSR) | Fraction of episodes achieving the user goal: $\mathrm{PSR} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}[\text{goal}_i \text{ achieved}]$ |
| Preference Alignment Score | Average cosine similarity of user profile–output embeddings |
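Two of these metrics admit direct implementations. A small sketch, assuming the profile and output embeddings have already been extracted elsewhere:

```python
import math

def psr(goal_achieved):
    """Planning Success Rate: fraction of episodes whose user goal was met."""
    return sum(goal_achieved) / len(goal_achieved)

def preference_alignment(profile_vecs, output_vecs):
    """Average cosine similarity between user-profile and output embeddings."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    return sum(cos(p, o) for p, o in zip(profile_vecs, output_vecs)) / len(profile_vecs)
```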
Key outcomes presented in (Yazan et al., 24 Mar 2025):
- PlanPers (WF + CE) attains ROUGE-L of 0.210 on LaMP-4, a 7.1% increase over baseline RAG.
- On LaMP-7, PlanPers (DPF + CE) yields a 16.1% relative gain in ROUGE-L over RAG.
- Ablations confirm the importance of both author features (e.g., dependency patterns) and contrastive examples, which contribute cumulative improvements of up to 15% relative.
Qualitative findings include enhanced alignment with idiosyncratic entity naming, syntactic variation, and domain-specific term use.
5. Exemplars and Domain-Specific Variations
Medical Plan-RAG (MedPlan) (Hsu et al., 23 Mar 2025):
Implements a two-stage personalized pipeline:
- Assessment generation (S+O → A) via retrieval-augmented LLMs using both cross-patient and self-history.
- Plan generation (S+O+A → P) further personalized by incorporating both current and historical patient context.
- Retrieval employs bi-encoder semantic search followed by cross-encoder re-ranking; LLM attention mechanisms process structured longitudinal data.
- Evaluation on >350,000 notes demonstrates superior performance in BLEU, METEOR, ROUGE, and BERTScore, with a 66% higher clinical appropriateness rating over direct generation baselines.
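The two-stage SOAP-style flow (S+O → A, then S+O+A → P) can be sketched as follows; the retrieval, re-ranking, and LLM calls are stand-in callables, not MedPlan's actual interfaces:

```python
def medplan_two_stage(s, o, retrieve, rerank, llm):
    """Two-stage personalized plan generation in the style of MedPlan.

    s, o:     subjective and objective findings for the current visit
    retrieve: bi-encoder semantic search -> candidate notes
              (cross-patient plus the patient's own history)
    rerank:   cross-encoder re-ranking of those candidates
    llm:      generator conditioned on a task tag and its context arguments
    """
    # Stage 1: assessment from S + O plus retrieved, re-ranked context
    ctx_a = rerank(s + " " + o, retrieve(s + " " + o))
    assessment = llm("ASSESS", s, o, ctx_a)
    # Stage 2: plan from S + O + A, with context retrieved via the assessment
    ctx_p = rerank(assessment, retrieve(assessment))
    plan = llm("PLAN", s, o, assessment, ctx_p)
    return assessment, plan
```

Chaining the stages so the generated assessment drives the second retrieval is what lets the plan reflect both the current encounter and the patient's longitudinal record.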
Dialogue and Multi-Source Planning (UniMS-RAG) (Wang et al., 2024):
All planning, retrieval, and generation functions are integrated via sequence-to-sequence modeling with “acting tokens” for knowledge source selection and “evaluation tokens” for adaptive, per-evidence scoring. Self-refinement iterations optimize evidence coherence and persona consistency. Experiments show superior BLEU, ROUGE, and persona consistency over knowledge-source and persona-selection baselines.
Personalized Query Expansion (PBR) (Zhang et al., 10 Oct 2025):
User-aware query expansion fuses pseudo-feedback on user style and reasoning paths with graph-based semantic anchors derived from user corpora. This integration boosts recall and ranking metrics by up to 10% relative to strong query expansion baselines.
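One simple way to realize such a fused query representation is a weighted combination of the base query embedding with the style (P-PRF-like) and graph-anchor (P-Anchor-like) vectors; the weighted-sum fusion and the weights below are illustrative assumptions, not PBR's published formulation:

```python
def fuse_personalized_query(q_vec, style_vec, anchor_vec,
                            alpha=0.6, beta=0.25, gamma=0.15):
    """Fuse a base query embedding with user-style and graph-anchor signals
    into one retrieval vector (weights are illustrative)."""
    fused = [alpha * q + beta * s + gamma * a
             for q, s, a in zip(q_vec, style_vec, anchor_vec)]
    norm = sum(x * x for x in fused) ** 0.5  # L2-normalize for cosine retrieval
    return [x / norm for x in fused] if norm else fused
```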
6. Limitations, Challenges, and Future Directions
Several constraints shape current PlanPers architectures:
- Scalability and Efficiency: Multi-step planning and dynamic memory management remain computationally intensive, particularly for large dynamic user graphs or long-horizon tasks (Li et al., 14 Apr 2025).
- Cold-Start and Data Sparsity: Sparse or noisy user data impedes effective personalization and may erode retrieval performance (Zhang et al., 10 Oct 2025).
- Evaluation Gaps: Many benchmarks treat single-domain or one-shot scenarios, lacking direct measures for user satisfaction or longitudinal personalization quality.
- Hallucination and Drift: Erroneous plan steps may induce retrieval drift or incoherence in downstream generation.
Active research directions include:
- Lightweight adaptation: Meta-planning with adapters for resource-constrained or on-device scenarios.
- Meta-reasoning: Planners that dynamically arbitrate between memory recall and tool invocation using uncertainty or context.
- Advanced feature projection: Learning richer, continuous encodings for user attributes (e.g., NumeroLogic-inspired) (Yazan et al., 24 Mar 2025).
- Multi-modal extension: Extending plan-personalized retrieval and execution to visual and audio modalities.
- Privacy preservation: Federated learning for on-device personalization, minimizing centralized data exposure (Li et al., 14 Apr 2025).
A plausible implication is that further integration of reinforcement learning, meta-learning for contrastive selection, and robust multi-modal orchestration will generalize the PlanPers approach to a broader set of user-adaptive, knowledge-intensive applications, including complex dialogue systems, personalized QA, and agent-based planning.
7. Synthesis and Outlook
Plan-RAG Personalization (PlanPers) unifies fine-grained user modeling, dynamic planning, contrastive retrieval, and flexible action sequencing within the RAG pipeline. Across domains, PlanPers demonstrates significant quantitative and qualitative improvements—up to 15% relative gain—over baseline RAG in capturing and deploying idiosyncratic user traits without the need for per-user parameter updates or model retraining (Yazan et al., 24 Mar 2025, Li et al., 14 Apr 2025). PlanPers architectures combine explicit feature engineering, neural planning modules, and integrated contrastive strategies, setting a foundation for the next generation of user-adaptive retrieval-augmented generation systems.