Personalization of LLMs
- Personalization of LLMs is the process of adapting general models to align with individual user preferences, behaviors, and context.
- Key techniques include retrieval-augmented prompting, parameter-efficient fine-tuning, and reward-based learning to steer outputs.
- Recent advances emphasize scalability, privacy preservation, and plug-and-play transferability across applications like recommendation and multimodal tasks.
Personalization of LLMs refers to the suite of methods, architectures, and evaluation principles designed to adapt otherwise general-purpose LLMs to align outputs with the unique preferences, behaviors, context, or goals of individual users or user cohorts. Tailored LLMs have a broad range of applications in recommendation, conversational agents, education, content generation, and multimodal assistants. Approaches encompass retrieval-augmented prompting, natural-language preference inference, parameter-efficient fine-tuning, reward learning, and embedding-based modulation. Core desiderata for modern personalization include scalability, data- and compute-efficiency, privacy preservation, plug-and-play transferability, and interpretability of personalization mechanisms.
1. Formal Frameworks and Taxonomies
The modern formalization of LLM personalization is characterized by explicit conditioning of model outputs on user-specific data. For a pretrained LLM with parameters $\theta$, standard inference computes $p_\theta(y \mid x)$ for input $x$. A personalized LLM injects user-specific data $D_u$ (possibly including authored texts, static attributes, and behavioral data), yielding $p_{\theta_u}(y \mid x, D_u)$, with $\theta_u$ potentially being a full or partial adaptation of the base parameters (Zhang et al., 2024). Adaptation is accomplished by prompt-based augmentation, retrieval, adapter injection, parameter-efficient fine-tuning (PEFT), preference summarization, or user-conditioned reward optimization. Taxonomically, personalization spans:
- Direct Personalized Generation: Models output text or multimodal responses closely matching user ground truth (e.g., personalized subject lines, image generation).
- Downstream Application Personalization: Model output or latent representation is further employed for user-centered tasks (e.g., recommendations, retrieval, classification).
Granularity varies from per-user adaptation to clustered persona-level modeling and global preference interpolation (Zhang et al., 2024).
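The key mathematical structures include:
- Retrieval-augmented prompting: the user profile $P_u$ is queried with the input $x$, and the retrieved entries are folded into the prompt before generation:

$$P_u^{(k)} = R(x, P_u, k)$$

$$\tilde{x} = \phi(x, P_u^{(k)})$$

$$\hat{y} = \mathrm{LLM}_\theta(\tilde{x})$$

where $R$ is a retriever, $\phi$ a prompt constructor, and $P_u^{(k)}$ the retrieved user profile entries (Zhang et al., 2024, Salemi et al., 2023).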
- PEFT/Adapter-based approaches: personalize only a subset $\Delta\theta_u$ of parameters, typically via LoRA/prefix or similar mechanisms (Tan et al., 2024, Li et al., 25 Nov 2025).
- Reward Learning/Preference Learning: Learn user-specific reward or preference representations $r_u$ or compact summaries $s_u$ to steer generation in RL or DPO frameworks (Shenfeld et al., 8 Mar 2025, Li et al., 2024, Chen et al., 17 Oct 2025).
2. Core Techniques and Algorithmic Advances
2.1 Retrieval-Augmented and Prompt-Based Methods
Prompt-based personalization incorporates user history or profile entries into the input context (Salemi et al., 2023, Zhang et al., 2024). Retrieval-augmented generation (RAG) uses traditional (BM25, TF-IDF) or neural (e.g., Contriever) retrievers to select personal past entries most relevant to the new query, to be concatenated to the prompt (Salemi et al., 2023, Richardson et al., 2023, Wu et al., 2024).
- Generation quality is observed to benefit primarily from including the user's own past responses (outputs) in the prompt, not merely from retrieving past queries that are semantically similar to the new one (Wu et al., 2024). Placing retrieved outputs earlier in the prompt increases their influence on model generation.
- Summary-augmented approaches generate task-aware, offline user summaries with an LLM, then inject only a few retrieved items at inference time to maximize personalization under token budget constraints (Richardson et al., 2023).
Prompt or summary-based personalization is parameter-free at deployment, but it is limited by context window length, potential dilution of user signals, and an inability to capture higher-order or evolving personal behaviors.
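As a rough illustration of the retrieve-then-prompt pattern described in this subsection, the sketch below scores a user's profile entries with a small hand-written Okapi BM25 and places the top entries ahead of the new query. The profile format (`input`/`output` pairs), the helper names, the prompt wording, and the sample data are all assumptions made for illustration; they are not taken from the cited papers.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately naive assumption)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query, profile, k=2):
    """Return the k profile entries with the highest BM25 score."""
    docs = [e["input"] + " " + e["output"] for e in profile]
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(profile)), key=lambda i: scores[i], reverse=True)
    return [profile[i] for i in ranked[:k]]


def build_prompt(query, retrieved):
    """Put the user's past outputs first, then the new query."""
    lines = ["Past examples written by this user:"]
    for e in retrieved:
        lines.append(f'- input: {e["input"]} | output: {e["output"]}')
    lines.append(f"Now respond to: {query}")
    return "\n".join(lines)


# Hypothetical user profile for a subject-line generation task.
profile = [
    {"input": "email about quarterly budget review", "output": "Budget check-in: numbers inside"},
    {"input": "email about team offsite logistics", "output": "Offsite plan: pack your hiking boots"},
    {"input": "email about server maintenance window", "output": "Heads up: brief downtime Saturday"},
]
query = "write a subject line for an email about the budget forecast"
top = retrieve(query, profile, k=2)
prompt = build_prompt(query, top)
```

The value of `k` is the main knob here: a larger `k` adds more user signal but consumes more of the context window, which is the trade-off noted above.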
2.2 Parameter-Efficient Fine-Tuning (PEFT) and Adapter Methods
PEFT strategies adapt only small, user-specific parameter increments—such as LoRA modules, prefix-tuning vectors, or IA³ adapters—leaving the base LLM frozen (Tan et al., 2024, Li et al., 25 Nov 2025, Tan et al., 18 Oct 2025).
- OPPU (One PEFT per User): Each user $u$ is assigned a personalized parameter module $\Delta\theta_u$, optimized for cross-entropy over user history. Non-parametric augmentation (retrieval, profiles) can be combined with these modules for additional gains (Tan et al., 2024).
- MTA (Merge-then-Adapt): A meta-bank of anchor LoRAs is constructed by clustering users, pretraining anchor modules, then dynamically fusing top-k anchors weighted by embedding similarity to form each user's temporary parameterization. A further ultra-low-rank LoRA is stacked for rapid, few-shot adaptation, supporting sublinear storage scaling and robust few-shot adaptation (Li et al., 25 Nov 2025).
- Profile-to-PEFT Hypernetworks: A single hypernetwork is trained to generate personalized PEFT adapter parameters directly from user profile encodings, enabling instant adaptation to unseen users, strong generalization, and privacy-preserving local deployment (Tan et al., 18 Oct 2025).
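The anchor-fusion step described for Merge-then-Adapt can be sketched as follows: pick the top-k anchors by cosine similarity to the user embedding and average their parameters with softmax weights. This is a minimal sketch under stated assumptions. Treating each anchor LoRA as a flat parameter vector, the softmax weighting, and the `temperature` value are illustrative choices; the cited paper may fuse the modules differently, and the additional ultra-low-rank LoRA is not modeled here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def merge_anchors(user_emb, anchor_embs, anchor_params, top_k=2, temperature=0.1):
    """Fuse the top_k most similar anchors into one parameter vector.

    anchor_params[i] is a flat list of floats standing in for anchor i's
    LoRA weights (an assumption made to keep the sketch short).
    """
    sims = [cosine(user_emb, e) for e in anchor_embs]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:top_k]
    # Softmax over the selected similarities (shifted by the max for stability).
    max_sim = max(sims[i] for i in top)
    exps = {i: math.exp((sims[i] - max_sim) / temperature) for i in top}
    total = sum(exps.values())
    weights = {i: exps[i] / total for i in top}
    dim = len(anchor_params[0])
    merged = [sum(weights[i] * anchor_params[i][j] for i in top) for j in range(dim)]
    return merged, weights


# Hypothetical bank of three anchors with 2-d cluster embeddings.
anchor_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
anchor_params = [[1.0, 1.0], [3.0, 3.0], [9.0, 9.0]]
user_emb = [1.0, 0.5]

merged, weights = merge_anchors(user_emb, anchor_embs, anchor_params, top_k=2)
```

Because only the shared anchors are stored in full, adding a user requires computing weights rather than training and storing a complete adapter.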
2.3 Reward Modeling, RLHF, and Factorized/User-Conditioned Rewards
Vanilla RLHF assumes an undifferentiated reward function over all users, often leading to majority-averaged behavior (Li et al., 2024). Personalized reward learning addresses this:
- Personalized RLHF (P-RLHF): Jointly learns user embeddings $e_u$ and reward/policy models, injecting $e_u$ as soft prompts or direct conditioning (Li et al., 2024).
- Reward Factorization (PReF): Models each user's reward as a linear combination $r_u(x, y) = \lambda_u^\top \phi(x, y)$ of a low-dimensional set of base reward functions $\phi$. Only a small number (roughly 10) of user-provided preference queries suffices to fit $\lambda_u$, supporting data-efficient and scalable deployment (Shenfeld et al., 8 Mar 2025).
- Natural Language Preference Summaries (POPI, AlignXplore+): Models distill user preference signals into interpretable, text-based summaries $s_u$ via joint RL/supervised training. Summaries serve as compact, universal personalization instructions, plug-compatible with arbitrary downstream LLMs without parameter updates (Liu et al., 8 Jan 2026, Chen et al., 17 Oct 2025).
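To make the factorized-reward idea concrete, the sketch below fits a user weight vector from a handful of pairwise preferences, assuming the base reward features of each response are already available. It uses a Bradley-Terry likelihood with plain gradient ascent and L2 regularization, which is a standard choice and an assumption here; the fitting procedure in the cited work may differ. The feature meanings, sample preferences, and hyperparameters are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_user_weights(pairs, dim, lr=0.5, steps=500, l2=0.01):
    """Fit user weights from (chosen_features, rejected_features) pairs.

    Maximizes the Bradley-Terry log-likelihood
        sum log sigmoid(w . (chosen - rejected))
    minus an L2 penalty, using batch gradient ascent.
    """
    weights = [0.0] * dim
    for _ in range(steps):
        grad = [-l2 * w for w in weights]
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            p = sigmoid(sum(w * d for w, d in zip(weights, diff)))
            for j in range(dim):
                grad[j] += (1.0 - p) * diff[j]
        weights = [w + lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights


def user_reward(weights, features):
    """Personalized reward: weighted sum of base reward features."""
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical base rewards: [conciseness, formality].
# This user consistently picks the more concise response.
pairs = [
    ([0.9, 0.2], [0.1, 0.3]),
    ([0.8, 0.7], [0.3, 0.8]),
    ([0.7, 0.1], [0.2, 0.9]),
    ([0.6, 0.5], [0.1, 0.4]),
    ([0.9, 0.4], [0.4, 0.6]),
]
user_weights = fit_user_weights(pairs, dim=2)
```

With only two weights to estimate, five comparisons already pin down the direction of the user's preference, which is the data-efficiency argument made for low-dimensional factorization.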
2.4 Embedding- and Representation-Based Methods
- Embedding-to-Prefix (E2P): User embeddings, learned from behaviors or external systems, are projected via a lightweight MLP to soft prefix tokens and injected into the input of a frozen LLM, modulating outputs with minimal computational overhead (Huber et al., 16 May 2025).
- Representation Editing (CHAMELEON): Synthetic preference data are generated using a frozen instruction-tuned LLM, and user-specific latent directions in network activations are identified. At inference, representations are edited by projecting out non-personalized components while amplifying personalized ones, requiring no additional parameter storage or per-user gradient updates (Zhang et al., 2 Mar 2025).
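The Embedding-to-Prefix data flow can be sketched in a few lines: a small MLP maps a user embedding to a block of values that is reshaped into soft prefix vectors and prepended to the token embeddings. The sketch below uses random, untrained weights and plain Python lists purely to show the shapes involved. The layer sizes, the `tanh` activation, and all function names are assumptions; in the method described above the projection would be trained while the LLM stays frozen.

```python
import math
import random


def make_mlp(d_in, d_hidden, d_out, seed=0):
    """Create a two-layer MLP with random weights (untrained placeholder)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        scale = 1.0 / math.sqrt(cols)
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

    return {"w1": matrix(d_hidden, d_in), "w2": matrix(d_out, d_hidden)}


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def embedding_to_prefix(user_emb, mlp, n_prefix, d_model):
    """Project a user embedding to n_prefix soft tokens of size d_model."""
    hidden = [math.tanh(h) for h in matvec(mlp["w1"], user_emb)]
    flat = matvec(mlp["w2"], hidden)
    return [flat[i * d_model:(i + 1) * d_model] for i in range(n_prefix)]


def prepend_prefix(prefix, token_embs):
    """Soft prefix tokens go in front of the ordinary token embeddings."""
    return prefix + token_embs


d_user, d_model, n_prefix = 4, 8, 2
mlp = make_mlp(d_user, 16, n_prefix * d_model)

prefix_a = embedding_to_prefix([0.5, -0.2, 0.1, 0.9], mlp, n_prefix, d_model)
prefix_b = embedding_to_prefix([-0.7, 0.3, 0.8, 0.0], mlp, n_prefix, d_model)

# Stand-in for the embedded prompt tokens of a frozen LLM.
token_embs = [[0.0] * d_model for _ in range(3)]
llm_inputs = prepend_prefix(prefix_a, token_embs)
```

The per-request overhead is one small MLP forward pass plus `n_prefix` extra positions in the input sequence, which is why the approach adds little inference cost.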
2.5 On-Device/Federated Personalization
Edge-device deployment mandates privacy and resource-awareness. Self-supervised frameworks select a small, diverse buffer of dialogues, occasionally querying the user for preferred responses, and fine-tune adapters (e.g., LoRA) solely on-device. Synthetic data augmentation expands training data, and continual adaptation is performed with strict resource and privacy constraints (Qin et al., 2023).
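One simple way to keep a small but diverse buffer is greedy farthest-point selection over dialogue embeddings, sketched below. This is an illustrative stand-in only: the selection criteria used by the cited framework are not specified in this section, and the embeddings, distance metric, and function names are assumptions.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse_buffer(embeddings, buffer_size):
    """Greedily pick indices so each new pick is far from all earlier picks."""
    if not embeddings or buffer_size <= 0:
        return []
    chosen = [0]  # start from the first dialogue (arbitrary choice)
    # min_dist[i] is the distance from item i to its nearest chosen item.
    min_dist = [distance(e, embeddings[0]) for e in embeddings]
    target = min(buffer_size, len(embeddings))
    while len(chosen) < target:
        candidates = [i for i in range(len(embeddings)) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [
            min(d, distance(e, embeddings[nxt]))
            for d, e in zip(min_dist, embeddings)
        ]
    return chosen


# Hypothetical 2-d dialogue embeddings forming three loose groups.
dialogue_embs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 6.0]]
buffer_indices = select_diverse_buffer(dialogue_embs, buffer_size=3)
```

On a device, only the selected dialogues would be retained for adapter fine-tuning, keeping memory use bounded by `buffer_size`.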
2.6 Multimodal and Retrieval-Augmented Personalization
Personalization of multimodal LLMs (text, vision, audio) leverages external key–value stores indexed by user-specific concepts (avatars, descriptions), region-level visual retrieval for input queries, and joint integration with textual prompts (Hao et al., 2024). Systems such as RAP and PMG demonstrate retrieval-augmented and hybrid (keyword, embedding) conditioning in MLLMs for personalized image captioning and multimodal content generation (Hao et al., 2024, Shen et al., 2024).
3. Benchmarks, Datasets, and Metrics
Evaluation of personalization approaches is standardized around multi-task, multi-user benchmarks. The LaMP benchmark contains diverse tasks (classification and generation), explicit user profiles with hundreds of entries per user, and supports both user-based and time-based splits (Salemi et al., 2023). Datasets for recommendation, personalized summarization, dialogue (Persona-Chat, ConvAI2), and long-form generation (LongLaMP) support broader evaluation (Zhang et al., 2024).
Metrics:
- Intrinsic: Macro F1, accuracy, ROUGE, BLEU, MAE/RMSE.
- Extrinsic: Recall@k, NDCG for recommendation.
- LLM-as-Judge: Automated, reference-free judges examine personalization degree, faithfulness, and user satisfaction (Chen et al., 17 Oct 2025, Liu et al., 8 Jan 2026).
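For reference, the two recommendation metrics listed above can be computed as in the following sketch, which assumes binary relevance labels and a single ranked list per user. The function names and the toy ranking are illustrative.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: items "a" and "c" are hits; "e" is relevant but not retrieved.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}

recall = recall_at_k(ranked, relevant, k=3)  # 2 of 3 relevant items found
ndcg = ndcg_at_k(ranked, relevant, k=3)      # about 0.704
```

In a personalization benchmark these values would be averaged over users, so that heavy users do not dominate the score.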
Ablations systematically compare retrieval-only, profile-summary, PEFT-only, and hybrid variants (Tan et al., 2024, Richardson et al., 2023).
4. Privacy, Scalability, and Resource Considerations
- Privacy: Techniques minimizing storage or movement of raw user data are favored. Adapter-based methods (e.g., OPPU, MTA) require only per-user parameter sets, and hypernetwork approaches compute personalized modules locally at inference (Tan et al., 2024, Li et al., 25 Nov 2025, Tan et al., 18 Oct 2025). On-device learning locks user data to edge buffers (Qin et al., 2023).
- Scalability: Strategies that avoid $O(N)$ storage (where $N$ is the number of users) are imperative. MTA's meta-bank decouples storage from the number of users, while the P2P hypernetwork and POPI summary inference accommodate streaming new users at $O(1)$ additional compute (Li et al., 25 Nov 2025, Tan et al., 18 Oct 2025, Chen et al., 17 Oct 2025).
- Latency: Methods such as embedding-to-prefix and representation editing (e.g., CHAMELEON) operate at near-constant inference latency, critical for real-time experience (Huber et al., 16 May 2025, Zhang et al., 2 Mar 2025).
5. Interpretability, Transferability, and Multimodal Personalization
- Natural Language Preference Summaries: Text summaries serve as a universal personalization interface. They permit interpretability, user audit, and plug-and-play transfer across LLMs or modalities, and they facilitate conformal adaptation in previously unseen domains (Liu et al., 8 Jan 2026, Chen et al., 17 Oct 2025).
- Reward/Preference Transfer: Factoring preferences into low-dimensional subspaces or natural language enables few-query estimation and generalization across domains, tasks, or model families (Liu et al., 8 Jan 2026, Shenfeld et al., 8 Mar 2025).
- Multimodal Agents: Retrieval-augmented frameworks (RAP, PMG) support real-time user-driven updates and personalization at the concept and visual region level, demonstrating strong performance in multimodal captioning, visual QA, and recommendation (Hao et al., 2024, Shen et al., 2024).
6. Limitations, Challenges, and Future Directions
- Open Challenges:
- Faithful Personalization Metrics: No universal quantitative metric for personalized alignment; growing reliance on LLM-judge and human-in-the-loop evaluations (Zhang et al., 2024).
- Cold-Start: Sparse data for new users remains difficult; persona-level interpolation and summary-only initialization offer partial mitigation (Richardson et al., 2023, Zhang et al., 2024).
- Bias and Fairness: Personalized models may exacerbate stereotypes or echo chambers; fairness constraints and bias-aware evaluation are called for (Zhang et al., 2024).
- Continual and Multitask Personalization: Extending approaches to dynamic, cross-domain, and multi-task personalization, while supporting lifelong user model updates (Tan et al., 2024, Li et al., 25 Nov 2025, Tan et al., 18 Oct 2025).
- Multimodal User Representations: Integrating heterogeneous signals (text, image, audio) for unified personalization remains a technical frontier (Zhang et al., 2024, Shen et al., 2024).
- Practical Implications:
Plug-and-play summaries, adapter/hypernetwork personalization, and on-device training enable practical, privacy-sensitive, and scalable GPT-class deployments. Natural language and reward-based personalization yield interpretable signals amenable to user audit and cross-system transfer.
7. Summary Table: Method Classes and Key Properties
| Approach | Storage Scaling | Personalization Signal | Transferability |
|---|---|---|---|
| Prompt/Retrieval-Augm. | $O(1)$ parameters; $O(N)$ stored profiles | Retrieved user history | Moderate |
| PEFT (OPPU, LoRA) | $O(N)$ | Parametric, per user | Low |
| MTA (Merge-then-Adapt) | $O(K + \epsilon N)$ | Merged and adapted LoRA | High |
| Hypernetwork (P2P) | $O(1)$ | Profile encoding to adapter | High |
| Natural-Language Summary | $O(1)$ parameters; $O(N)$ short text summaries | Preference text summary | High |
| Reward Factorization | $O(Nd)$ | User weight vector on base rewards | High |
Parameters: $N$ = number of users, $K$ = meta-LoRA bank size, $\epsilon$ = relative size of ultra-light adaptation, $d$ = reward base dimensions.
Leading research in LLM personalization demonstrates remarkable improvements in alignment, efficiency, and flexibility across core NLP and multimodal tasks, fueled by advances in modular parameter adaptation, low-dimensional preference summarization, and privacy-conscious design. Persistent open questions center on robust long-horizon personalization, rigorous and fair evaluation, and broad adaptation across user populations and data modalities (Zhang et al., 2024, Chen et al., 2023).