
User-LLM: Efficient LLM Contextualization with User Embeddings (2402.13598v2)

Published 21 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs have achieved remarkable success across various domains, but effectively incorporating complex and potentially noisy user timeline data into LLMs remains a challenge. Current approaches often involve translating user timelines into text descriptions before feeding them to LLMs, which can be inefficient and may not fully capture the nuances of user behavior. Inspired by how LLMs are effectively integrated with images through direct embeddings, we propose User-LLM, a novel framework that leverages user embeddings to directly contextualize LLMs with user history interactions. These embeddings, generated by a user encoder pretrained using self-supervised learning on diverse user interactions, capture latent user behaviors and interests as well as their evolution over time. We integrate these user embeddings with LLMs through cross-attention, enabling LLMs to dynamically adapt their responses based on the context of a user's past actions and preferences. Our approach achieves significant efficiency gains by representing user timelines directly as embeddings, leading to substantial inference speedups of up to 78.1X. Comprehensive experiments on MovieLens, Amazon Review, and Google Local Review datasets demonstrate that User-LLM outperforms text-prompt-based contextualization on tasks requiring deep user understanding, with improvements of up to 16.33%, particularly excelling on long sequences that capture subtle shifts in user behavior. Furthermore, the incorporation of Perceiver layers streamlines the integration between user encoders and LLMs, yielding additional computational savings.

References (75)
  1. Flamingo: a visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35:23716–23736, 2022.
  2. Palm 2 technical report. arXiv preprint arXiv:2305.10403, 2023.
  3. Text summarization using large language models: A comparative study of mpt-7b-instruct, falcon-7b-instruct, and openai chat-gpt models, 2023.
  4. Language models are few-shot learners. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp.  1877–1901. Curran Associates, Inc., 2020.
  5. Lightgcl: Simple yet effective graph contrastive learning for recommendation. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.
  6. Extending context window of large language models via positional interpolation. arXiv preprint arXiv:2306.15595, 2023a.
  7. Pali: A jointly-scaled multilingual language-image model. arXiv preprint arXiv:2209.06794, 2022a.
  8. Intent contrastive learning for sequential recommendation. In Laforest, F., Troncy, R., Simperl, E., Agarwal, D., Gionis, A., Herman, I., and Médini, L. (eds.), WWW ’22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25 - 29, 2022, pp. 2172–2182. ACM, 2022b. doi: 10.1145/3485447.3512090.
  9. Longlora: Efficient fine-tuning of long-context large language models. arXiv preprint arXiv:2309.12307, 2023b.
  10. Adapting language models to compress contexts. arXiv preprint arXiv:2305.14788, 2023.
  11. Learning cross-lingual sentence representations via a multi-task dual-encoder model. arXiv preprint arXiv:1810.12836, 2018.
  12. Palm: Scaling language modeling with pathways. Journal of Machine Learning Research, 24(240):1–113, 2023.
  13. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys’16), pp. 191–198, 2016a.
  14. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, New York, NY, USA, 2016b.
  15. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  16. Longnet: Scaling transformers to 1,000,000,000 tokens. arXiv preprint arXiv:2307.02486, 2023.
  17. User embedding model for personalized language prompting. arXiv preprint arXiv:2401.04858, 2024.
  18. Palm-e: An embodied multimodal language model. arXiv preprint arXiv:2303.03378, 2023.
  19. In-context autoencoder for context compression in a large language model. arXiv preprint arXiv:2307.06945, 2023.
  20. Gemini Team Google. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023.
  21. End-to-end retrieval in continuous space. arXiv preprint arXiv:1811.08008, 2018.
  22. Learning dense representations for entity retrieval. arXiv preprint arXiv:1909.10506, 2019.
  23. Imagebind: One embedding space to bind them all. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  15180–15190, 2023.
  24. Gemini Team Google. Gemini: A family of highly capable multimodal models, 2023.
  25. Mamba: Linear-time sequence modeling with selective state spaces, 2023.
  26. How to train your HIPPO: state space models with generalized orthogonal basis projections. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.
  27. Onellm: One framework to align all modalities with language. arXiv preprint arXiv:2312.03700, 2023.
  28. The movielens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4), 2015. ISSN 2160-6455. doi: 10.1145/2827872.
  29. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web, 2016.
  30. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.
  31. Perceiver: General perception with iterative attention. In International conference on machine learning, pp. 4651–4664. PMLR, 2021.
  32. Genrec: Large language model for generative recommendation, 2023.
  33. End-to-end deep attentive personalized item retrieval for online content-sharing platforms. In Proceedings of The Web Conference 2020, pp.  2870–2877, 2020.
  34. Llm maybe longlm: Self-extend llm context window without tuning. arXiv preprint arXiv:2401.01325, 2024.
  35. Do llms understand user preferences? evaluating llms on user rating prediction, 2023.
  36. Scaling laws for neural language models, 2020.
  37. Large language models are zero-shot reasoners. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp.  22199–22213. Curran Associates, Inc., 2022.
  38. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021.
  39. Uctopic: Unsupervised contrastive learning for phrase representations and topic mining. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.  6159–6169, 2022.
  40. Exploring the upper limits of text-based collaborative filtering using large language models: Discoveries and insights, 2023.
  41. Ring attention with blockwise transformers for near-infinite context. arXiv preprint arXiv:2310.01889, 2023a.
  42. Is chatgpt a good recommender? a preliminary study, 2023b.
  43. Lost in the middle: How language models use long contexts. arXiv preprint arXiv:2307.03172, 2023c.
  44. On learning to summarize with large language models as references, 2023d.
  45. Llm-rec: Personalized recommendation via prompting large language models, 2023.
  46. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp.  1930–1939, 2018.
  47. Anymal: An efficient and scalable any-modality augmented language model. arXiv preprint arXiv:2309.16058, 2023.
  48. Learning to compress prompts with gist tokens. arXiv preprint arXiv:2304.08467, 2023.
  49. Learning federated representations and recommendations with limited negatives. ArXiv e-prints, 2021.
  50. OpenAI. Gpt-4 technical report. ArXiv, abs/2303.08774, 2023.
  51. Generative sequential recommendation with gptrec, 2023.
  52. U-bert: Pre-training user representations for improved recommendation. Proceedings of the AAAI Conference on Artificial Intelligence, 35, 2021.
  53. Learning transferable visual models from natural language supervision. In International conference on machine learning, pp. 8748–8763. PMLR, 2021.
  54. A generalist agent. arXiv preprint arXiv:2205.06175, 2022.
  55. Length bias in encoder decoder models and a case for global conditioning. arXiv preprint arXiv:1606.03402, 2016.
  56. Bert4rec: Sequential recommendation with bidirectional encoder representations from transformer. In Zhu, W., Tao, D., Cheng, X., Cui, P., Rundensteiner, E. A., Carmel, D., He, Q., and Yu, J. X. (eds.), Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM 2019, Beijing, China, November 3-7, 2019, pp.  1441–1450. ACM, 2019. doi: 10.1145/3357384.3357895.
  57. Efficient transformers: A survey. ACM Comput. Surv., 55(6), dec 2022. ISSN 0360-0300. doi: 10.1145/3530811.
  58. Llama 2: Open foundation and fine-tuned chat models, 2023.
  59. Focused transformer: Contrastive training for context scaling. arXiv preprint arXiv:2307.03170, 2023.
  60. Dropoutnet: Addressing cold start in recommender systems. In NIPS, pp.  4957–4966, 2017.
  61. Augmenting language models with long-term memory. arXiv preprint arXiv:2306.07174, 2023.
  62. Chain-of-thought prompting elicits reasoning in large language models, 2023.
  63. A survey on large language models for recommendation, 2023a.
  64. Next-gpt: Any-to-any multimodal llm. arXiv preprint arXiv:2309.05519, 2023b.
  65. Automated self-supervised learning for recommendation. In Ding, Y., Tang, J., Sequeda, J. F., Aroyo, L., Castillo, C., and Houben, G. (eds.), Proceedings of the ACM Web Conference 2023, WWW 2023, Austin, TX, USA, 30 April 2023 - 4 May 2023, pp.  992–1002. ACM, 2023. doi: 10.1145/3543507.3583336.
  66. Contrastive learning for sequential recommendation. In 2022 IEEE 38th international conference on data engineering (ICDE), pp.  1259–1273. IEEE, 2022.
  67. Retrieval meets long context large language models. arXiv preprint arXiv:2310.03025, 2023a.
  68. Openp5: Benchmarking foundation models for recommendation. arXiv:2306.11134, 2023b.
  69. Personalized showcases: Generating multi-modal explanations for recommendations. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.  2251–2255, 2023.
  70. Mixed negative sampling for learning two-tower neural networks in recommendations. In Companion Proceedings of the Web Conference 2020, pp. 441–447, 2020.
  71. Debiased contrastive learning for sequential recommendation. In Proceedings of the ACM Web Conference 2023, WWW ’23, pp. 1063–1073, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9781450394161. doi: 10.1145/3543507.3583361.
  72. Learning semantic textual similarity from conversations. ACL 2018, pp.  164, 2018.
  73. Self-supervised learning for large-scale item recommendations. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, CIKM ’21, pp.  4321–4330, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450384469. doi: 10.1145/3459637.3481952.
  74. Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations, 2019.
  75. Coca: Contrastive captioners are image-text foundation models. arXiv preprint arXiv:2205.01917, 2022.
Authors (9)
  1. Lin Ning (9 papers)
  2. Luyang Liu (20 papers)
  3. Jiaxing Wu (6 papers)
  4. Neo Wu (5 papers)
  5. Devora Berlowitz (2 papers)
  6. Sushant Prakash (15 papers)
  7. Bradley Green (20 papers)
  8. Shawn O'Banion (8 papers)
  9. Jun Xie (66 papers)
Citations (13)

Summary

  • The paper introduces a two-phase framework that generates dense user embeddings from multimodal interactions to efficiently contextualize LLMs.
  • It employs cross-attention and soft prompting to integrate user-specific data while significantly reducing input lengths and computational costs.
  • Experimental evaluations on three datasets demonstrate improved performance and up to 78.1X FLOPs reduction compared to text-prompt methods.

The paper "User-LLM: Efficient LLM Contextualization with User Embeddings" (Ning et al., 21 Feb 2024 ) proposes a novel framework called User-LLM to address the challenge of effectively incorporating rich, complex, and potentially noisy user interaction data into LLMs for personalization. Traditional methods often rely on feeding raw user history as text prompts, which is computationally expensive, especially for long sequences, and can struggle with the structure and noise of real-world interaction data.

User-LLM tackles this by introducing a two-phase approach:

  1. User Embedding Generation: A dedicated Transformer-based encoder is pretrained in a self-supervised manner on diverse user interaction data, which can span multiple modalities (e.g., item name, rating, category). This encoder distills the user's historical behavior and preferences into dense, compressed user embeddings. The paper primarily uses an Autoregressive Transformer as the encoder, processing sequences of fused embeddings that represent user activities and emitting one user embedding per input event. An alternative Dual Encoder architecture is also explored.
  2. LLM Contextualization: The generated user embeddings are integrated with an LLM to provide user-specific context. The primary integration mechanism is cross-attention, similar to Flamingo [alayrac2022flamingo], where the LLM's intermediate text representations attend to the user embeddings. An alternative soft-prompting approach is also investigated, in which the user embeddings are prepended as soft tokens to the LLM input (see the sketch after this list).
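
The following is a minimal PyTorch sketch of these two pieces: an autoregressive user encoder that fuses per-event features and emits one embedding per event, and a cross-attention adapter through which the LLM's text states attend to those embeddings. All module names, dimensions, and feature choices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class UserEncoder(nn.Module):
    """Autoregressive Transformer over fused per-event embeddings (illustrative)."""
    def __init__(self, n_items, n_ratings, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        # Each event fuses several modalities; here just item id and rating.
        self.item_emb = nn.Embedding(n_items, d_model)
        self.rating_emb = nn.Embedding(n_ratings, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)

    def forward(self, items, ratings):
        x = self.item_emb(items) + self.rating_emb(ratings)           # (B, T, d)
        T = x.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf"),
                                       device=x.device), diagonal=1)
        # One user embedding per input event, as in the autoregressive encoder.
        return self.backbone(x, mask=causal)                          # (B, T, d)

class CrossAttentionAdapter(nn.Module):
    """LLM hidden states attend to (projected) user embeddings."""
    def __init__(self, d_llm=1024, d_user=256, n_heads=8):
        super().__init__()
        self.project = nn.Linear(d_user, d_llm)        # projection layer
        self.attn = nn.MultiheadAttention(d_llm, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_llm)

    def forward(self, llm_hidden, user_emb):
        ctx = self.project(user_emb)                   # (B, T_user, d_llm)
        attended, _ = self.attn(llm_hidden, ctx, ctx)  # queries come from text
        return self.norm(llm_hidden + attended)        # residual fusion

# Soft-prompt alternative: skip the adapter and instead prepend the projected
# user embeddings to the LLM's input token embeddings as soft tokens.
```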

A key advantage highlighted by User-LLM is its efficiency compared to text-prompt-based methods. By condensing potentially very long user history sequences into a fixed number of dense embeddings (often one embedding per event, or even further compressed using Perceiver layers), the LLM's input sequence length remains significantly shorter. This drastically reduces the computational cost and memory requirements for LLMs, particularly for tasks involving extensive user history. The paper demonstrates substantial FLOPs reductions (up to 78.1X) compared to text prompting as the history length increases. Perceiver layers [jaegle2021perceiver] are incorporated to further compress user embeddings using a learnable latent query, improving efficiency and potentially handling noisy contexts.
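
As a concrete illustration of the Perceiver-style compression, the sketch below uses a small set of learnable latent queries that cross-attend to the per-event user embeddings, producing a fixed-length context regardless of history length. Sizes and names are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PerceiverCompressor(nn.Module):
    """Compress a variable-length sequence of user embeddings into a fixed
    number of latent tokens via cross-attention (illustrative sketch)."""
    def __init__(self, d_user=256, n_latents=16, n_heads=4):
        super().__init__()
        # Learnable latent queries; their count, not the history length,
        # determines how many tokens the LLM ultimately attends to.
        self.latents = nn.Parameter(torch.randn(n_latents, d_user) * 0.02)
        self.attn = nn.MultiheadAttention(d_user, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_user, 4 * d_user), nn.GELU(),
                                nn.Linear(4 * d_user, d_user))
        self.norm1 = nn.LayerNorm(d_user)
        self.norm2 = nn.LayerNorm(d_user)

    def forward(self, user_emb):                         # (B, T_events, d_user)
        q = self.latents.unsqueeze(0).expand(user_emb.size(0), -1, -1)
        attended, _ = self.attn(q, user_emb, user_emb)   # latents attend to events
        x = self.norm1(q + attended)
        return self.norm2(x + self.ff(x))                # (B, n_latents, d_user)
```

With 16 latents, for example, a 500-event history would reach the LLM as 16 embedding tokens instead of 500, and far fewer than a text-prompt description of the same history would require.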

The framework offers flexible training strategies (a parameter-freezing sketch follows the list):

  • Full: Finetune all components (user encoder, projection layers, LLM).
  • Enc: Finetune only the user encoder and projection layers, keeping the LLM frozen.
  • LoRA: Finetune the user encoder, projection layers, and LLM using LoRA for parameter efficiency.
  • Proj: Finetune only the projection layers, keeping the user encoder and LLM frozen.
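
A minimal sketch of how these strategies map onto parameter freezing, assuming hypothetical `user_encoder`, `projection`, and `llm` modules; under 'LoRA' the base LLM weights stay frozen and low-rank adapters (e.g., via a library such as PEFT) would be attached, which is omitted here.

```python
import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool) -> None:
    for p in module.parameters():
        p.requires_grad = trainable

def configure_strategy(strategy: str, user_encoder: nn.Module,
                       projection: nn.Module, llm: nn.Module) -> None:
    # (encoder, projection, llm) trainability per strategy.
    flags = {
        "Full": (True, True, True),    # finetune everything
        "Enc":  (True, True, False),   # encoder + projection, LLM frozen
        "LoRA": (True, True, False),   # base LLM frozen; LoRA adapters added separately
        "Proj": (False, True, False),  # projection layers only
    }[strategy]
    for module, flag in zip((user_encoder, projection, llm), flags):
        set_trainable(module, flag)
```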

The effectiveness of User-LLM is evaluated on three public datasets: MovieLens20M, Google Local Review, and Amazon Review, across various tasks: Next Item Prediction, Favorite Genre/Category Prediction (requiring deep user understanding), and Multimodal Review Generation.

Key findings from the experiments include:

  • Performance: User-LLM generally outperforms non-LLM baselines (Dual Encoder, Bert4Rec) on denser datasets like MovieLens and Google Local Review for next item prediction. While Bert4Rec performs well on the sparse Amazon dataset, User-LLM shows competitive results and excels in tasks requiring deeper user understanding or generation.
  • Long Context Handling: User-LLM significantly outperforms text-prompt-based LLM finetuning on tasks with long user history sequences, where text prompting becomes computationally prohibitive and performance degrades due to LLM limitations with long inputs [liu2023lost].
  • Efficiency: User-LLM requires fewer trainable parameters for competitive performance (e.g., the 'Enc' strategy performs well). Its ability to represent user history with fewer tokens than text prompts leads to significant inference efficiency gains. User-LLM with Perceiver can further reduce the number of user embedding tokens while maintaining performance.
  • Training Strategies: The 'Enc' strategy (frozen LLM, tune encoder/projection) effectively contextualizes the LLM and often outperforms text-prompt-based LoRA tuning, suggesting it leverages the LLM's pre-existing knowledge well without overfitting.
  • Ablation Studies: Pretraining the user encoder is shown to be crucial. Combining long-term user embeddings with short-term text prompts can yield better performance. Cross-attention generally performs better than soft-prompting for integrating user embeddings, especially for generation tasks. The Autoregressive encoder tends to outperform the Dual Encoder in this setup.

In practice, User-LLM provides a computationally efficient and effective way to personalize LLM responses by leveraging user interaction history. Its architecture allows for processing diverse, multimodal data and long sequences, making it suitable for applications like personalized recommendations, content generation, and user-aware chatbots in real-world systems where user history can be extensive and varied. The flexible training strategies offer options for balancing performance, computational cost, and the need to preserve the LLM's base capabilities.
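
To summarize how the pieces fit together at inference time, the snippet below composes the hypothetical modules sketched earlier (UserEncoder, PerceiverCompressor, CrossAttentionAdapter); the random tensor standing in for LLM hidden states and all sizes are illustrative assumptions.

```python
import torch

# Assumes UserEncoder, PerceiverCompressor, and CrossAttentionAdapter
# from the earlier sketches are in scope.
encoder = UserEncoder(n_items=10_000, n_ratings=6)
compressor = PerceiverCompressor(d_user=256, n_latents=16)
adapter = CrossAttentionAdapter(d_llm=1024, d_user=256)

items = torch.randint(0, 10_000, (1, 200))     # a 200-event interaction history
ratings = torch.randint(0, 6, (1, 200))

user_emb = encoder(items, ratings)             # (1, 200, 256): one per event
user_ctx = compressor(user_emb)                # (1, 16, 256): fixed-length context

# In the real system these would be intermediate LLM hidden states for the
# current text input; a random tensor stands in for 32 text tokens here.
llm_hidden = torch.randn(1, 32, 1024)
fused = adapter(llm_hidden, user_ctx)          # (1, 32, 1024): user-aware states
```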