Personalization of Large Language Models: A Survey (2411.00027v1)

Published 29 Oct 2024 in cs.CL
Abstract: Personalization of LLMs has recently become increasingly important with a wide range of applications. Despite the importance and recent progress, most existing works on personalized LLMs have focused either entirely on (a) personalized text generation or (b) leveraging LLMs for personalization-related downstream applications, such as recommendation systems. In this work, we bridge the gap between these two separate main directions for the first time by introducing a taxonomy for personalized LLM usage and summarizing the key differences and challenges. We provide a formalization of the foundations of personalized LLMs that consolidates and expands notions of personalization of LLMs, defining and discussing novel facets of personalization, usage, and desiderata of personalized LLMs. We then unify the literature across these diverse fields and usage scenarios by proposing systematic taxonomies for the granularity of personalization, personalization techniques, datasets, evaluation methods, and applications of personalized LLMs. Finally, we highlight challenges and important open problems that remain to be addressed. By unifying and surveying recent research using the proposed taxonomies, we aim to provide a clear guide to the existing literature and different facets of personalization in LLMs, empowering both researchers and practitioners.

Overview of "Personalization of LLMs: A Survey"

The paper "Personalization of LLMs: A Survey" explores the burgeoning field of personalization within LLMs and aims to consolidate existing research while identifying areas for further exploration. This comprehensive survey articulates a critical synthesis of methods, challenges, and applications related to personalizing LLMs, seeking to enhance user interaction by aligning outputs with individual or group-specific preferences.

Main Contributions and Structure

The authors establish a unifying taxonomy that categorizes personalization efforts for LLMs, distinguishing between direct personalized text generation and the use of LLMs for downstream personalization tasks such as recommendation. They detail how these two lines of research, though typically pursued separately, share foundational principles and methodologies. This cross-disciplinary framing promotes a more nuanced understanding of personalization and encourages collaboration across AI research communities.

Personalization Granularity

Personalization is dissected into three levels of granularity: user-level, persona-level, and global preference alignment, each offering distinct benefits and challenges. User-level personalization adapts a model to an individual user's preferences, offering highly tailored interactions. Persona-level personalization aggregates preferences across groups of users who share similar traits, providing scalable customization. Lastly, global preference alignment addresses preferences and norms shared broadly across the entire user population. This tiered approach allows adaptive strategies that balance precision and scalability.
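
As a rough illustration of how these granularity levels could translate into different conditioning signals, the sketch below assumes a hypothetical PersonalizationContext container and a selector that picks which information to feed a model; both are illustrative and not part of the survey.

```python
from dataclasses import dataclass, field

@dataclass
class PersonalizationContext:
    """Hypothetical container for the three granularity levels of personalization signals."""
    user_history: list[str] = field(default_factory=list)    # user-level: this user's own data
    persona_traits: list[str] = field(default_factory=list)  # persona-level: traits shared by a user group
    global_norms: list[str] = field(default_factory=list)    # global: preferences applied to all users

def conditioning_text(ctx: PersonalizationContext, level: str) -> str:
    """Select the conditioning text to prepend to a prompt for the chosen granularity level."""
    signals = {
        "user": ctx.user_history,
        "persona": ctx.persona_traits,
        "global": ctx.global_norms,
    }
    return "\n".join(signals.get(level, ctx.global_norms))

# Example: persona-level conditioning for a group of budget-conscious travelers.
ctx = PersonalizationContext(persona_traits=["Prefers budget options", "Travels with children"])
print(conditioning_text(ctx, "persona"))
```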

Techniques for Personalization

The authors categorize personalization approaches by the format in which user information is employed:

  1. Retrieval-Augmented Generation (RAG): This method integrates retrieved, user-specific knowledge (e.g., entries from a user's history or documents) to tailor model outputs. Sparse and dense retrieval techniques operationalize the approach by pulling content relevant to the user's current context.
  2. Prompting: Crafting contextually rich prompts that incorporate user preferences enhances the model's response generation, supporting both direct and role-specific personalization.
  3. Representation Learning: This technique adjusts model parameters, either through full fine-tuning or parameter-efficient fine-tuning (PEFT), to encode user-specific behaviors and preferences.
  4. Reinforcement Learning from Human Feedback (RLHF): Using user feedback as reinforcement signals, RLHF aligns LLMs with personalized preferences, optimizing the model's utility for diverse user populations.

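To make items 1 and 2 above concrete, the following minimal sketch retrieves the most relevant entries from a user's history with a simple bag-of-words scorer (a stand-in for the sparse or dense retrievers discussed in the survey) and folds them into a prompt. The function names, prompt template, and sample history are illustrative assumptions rather than the paper's implementation.

```python
import math
from collections import Counter

def score(query: str, doc: str) -> float:
    """Cosine similarity over bag-of-words counts; a stand-in for sparse/dense retrieval."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    overlap = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return overlap / norm if norm else 0.0

def personalize_prompt(query: str, user_history: list[str], k: int = 2) -> str:
    """Retrieve the k most relevant items from the user's history and prepend them to the prompt."""
    top = sorted(user_history, key=lambda doc: score(query, doc), reverse=True)[:k]
    context = "\n".join(f"- {doc}" for doc in top)
    return (
        "You are assisting a specific user. Relevant items from their history:\n"
        f"{context}\n\nUser request: {query}\n"
    )

if __name__ == "__main__":
    history = [
        "Reviewed a vegan ramen recipe and asked for low-sodium substitutions.",
        "Asked for gluten-free baking tips last week.",
        "Searched for marathon training plans.",
    ]
    print(personalize_prompt("Suggest a quick dinner recipe", history))
```
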
Evaluation and Datasets

The evaluation of personalized LLMs is divided into intrinsic methods, which assess the quality of generated text directly, and extrinsic methods, which measure downstream task performance. A taxonomy of datasets is also proposed, distinguishing datasets that contain user-authored text, pivotal for assessing direct personalized generation, from datasets geared toward evaluating downstream, personalization-related applications.
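
The sketch below illustrates this split under simplifying assumptions: a ROUGE-1-style unigram F1 stands in for intrinsic generation-quality metrics, and a hit-rate@k check stands in for extrinsic, downstream recommendation performance. The specific metrics and toy data are illustrative choices, not prescriptions from the survey.

```python
from collections import Counter

def rouge1_f1(generated: str, reference: str) -> float:
    """Intrinsic: unigram-overlap F1 between generated text and a user-authored reference (ROUGE-1-style)."""
    g, r = Counter(generated.lower().split()), Counter(reference.lower().split())
    overlap = sum(min(g[t], r[t]) for t in g)
    if not overlap:
        return 0.0
    precision, recall = overlap / sum(g.values()), overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def hit_rate_at_k(ranked_items: list[str], relevant: set[str], k: int = 5) -> float:
    """Extrinsic: does the personalized model surface any relevant item in its top-k recommendations?"""
    return 1.0 if any(item in relevant for item in ranked_items[:k]) else 0.0

# Example: intrinsic score against a user's own prior review, extrinsic score on a recommendation list.
print(rouge1_f1("loved the cozy ambience and espresso", "the espresso was great and the ambience cozy"))
print(hit_rate_at_k(["book_12", "book_7", "book_3"], relevant={"book_3"}, k=3))
```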

Applications and Challenges

Personalized LLMs are applicable across domains such as education, healthcare, finance, and legal systems, each posing unique challenges and benefits. These models hold promise in enhancing decision-making, providing tailored advice, and improving user satisfaction through personalized interactions.

However, the paper identifies unresolved challenges, including:

  • Cold-Start Problem: Addressing scenarios with minimal user data.
  • Bias Mitigation: Ensuring fair and unbiased outputs reflective of diverse perspectives.
  • Privacy: Balancing the enhancement of user experiences with the protection of personal data.
  • Benchmark Development: Creating robust benchmarks to reliably assess the effectiveness of personalization.

Conclusion and Future Directions

The paper encapsulates the complexity and potential of personalizing LLMs, emphasizing the importance of interdisciplinary collaboration and the development of dynamic, adaptive systems. The field is positioned for substantial advancements through the exploration of hybrid strategies, enhanced data utilization, and the alignment of model capabilities with comprehensive ethical standards. The proposed frameworks and taxonomies present a foundation for future research aimed at refining the personalization landscape within LLMs, driving innovation towards socially responsible AI solutions.

Authors (21)
  1. Zhehao Zhang (18 papers)
  2. Ryan A. Rossi (124 papers)
  3. Branislav Kveton (98 papers)
  4. Yijia Shao (18 papers)
  5. Diyi Yang (151 papers)
  6. Hamed Zamani (88 papers)
  7. Franck Dernoncourt (161 papers)
  8. Joe Barrow (12 papers)
  9. Tong Yu (119 papers)
  10. Sungchul Kim (65 papers)
  11. Ruiyi Zhang (98 papers)
  12. Jiuxiang Gu (73 papers)
  13. Tyler Derr (48 papers)
  14. Hongjie Chen (23 papers)
  15. Junda Wu (35 papers)
  16. Xiang Chen (343 papers)
  17. Zichao Wang (34 papers)
  18. Subrata Mitra (20 papers)
  19. Nedim Lipka (49 papers)
  20. Nesreen Ahmed (18 papers)