CoS: Enhancing Personalization and Mitigating Bias with Context Steering (2405.01768v1)

Published 2 May 2024 in cs.CL and cs.AI

Abstract: When querying an LLM, the context, i.e., personal, demographic, and cultural information specific to an end-user, can significantly shape the LLM's response. For example, asking the model to explain Newton's second law with the context "I am a toddler" yields a different answer compared to the context "I am a physics professor." Proper usage of context enables the LLM to generate personalized responses, whereas inappropriate contextual influence can lead to stereotypical and potentially harmful generations (e.g., associating "female" with "housekeeper"). In practice, striking the right balance when leveraging context is a nuanced and challenging problem that is often situation-dependent. One common approach to address this challenge is to fine-tune LLMs on contextually appropriate responses. However, this approach is expensive, time-consuming, and not controllable for end-users in different situations. In this work, we propose Context Steering (CoS), a simple, training-free method that can be easily applied to autoregressive LLMs at inference time. By measuring the contextual influence in terms of token prediction likelihood and modulating it, our method enables practitioners to determine the appropriate level of contextual influence for their specific use case and end-user base. We showcase a variety of applications of CoS, including amplifying the contextual influence to achieve better personalization and mitigating unwanted influence to reduce model bias. In addition, we show that we can combine CoS with Bayesian inference to quantify the extent of hate speech on the internet. We demonstrate the effectiveness of CoS on state-of-the-art LLMs and benchmarks.
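
The abstract's core idea, measuring contextual influence as a difference in next-token likelihoods and scaling it at inference time, lends itself to a short sketch. The snippet below is a minimal illustration rather than the authors' implementation: it assumes the steered logits take the form logits_plain + λ·(logits_ctx − logits_plain), uses a small Hugging Face model (gpt2) with greedy decoding for brevity, and the function name cos_generate and parameter lam are hypothetical labels.

```python
# Minimal sketch of inference-time context steering, assuming the steered
# next-token logits are: logits_plain + lam * (logits_ctx - logits_plain).
# Model choice (gpt2) and greedy decoding are illustrative simplifications.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def cos_generate(context: str, prompt: str, lam: float = 2.0, max_new_tokens: int = 40) -> str:
    """Greedy decoding with the context's influence scaled by `lam`.

    lam = 0 ignores the context entirely; lam = 1 matches ordinary
    conditioning on context + prompt; lam > 1 amplifies the context
    (personalization); lam < 0 pushes away from it (bias mitigation).
    """
    ctx_ids = tok(context + " " + prompt, return_tensors="pt").input_ids
    plain_ids = tok(prompt, return_tensors="pt").input_ids
    out_ids = []
    for _ in range(max_new_tokens):
        # Next-token logits with and without the user context.
        # (Full re-encoding each step; a real implementation would cache.)
        logits_ctx = model(ctx_ids).logits[:, -1, :]
        logits_plain = model(plain_ids).logits[:, -1, :]
        steered = logits_plain + lam * (logits_ctx - logits_plain)
        next_id = steered.argmax(dim=-1, keepdim=True)
        out_ids.append(next_id.item())
        # Append the chosen token to both continuations so they stay aligned.
        ctx_ids = torch.cat([ctx_ids, next_id], dim=-1)
        plain_ids = torch.cat([plain_ids, next_id], dim=-1)
    return tok.decode(out_ids)

print(cos_generate("I am a toddler.", "Explain Newton's second law:", lam=3.0))
```

Because the modulation happens purely on logits at decode time, no fine-tuning is required, and the same scalar knob can be turned up for personalization or down (even negative) to dampen stereotypical associations.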

Authors (4)
  1. Jerry Zhi-Yang He (5 papers)
  2. Sashrika Pandey (3 papers)
  3. Mariah L. Schrum (6 papers)
  4. Anca Dragan (62 papers)
Citations (4)
