High-Dimension Human Value Representation in Large Language Models (2404.07900v3)

Published 11 Apr 2024 in cs.CL and cs.AI

Abstract: The widespread application of LLMs across various tasks and fields has necessitated the alignment of these models with human values and preferences. Given the variety of human value alignment approaches, ranging from Reinforcement Learning from Human Feedback (RLHF) to constitutional learning, there is an urgent need to understand the scope and nature of the human values injected into these models before their release. There is also a need for model alignment without a costly, large-scale human annotation effort. We propose UniVaR, a high-dimensional representation of human value distributions in LLMs, orthogonal to model architecture and training data. Trained on the value-relevant output of eight multilingual LLMs and tested on the output of four multilingual LLMs, namely LLaMA2, ChatGPT, JAIS and Yi, we show that UniVaR is a powerful tool for comparing the distribution of human values embedded in different LLMs with different language sources. Through UniVaR, we explore how different LLMs prioritize various values across languages and cultures, shedding light on the complex interplay between human values and language modeling.
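The abstract describes comparing value distributions across LLMs through a learned embedding space. As a rough illustration of the underlying idea (not the authors' implementation), the minimal sketch below embeds value-relevant answers from two hypothetical models with an off-the-shelf sentence encoder and compares the resulting distributions with a crude mean pairwise cosine similarity. The encoder name, example answers, and similarity measure are all placeholder assumptions; the paper instead trains a dedicated value-embedding model (UniVaR) on outputs from eight multilingual LLMs.

```python
# Minimal sketch of the value-comparison idea, NOT the authors' UniVaR implementation.
# Assumptions: an off-the-shelf sentence encoder stands in for the learned value
# encoder; the model answers and similarity measure below are illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder value encoder

# Hypothetical answers from two LLMs to value-eliciting questions.
answers_model_a = [
    "Family obligations should come before personal ambitions.",
    "Individual freedom is the highest good.",
]
answers_model_b = [
    "Community harmony matters more than personal gain.",
    "People should be free to choose their own path.",
]

emb_a = encoder.encode(answers_model_a)  # shape: (n_a, d)
emb_b = encoder.encode(answers_model_b)  # shape: (n_b, d)

def mean_pairwise_cosine(x: np.ndarray, y: np.ndarray) -> float:
    """Crude distributional similarity: average cosine over all cross pairs."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return float((x @ y.T).mean())

print(f"value-space similarity A vs B: {mean_pairwise_cosine(emb_a, emb_b):.3f}")
```

The paper goes further, visualizing the learned value space (e.g., with dimensionality-reduction plots) to contrast how models prioritize values across languages; the single cosine summary here is only a stand-in for that richer analysis.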

Authors (8)
  1. Samuel Cahyawijaya
  2. Delong Chen
  3. Yejin Bang
  4. Leila Khalatbari
  5. Bryan Wilie
  6. Ziwei Ji
  7. Etsuko Ishii
  8. Pascale Fung