Measuring Moral Inconsistencies in Large Language Models (2402.01719v3)
Abstract: An LLM is considered consistent if semantically equivalent prompts elicit semantically equivalent responses. Despite recent advances showcasing the impressive capabilities of LLMs in conversational systems, we show that even state-of-the-art LLMs are highly inconsistent in their generations, calling their reliability into question. Prior research has tried to measure consistency with task-specific accuracy, but this approach is unsuitable for moral scenarios, such as the trolley problem, that have no single "correct" answer. To address this, we propose Semantic Graph Entropy (SGE), a novel information-theoretic measure of an LLM's consistency in moral scenarios. We leverage "Rules of Thumb" (RoTs) to explain a model's decision-making strategies and to further strengthen our metric. Compared to existing consistency metrics, SGE correlates better with human judgments across five LLMs. In future work, we aim to investigate the root causes of LLM inconsistencies and propose improvements.
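The abstract does not spell out how Semantic Graph Entropy is computed, so the sketch below only illustrates the underlying idea of consistency measurement it builds on: embed responses to semantically equivalent prompts, group responses that say the same thing, and take the entropy of the resulting group distribution (0 when all responses agree, higher as answers diverge). The model name, distance threshold, function name, and sample responses are assumptions for illustration, not details from the paper.

```python
# Illustrative sketch only: not the paper's exact SGE definition.
# Assumptions: Sentence-BERT model "all-MiniLM-L6-v2", cosine-distance
# threshold 0.3 for grouping "semantically equivalent" responses.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering


def semantic_entropy(responses, model_name="all-MiniLM-L6-v2", distance_threshold=0.3):
    """Entropy over clusters of semantically similar responses.

    0.0 means every response falls in one cluster (perfectly consistent);
    larger values indicate the model answers equivalent prompts differently.
    """
    # Embed each response and L2-normalize so cosine distance is meaningful.
    model = SentenceTransformer(model_name)
    embeddings = model.encode(responses, normalize_embeddings=True)

    # Group responses whose embeddings are within the cosine-distance threshold.
    clustering = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=distance_threshold,
        metric="cosine",
        linkage="average",
    )
    labels = clustering.fit_predict(embeddings)

    # Shannon entropy of the cluster-size distribution.
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log(probs)))


# Hypothetical responses to paraphrases of the same trolley-problem prompt.
responses = [
    "I would pull the lever to save the five people.",
    "Pulling the lever saves more lives, so yes.",
    "I would not intervene; actively causing a death is wrong.",
]
print(semantic_entropy(responses))
```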
Authors: Vamshi Krishna Bonagiri, Sreeram Vennam, Manas Gaur, Ponnurangam Kumaraguru