Alleviating Hallucinations of Large Language Models through Induced Hallucinations (2312.15710v2)

Published 25 Dec 2023 in cs.CL and cs.AI

Abstract: Despite their impressive capabilities, LLMs have been observed to generate responses that include inaccurate or fabricated information, a phenomenon commonly known as "hallucination". In this work, we propose a simple Induce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations. We first construct a factually weak LLM by inducing hallucinations from the original LLMs. Then, we penalize these induced hallucinations during decoding to enhance the factuality of the generated content. Concretely, we determine the final next-token predictions by amplifying the predictions from the original model and downplaying the induced untruthful predictions via contrastive decoding. Experimental results on both discrimination-based and generation-based hallucination evaluation benchmarks, such as TruthfulQA and FActScore, demonstrate that our proposed ICD methods can effectively enhance the factuality of LLMs across various model sizes and families. For example, when equipped with ICD, Llama2-7B-Chat and Mistral-7B-Instruct achieve performance comparable to ChatGPT and GPT4 on TruthfulQA, respectively.
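
To make the contrastive step described in the abstract concrete, the minimal Python/PyTorch sketch below shows one plausible way to combine the two models' next-token distributions, in the style of contrastive decoding (reference 23). The function name, the contrast weight alpha, and the plausibility cutoff beta are illustrative assumptions rather than the paper's exact formulation or hyperparameters, and the hallucination-induced model is assumed to have been obtained separately (e.g., by fine-tuning the original model on fabricated answers).

import math
import torch
import torch.nn.functional as F

def icd_next_token_logits(base_logits, induced_logits, alpha=1.0, beta=0.1):
    # base_logits / induced_logits: (vocab_size,) next-token logits from the
    # original model and the hallucination-induced model, respectively.
    # alpha: contrast strength; beta: adaptive-plausibility cutoff.
    # Both defaults are illustrative, not values taken from the paper.
    base_logprobs = F.log_softmax(base_logits, dim=-1)
    induced_logprobs = F.log_softmax(induced_logits, dim=-1)

    # Keep only tokens the original model already finds plausible
    # (probability within a factor beta of its top token), so rare tokens
    # are not promoted merely because the induced model dislikes them.
    cutoff = base_logprobs.max() + math.log(beta)
    implausible = base_logprobs < cutoff

    # Amplify the original model's predictions and downplay the induced,
    # untruthful predictions, then mask out implausible tokens.
    contrasted = (1.0 + alpha) * base_logprobs - alpha * induced_logprobs
    return contrasted.masked_fill(implausible, float("-inf"))

# Toy demonstration with random logits standing in for two forward passes.
torch.manual_seed(0)
vocab_size = 32
base = torch.randn(vocab_size)
induced = torch.randn(vocab_size)
print("greedy token id:", icd_next_token_logits(base, induced).argmax().item())

In a full decoding loop the same combination would be applied at every step, with greedy selection or sampling over the contrasted logits.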

References (68)
  1. A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity. arXiv preprint arXiv:2302.04023.
  2. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712.
  3. Weak-to-strong generalization: Eliciting strong capabilities with weak supervision.
  4. Medusa: Simple framework for accelerating llm generation with multiple decoding heads. https://github.com/FasterDecoding/Medusa.
  5. Unveiling the siren’s song: Towards reliable fact-conflicting hallucination detection. arXiv preprint arXiv:2310.12086.
  6. Evaluating hallucinations in chinese large language models. arXiv preprint arXiv:2310.03368.
  7. Factool: Factuality detection in generative ai–a tool augmented framework for multi-task and multi-domain scenarios. arXiv preprint arXiv:2307.13528.
  8. Dola: Decoding by contrasting layers improves factuality in large language models. arXiv preprint arXiv:2309.03883.
  9. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416.
  10. OpenCompass Contributors. 2023. Opencompass: A universal evaluation platform for foundation models. https://github.com/open-compass/opencompass.
  11. Detecting and mitigating hallucinations in machine translation: Model internal workings alone do well, sentence similarity even better. arXiv preprint arXiv:2212.08597.
  12. Is chatgpt a highly fluent grammatical error correction system? a comprehensive evaluation. arXiv preprint arXiv:2304.01746.
  13. Lora: Low-rank adaptation of large language models. In International Conference on Learning Representations.
  14. Do large language models know about facts? arXiv preprint arXiv:2310.05177.
  15. Baseline defenses for adversarial attacks against aligned language models. arXiv preprint arXiv:2309.00614.
  16. Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12):1–38.
  17. Mistral 7b. arXiv preprint arXiv:2310.06825.
  18. Is chatgpt a good translator? a preliminary study. arXiv preprint arXiv:2301.08745.
  19. Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  20. A survey on retrieval-augmented text generation. arXiv preprint arXiv:2202.01110.
  21. Halueval: A large-scale hallucination evaluation benchmark for large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6449–6464.
  22. Inference-time intervention: Eliciting truthful answers from a language model. arXiv preprint arXiv:2306.03341.
  23. Contrastive decoding: Open-ended text generation as optimization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12286–12312.
  24. Textbooks are all you need ii: phi-1.5 technical report. arXiv preprint arXiv:2309.05463.
  25. Let’s verify step by step. arXiv preprint arXiv:2305.20050.
  26. Truthfulqa: Measuring how models mimic human falsehoods. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3214–3252.
  27. DExperts: Decoding-time controlled text generation with experts and anti-experts. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 6691–6706.
  28. Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models. arXiv preprint arXiv:2303.08896.
  29. Sources of hallucination by large language models on inference tasks. arXiv preprint arXiv:2305.14552.
  30. Factscore: Fine-grained atomic evaluation of factual precision in long form text generation. arXiv preprint arXiv:2305.14251.
  31. Sean O’Brien and Mike Lewis. 2023. Contrastive decoding improves reasoning in large language models. arXiv preprint arXiv:2309.09117.
  32. OpenAI. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774.
  33. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744.
  34. Check your facts and try again: Improving large language models with external knowledge and automated feedback. arXiv preprint arXiv:2302.12813.
  35. Red teaming language models with language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3419–3448.
  36. Fine-tuning aligned language models compromises safety, even when users do not intend to! arXiv preprint arXiv:2310.03693.
  37. Hallucination reduction in long input text summarization. arXiv preprint arXiv:2309.16781.
  38. Delucionqa: Detecting hallucinations in domain-specific question answering. arXiv preprint arXiv:2312.05200.
  39. John Schulman. 2023. Reinforcement learning from human feedback: Progress and challenges.
  40. Large language models can be easily distracted by irrelevant context. In International Conference on Machine Learning, pages 31210–31227. PMLR.
  41. Trusting your evidence: Hallucinate less with context-aware decoding. arXiv preprint arXiv:2305.14739.
  42. Aligning large multimodal models with factually augmented rlhf. arXiv preprint arXiv:2309.14525.
  43. Ilya Sutskever. 2023. An observation on generalization.
  44. Bert rediscovers the classical nlp pipeline. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4593–4601.
  45. Fine-tuning language models for factuality. arXiv preprint arXiv:2311.08401.
  46. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
  47. Med-halt: Medical domain hallucination test for large language models. arXiv preprint arXiv:2307.15343.
  48. Freshllms: Refreshing large language models with search engine augmentation. arXiv preprint arXiv:2310.03214.
  49. Histalign: Improving context dependency in language generation by aligning with history. arXiv preprint arXiv:2305.04782.
  50. Survey on factuality in large language models: Knowledge, retrieval and domain-specificity. arXiv preprint arXiv:2310.07521.
  51. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483.
  52. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, pages 38–45.
  53. Defending chatgpt against jailbreak attack via self-reminder.
  54. The rise and potential of large language model based agents: A survey. arXiv preprint arXiv:2309.07864.
  55. Baichuan 2: Open large-scale language models. arXiv preprint arXiv:2309.10305.
  56. Rlcd: Reinforcement learning from contrast distillation for language model alignment. arXiv preprint arXiv:2307.12950.
  57. Alignment for honesty. arXiv preprint arXiv:2312.07000.
  58. Llm lies: Hallucinations are not bugs, but features as adversarial examples. arXiv preprint arXiv:2310.01469.
  59. Surfacing biases in large language models using contrastive input decoding. arXiv preprint arXiv:2305.07378.
  60. Automatic hallucination assessment for aligned large language models via transferable adversarial attacks. arXiv preprint arXiv:2310.12516.
  61. SAC³: Reliable hallucination detection in black-box language models via semantic-aware cross-check consistency. arXiv preprint arXiv:2311.01740.
  62. Multi-task instruction tuning of llama for specific scenarios: A preliminary study on writing assistance. arXiv preprint arXiv:2305.13225.
  63. Siren’s song in the ai ocean: A survey on hallucination in large language models. arXiv preprint arXiv:2309.01219.
  64. A survey of large language models. arXiv preprint arXiv:2303.18223.
  65. Why does chatgpt fall short in providing truthful answers. arXiv preprint arXiv:2304.10513.
  66. Lima: Less is more for alignment. arXiv preprint arXiv:2305.11206.
  67. Mixture-of-experts with expert choice routing. Advances in Neural Information Processing Systems, 35:7103–7114.
  68. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043.
Authors (4)
  1. Yue Zhang (618 papers)
  2. Leyang Cui (50 papers)
  3. Wei Bi (62 papers)
  4. Shuming Shi (126 papers)
Citations (41)