Atomic Self-Consistency for Better Long Form Generations (2405.13131v1)

Published 21 May 2024 in cs.CL

Abstract: Recent work has aimed to improve LLM generations by filtering out hallucinations, thereby improving the precision of the information in responses. Correctness of a long-form response, however, also depends on the recall of multiple pieces of information relevant to the question. In this paper, we introduce Atomic Self-Consistency (ASC), a technique for improving the recall of relevant information in an LLM response. ASC follows recent work, Universal Self-Consistency (USC), in using multiple stochastic samples from an LLM to improve the long-form response. Unlike USC, which only focuses on selecting the best single generation, ASC picks authentic subparts from the samples and merges them into a superior composite answer. Through extensive experiments and ablations, we show that merging relevant subparts of multiple samples performs significantly better than picking a single sample. ASC demonstrates significant gains over USC on multiple factoid and open-ended QA datasets (ASQA, QAMPARI, QUEST, ELI5) with ChatGPT and Llama2. Our analysis also reveals untapped potential for enhancing long-form generations using the approach of merging multiple samples.
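The core idea lends itself to a short sketch: draw several stochastic samples, split each into atomic parts, keep parts that recur across samples, and concatenate the survivors into a composite answer. The snippet below is a minimal illustration of that merge step, not the authors' implementation; it assumes sentence-level atoms (via NLTK) and TF-IDF cosine similarity as a stand-in for a proper sentence encoder, and the similarity threshold and support count are illustrative.

# Minimal sketch of the atomic merge idea behind ASC (not the authors' code).
# Assumptions: atoms are sentences (NLTK punkt), similarity is TF-IDF cosine,
# and an atom is kept if it appears in at least `min_support` distinct samples.
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def atomic_self_consistency(samples, sim_threshold=0.6, min_support=2):
    """Merge atomic subparts that are consistent across multiple LLM samples."""
    atoms, owners = [], []
    for i, text in enumerate(samples):
        for sent in sent_tokenize(text):
            atoms.append(sent)
            owners.append(i)
    if not atoms:
        return ""

    # Pairwise similarity between all atoms from all samples.
    sims = cosine_similarity(TfidfVectorizer().fit_transform(atoms))

    selected = []  # indices of atoms kept in the composite answer
    for j in range(len(atoms)):
        # Support = number of distinct samples containing a similar atom
        # (an atom always supports itself, since sims[j, j] == 1).
        support = {owners[k] for k in range(len(atoms)) if sims[j, k] >= sim_threshold}
        is_duplicate = any(sims[j, k] >= sim_threshold for k in selected)
        if len(support) >= min_support and not is_duplicate:
            selected.append(j)

    return " ".join(atoms[j] for j in selected)


# Usage: merge several temperature-sampled answers into one composite response.
# samples = [llm(question, temperature=0.7) for _ in range(5)]  # hypothetical llm()
# print(atomic_self_consistency(samples))

Unlike USC, which would return one of the sampled answers unchanged, this merge can surface a fact mentioned in only a couple of samples, which is where the recall gains come from.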

References (26)
  1. Do language models know when they’re hallucinating references? arXiv preprint arXiv:2305.18248.
  2. Do as I can, not as I say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691.
  3. A benchmark dataset of check-worthy factual claims. In Proceedings of the International AAAI Conference on Web and Social Media, volume 14, pages 821–829.
  4. Natural language processing with Python: Analyzing text with the Natural Language Toolkit. O'Reilly Media, Inc.
  5. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712.
  6. Universal self-consistency for large language model generation. arXiv preprint arXiv:2311.17311.
  7. Chain-of-verification reduces hallucination in large language models.
  8. Halo: Estimation and reduction of hallucinations in open-source weak large language models. arXiv preprint arXiv:2308.11764.
  9. ELI5: Long form question answering. arXiv preprint arXiv:1907.09190.
  10. SimCSE: Simple contrastive learning of sentence embeddings. arXiv preprint arXiv:2104.08821.
  11. Enabling large language models to generate text with citations. arXiv preprint arXiv:2305.14627.
  12. Evaluating verifiability in generative search engines. arXiv preprint arXiv:2304.09848.
  13. QUEST: A retrieval dataset of entity-seeking queries with implicit set operations. arXiv preprint arXiv:2305.11694.
  14. SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. arXiv preprint arXiv:2303.08896.
  15. FActScore: Fine-grained atomic evaluation of factual precision in long form text generation. arXiv preprint arXiv:2305.14251.
  16. Large dual encoders are generalizable retrievers. arXiv preprint arXiv:2112.07899.
  17. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
  18. Large language models are effective text rankers with pairwise ranking prompting. arXiv preprint arXiv:2306.17563.
  19. Self-evaluation improves selective generation in large language models. arXiv preprint arXiv:2312.09300.
  20. QAMPARI: An open-domain question answering benchmark for questions with many answers from multiple paragraphs. arXiv preprint arXiv:2205.12665.
  21. Natural language to code translation with execution. arXiv preprint arXiv:2204.11454.
  22. ASQA: Factoid questions meet long-form answers. arXiv preprint arXiv:2204.06092.
  23. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
  24. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171.
  25. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837.
  26. Self-consistent reasoning for solving math word problems. arXiv preprint arXiv:2210.15373.
Authors (3)
  1. Raghuveer Thirukovalluru (7 papers)
  2. Yukun Huang (39 papers)
  3. Bhuwan Dhingra (66 papers)
Citations (3)
