Atomic Self-Consistency for Better Long Form Generations (2405.13131v1)
Abstract: Recent work has aimed to improve LLM generations by filtering out hallucinations, thereby improving the precision of the information in responses. The correctness of a long-form response, however, also depends on the recall of multiple pieces of information relevant to the question. In this paper, we introduce Atomic Self-Consistency (ASC), a technique for improving the recall of relevant information in an LLM response. ASC follows recent work, Universal Self-Consistency (USC), in using multiple stochastic samples from an LLM to improve the long-form response. Unlike USC, which focuses only on selecting the single best generation, ASC picks authentic subparts from the samples and merges them into a superior composite answer. Through extensive experiments and ablations, we show that merging relevant subparts of multiple samples performs significantly better than picking a single sample. ASC demonstrates significant gains over USC on multiple factoid and open-ended QA datasets (ASQA, QAMPARI, QUEST, ELI5) with ChatGPT and Llama2. Our analysis also reveals untapped potential for enhancing long-form generations by merging multiple samples.
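The abstract's description of ASC (draw multiple stochastic samples, keep the atomic subparts that are consistent across samples, merge them into one answer) can be illustrated with a minimal sketch. This is not the paper's implementation: `sample_responses` is a hypothetical callable standing in for temperature-sampled ChatGPT/Llama2 calls, atoms are approximated by sentence splits, similarity is TF-IDF cosine, and `sim_threshold`/`min_support` are placeholder values; the paper's actual atom extraction, embedding model, and selection criterion may differ.

```python
# Minimal sketch of the Atomic Self-Consistency (ASC) idea, under the
# assumptions stated above (sentence-level atoms, TF-IDF cosine similarity,
# cross-sample support voting). Not the authors' implementation.
from typing import Callable, List

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def split_into_atoms(response: str) -> List[str]:
    """Crude atom extraction: treat each sentence as one atomic claim."""
    return [s.strip() for s in response.replace("\n", " ").split(". ") if s.strip()]


def atomic_self_consistency(
    question: str,
    sample_responses: Callable[[str, int], List[str]],  # hypothetical LLM sampler
    num_samples: int = 5,
    sim_threshold: float = 0.6,
    min_support: int = 2,
) -> str:
    # 1) Draw multiple stochastic samples for the same question.
    responses = sample_responses(question, num_samples)

    # 2) Decompose every sample into atomic subparts, remembering which
    #    sample each atom came from.
    atoms, owners = [], []
    for idx, resp in enumerate(responses):
        for atom in split_into_atoms(resp):
            atoms.append(atom)
            owners.append(idx)
    if not atoms:
        return ""

    # 3) Score cross-sample support: an atom is kept when similar atoms
    #    appear in at least `min_support` distinct samples.
    vectors = TfidfVectorizer().fit_transform(atoms)
    sims = cosine_similarity(vectors)

    selected = []  # (atom index, atom text) pairs chosen for the final answer
    for i, atom in enumerate(atoms):
        supporting = {owners[j] for j in range(len(atoms)) if sims[i, j] >= sim_threshold}
        is_duplicate = any(sims[i, k] >= sim_threshold for k, _ in selected)
        if len(supporting) >= min_support and not is_duplicate:
            selected.append((i, atom))

    # 4) Merge the selected atoms into a single composite answer.
    return ". ".join(atom for _, atom in selected) + ("." if selected else "")
```

In practice, `sample_responses` would wrap several temperature > 0 calls to the underlying model, and the sentence splitter and similarity function could be swapped for stronger components (e.g., NLTK sentence tokenization and SimCSE or GTR embeddings, both cited below).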
- Do language models know when they’re hallucinating references? arXiv preprint arXiv:2305.18248.
- Do as I can, not as I say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691.
- A benchmark dataset of check-worthy factual claims. In Proceedings of the International AAAI Conference on Web and Social Media, volume 14, pages 821–829.
- Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc.
- Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712.
- Universal self-consistency for large language model generation. arXiv preprint arXiv:2311.17311.
- Chain-of-verification reduces hallucination in large language models.
- Halo: Estimation and reduction of hallucinations in open-source weak large language models. arXiv preprint arXiv:2308.11764.
- Eli5: Long form question answering. arXiv preprint arXiv:1907.09190.
- Simcse: Simple contrastive learning of sentence embeddings. arXiv preprint arXiv:2104.08821.
- Enabling large language models to generate text with citations. arXiv preprint arXiv:2305.14627.
- Evaluating verifiability in generative search engines. arXiv preprint arXiv:2304.09848.
- Quest: A retrieval dataset of entity-seeking queries with implicit set operations. arXiv preprint arXiv:2305.11694.
- Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models. arXiv preprint arXiv:2303.08896.
- Factscore: Fine-grained atomic evaluation of factual precision in long form text generation. arXiv preprint arXiv:2305.14251.
- Large dual encoders are generalizable retrievers. arXiv preprint arXiv:2112.07899.
- Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830.
- Large language models are effective text rankers with pairwise ranking prompting. arXiv preprint arXiv:2306.17563.
- Self-evaluation improves selective generation in large language models. arXiv preprint arXiv:2312.09300.
- Qampari: An open-domain question answering benchmark for questions with many answers from multiple paragraphs. arXiv preprint arXiv:2205.12665.
- Natural language to code translation with execution. arXiv preprint arXiv:2204.11454.
- Asqa: Factoid questions meet long-form answers. arXiv preprint arXiv:2204.06092.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171.
- Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837.
- Self-consistent reasoning for solving math word problems. arXiv preprint arXiv:2210.15373.
- Raghuveer Thirukovalluru
- Yukun Huang
- Bhuwan Dhingra