Context Matters: Pushing the Boundaries of Open-Ended Answer Generation with Graph-Structured Knowledge Context (2401.12671v3)
Abstract: In the continuously advancing AI landscape, crafting context-rich and meaningful responses via LLMs is essential. Researchers are becoming increasingly aware of the challenges that LLMs with fewer parameters face when answering open-ended questions. To address these hurdles, augmenting LLMs with rich external domain knowledge through cutting-edge retrieval strategies offers significant improvements. This paper introduces a novel framework that combines graph-driven context retrieval with knowledge-graph-based enhancement, honing the proficiency of LLMs, especially on domain-specific community question answering platforms such as AskUbuntu, Unix, and ServerFault. We conduct experiments on LLMs of various parameter sizes to evaluate their ability to ground knowledge and produce factually accurate answers to open-ended questions. Our method, GraphContextGen, consistently outperforms dominant text-based retrieval systems, demonstrating its robustness and adaptability across a wide range of use cases. This advancement highlights the importance of pairing context-rich data retrieval with LLMs, offering a renewed approach to knowledge sourcing and generation in AI systems. We also show that, owing to the rich contextual data retrieved, the crucial entities in the generated answer remain factually coherent with the gold answer.
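The abstract's graph-driven context retrieval can be illustrated with a minimal, self-contained sketch: rank candidate context passages by centrality in a pairwise-similarity graph (in the spirit of TextRank/LexRank-style methods the paper builds on), so the most mutually supported passages are handed to the LLM first. This is an illustrative approximation, not the actual GraphContextGen pipeline; the tokenization, similarity measure, and function names here are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0


def rank_contexts(passages, damping=0.85, iterations=50):
    """Rank passages by PageRank centrality over a similarity graph."""
    vecs = [Counter(p.lower().split()) for p in passages]
    n = len(passages)
    # Weighted similarity graph; no self-loops.
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing edge weight per node
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] * scores[j] / out[j]
                            for j in range(n) if out[j])
            for i in range(n)
        ]
    order = sorted(range(n), key=scores.__getitem__, reverse=True)
    return [passages[i] for i in order]
```

A passage that overlaps with many other candidates rises to the top, while an off-topic one sinks, which is the intuition behind preferring graph-structured retrieval over ranking each passage against the query in isolation.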
- Somnath Banerjee
- Amruit Sahoo
- Sayan Layek
- Avik Dutta
- Rima Hazra
- Animesh Mukherjee