A Comparative Analysis of Conversational Large Language Models in Knowledge-Based Text Generation (2402.01495v1)

Published 2 Feb 2024 in cs.CL

Abstract: Generating natural language text from graph-structured data is essential for conversational information seeking. Semantic triples derived from knowledge graphs can serve as a valuable source for grounding responses from conversational agents by providing a factual basis for the information they communicate. This is especially relevant in the context of LLMs, which offer great potential for conversational interaction but are prone to hallucinating, omitting, or producing conflicting information. In this study, we conduct an empirical analysis of conversational LLMs in generating natural language text from semantic triples. We compare four LLMs of varying sizes with different prompting techniques. Through a series of benchmark experiments on the WebNLG dataset, we analyze the models' performance and identify the most common issues in the generated predictions. Our findings show that the capabilities of LLMs in triple verbalization can be significantly improved through few-shot prompting, post-processing, and efficient fine-tuning techniques, particularly for smaller models that exhibit lower zero-shot performance.
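The few-shot prompting setup described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the helper names, prompt wording, and example triples are assumptions, using WebNLG-style (subject, predicate, object) triples.

```python
# Hypothetical sketch of few-shot prompting for triple verbalization.
# The prompt template and demonstration triples below are illustrative,
# not taken from the paper.

def triple_to_str(triple):
    """Render a (subject, predicate, object) triple in a flat text form."""
    s, p, o = triple
    return f"({s} | {p} | {o})"

def build_few_shot_prompt(examples, target_triples):
    """Compose a prompt from (triples, reference text) demonstration pairs,
    ending with the target triples for the model to verbalize."""
    parts = ["Verbalize the following triples as fluent English text.\n"]
    for triples, text in examples:
        parts.append("Triples: " + "; ".join(triple_to_str(t) for t in triples))
        parts.append("Text: " + text + "\n")
    parts.append("Triples: " + "; ".join(triple_to_str(t) for t in target_triples))
    parts.append("Text:")
    return "\n".join(parts)

# One demonstration pair plus a target input (WebNLG-style, illustrative).
examples = [
    ([("Alan_Bean", "occupation", "Test_pilot")],
     "Alan Bean worked as a test pilot."),
]
prompt = build_few_shot_prompt(
    examples,
    [("Aarhus_Airport", "cityServed", "Aarhus,_Denmark")],
)
print(prompt)
```

The resulting prompt string would then be sent to a conversational LLM; the paper's post-processing and fine-tuning steps operate on the model's completion, which is outside this sketch.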

