When Giant Language Brains Just Aren't Enough! Domain Pizzazz with Knowledge Sparkle Dust (2305.07230v2)

Published 12 May 2023 in cs.CL

Abstract: LLMs have significantly advanced the field of natural language processing, with GPT models at the forefront. While their remarkable performance spans a range of tasks, adapting LLMs to real-world business scenarios still poses challenges that warrant further investigation. This paper presents an empirical analysis aimed at bridging the gap in adapting LLMs to practical use cases. We select insurance question answering (QA) as a case study because of the reasoning it demands. For this task we design a new model that relies on LLMs empowered by additional knowledge extracted from insurance policy rulebooks and DBpedia. The additional knowledge helps the LLMs grasp insurance-specific concepts for domain adaptation. Preliminary results on two QA datasets show that knowledge enhancement significantly improves the reasoning ability of GPT-3.5 (55.80% and 57.83% accuracy, respectively). The analysis also indicates that existing public knowledge bases such as DBpedia are beneficial for knowledge enhancement. Our findings reveal that the inherent complexity of business scenarios often necessitates incorporating domain-specific knowledge and external resources for effective problem-solving.
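
The abstract describes augmenting GPT-3.5 prompts with knowledge retrieved from insurance policy rulebooks and DBpedia. Below is a minimal sketch of that knowledge-enhanced prompting pattern, assuming the OpenAI chat API and a toy keyword-overlap retriever; the clauses, facts, and helper names are illustrative placeholders, not the authors' implementation.

```python
# Minimal sketch of knowledge-enhanced QA prompting (not the paper's code).
# The knowledge snippets below are toy stand-ins for rulebook clauses and
# DBpedia facts, and retrieval is naive keyword overlap for illustration.
from openai import OpenAI

POLICY_CLAUSES = [
    "Clause 4.2: Hospitalization benefits are payable only after a 30-day waiting period.",
    "Clause 7.1: Pre-existing conditions are excluded during the first policy year.",
]
DBPEDIA_FACTS = [
    "Health insurance: insurance that covers all or part of the risk of a person incurring medical expenses.",
]

def retrieve(question: str, knowledge: list[str], k: int = 2) -> list[str]:
    """Rank knowledge snippets by keyword overlap with the question."""
    q_terms = set(question.lower().split())
    scored = sorted(knowledge, key=lambda s: -len(q_terms & set(s.lower().split())))
    return scored[:k]

def answer(question: str) -> str:
    """Build a knowledge-augmented prompt and query the LLM."""
    snippets = retrieve(question, POLICY_CLAUSES + DBPEDIA_FACTS)
    context = "\n".join(f"- {s}" for s in snippets)
    prompt = (
        "Use the following insurance knowledge to answer the question.\n"
        f"Knowledge:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(answer("Is a pre-existing condition covered in the first policy year?"))
```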

Authors (6)
  1. Minh-Tien Nguyen (19 papers)
  2. Duy-Hung Nguyen (6 papers)
  3. Shahab Sabahi (3 papers)
  4. Hung Le (120 papers)
  5. Jeff Yang (3 papers)
  6. Hajime Hotta (3 papers)
Citations (1)