Knowledge-Infused Self Attention Transformers (2306.13501v1)

Published 23 Jun 2023 in cs.CL

Abstract: Transformer-based LLMs have achieved impressive success in various natural language processing tasks due to their ability to capture complex dependencies and contextual information using self-attention mechanisms. However, they are not without limitations. These limitations include hallucinations, where they produce incorrect outputs with high confidence, and alignment issues, where they generate unhelpful and unsafe outputs for human users. These limitations stem from context that is implicit in, or missing from, the training data alone. To address this, researchers have explored augmenting these models with external knowledge from knowledge graphs to provide the necessary additional context. However, the ad-hoc nature of existing methods makes it difficult to properly analyze the effects of knowledge infusion on the many moving parts or components of a transformer. This paper introduces a systematic method for infusing knowledge into different components of a transformer-based model. A modular framework is proposed to identify specific components within the transformer architecture, such as the self-attention mechanism, the encoder layers, or the input embedding layer, where knowledge infusion can be applied. Additionally, extensive experiments are conducted on the General Language Understanding Evaluation (GLUE) benchmark tasks, and the findings are reported. This systematic approach aims to facilitate more principled methods for incorporating knowledge into LLM architectures.
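
To make component-level infusion concrete, the sketch below shows one way external knowledge-graph embeddings could be blended into a self-attention block's keys. This is a minimal illustration under assumed names, not the paper's exact formulation: the KnowledgeInfusedSelfAttention class, the fusion weight alpha, and the per-token knowledge tensor are assumptions introduced for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeInfusedSelfAttention(nn.Module):
    """Illustrative single-head self-attention that mixes external
    knowledge embeddings (e.g., per-token ConceptNet vectors) into the
    attention keys. The fusion weight `alpha` and all layer names are
    assumptions for this sketch, not the paper's formulation."""

    def __init__(self, d_model: int, d_knowledge: int, alpha: float = 0.5):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Project knowledge-graph embeddings into the model's key space.
        self.k_know = nn.Linear(d_knowledge, d_model)
        self.alpha = alpha
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor, knowledge: torch.Tensor) -> torch.Tensor:
        # x:         (batch, seq_len, d_model)     token hidden states
        # knowledge: (batch, seq_len, d_knowledge) per-token KG embeddings
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Blend textual keys with projected knowledge keys before scoring.
        k = (1 - self.alpha) * k + self.alpha * self.k_know(knowledge)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v


# Usage: batch of 2, 8 tokens, 64-dim hidden states, 32-dim KG embeddings.
layer = KnowledgeInfusedSelfAttention(d_model=64, d_knowledge=32)
out = layer(torch.randn(2, 8, 64), torch.randn(2, 8, 32))
print(out.shape)  # torch.Size([2, 8, 64])
```

Analogous fusion could instead be applied at the input embedding layer or after each encoder layer, which is the kind of component-level choice the proposed modular framework is meant to make explicit.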

Authors (5)
  1. Kaushik Roy (265 papers)
  2. Yuxin Zi (8 papers)
  3. Vignesh Narayanan (20 papers)
  4. Manas Gaur (59 papers)
  5. Amit Sheth (127 papers)
Citations (6)