Knowledge Graph Guided Semantic Evaluation of Language Models For User Trust (2305.04989v1)

Published 8 May 2023 in cs.CL and cs.AI

Abstract: A fundamental question in natural language processing is: what kind of language structure and semantics does the LLM capture? Graph formats such as knowledge graphs are easy to evaluate, as they explicitly express language semantics and structure. This study evaluates the semantics encoded in self-attention transformers by leveraging explicit knowledge graph structures. We propose novel metrics that measure the reconstruction error when graph path sequences from a knowledge graph are provided as input and then reconstructed from the outputs of self-attention transformer models. The opacity of LLMs has an immense bearing on societal issues of trust and explainable decision outcomes. Our findings suggest that LLMs are models of stochastic control processes for plausible language pattern generation; however, they do not ascribe object- and concept-level meaning and semantics to the learned stochastic patterns, such as those described in knowledge graphs. Furthermore, to enable robust evaluation of concept understanding by LLMs, we construct and make public an augmented language understanding benchmark built on the General Language Understanding Evaluation (GLUE) benchmark. This has significant application-level implications for user trust, as stochastic patterns without a strong sense of meaning cannot be trusted in high-stakes applications.
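To make the reconstruction-error idea concrete, here is a minimal sketch of the kind of evaluation the abstract describes: a knowledge-graph path is serialized as a token sequence, a model is asked to reproduce it, and the mismatch between the two sequences is scored. The specific path, the toy model, and the similarity-based error are illustrative assumptions, not the paper's actual metric definitions or data.

```python
from difflib import SequenceMatcher

def reconstruction_error(kg_path, model_output):
    """Illustrative reconstruction error: 1 minus the token-level similarity
    between a knowledge-graph path and the sequence the model produces.
    (Hypothetical scoring for illustration; not the paper's exact metric.)"""
    matcher = SequenceMatcher(a=kg_path, b=model_output)
    return 1.0 - matcher.ratio()

# A knowledge-graph path serialized as an alternating entity/relation sequence.
path = ["aspirin", "treats", "headache", "symptom_of", "migraine"]

# Stand-in for a language model asked to reproduce the path; a real evaluation
# would decode this sequence from a self-attention transformer's outputs.
def toy_model(prompt_tokens):
    return prompt_tokens[:2] + ["relieves", "pain"]

output = toy_model(path)
print(f"reconstruction error: {reconstruction_error(path, output):.2f}")
```

A perfect reproduction of the path would score 0.0; the more the model's output diverges from the graph-specified entities and relations, the closer the error moves to 1.0.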

Authors (6)
  1. Kaushik Roy (265 papers)
  2. Tarun Garg (2 papers)
  3. Vedant Palit (6 papers)
  4. Yuxin Zi (8 papers)
  5. Vignesh Narayanan (20 papers)
  6. Amit Sheth (127 papers)
Citations (7)
