
ANALOGICAL -- A Novel Benchmark for Long Text Analogy Evaluation in Large Language Models (2305.05050v3)

Published 8 May 2023 in cs.CL and cs.AI

Abstract: Over the past decade, analogies, in the form of word-level analogies, have played a significant role as an intrinsic measure for evaluating the quality of word embedding methods such as word2vec. Modern LLMs, however, are primarily evaluated on extrinsic measures based on benchmarks such as GLUE and SuperGLUE, and only a few studies have investigated whether LLMs can draw analogies between long texts. In this paper, we present ANALOGICAL, a new benchmark to intrinsically evaluate LLMs across a taxonomy of long-text analogies with six levels of complexity: (i) word, (ii) word vs. sentence, (iii) syntactic, (iv) negation, (v) entailment, and (vi) metaphor. Using thirteen datasets and three different distance measures, we evaluate the abilities of eight LLMs to identify analogical pairs in the semantic vector space. Our evaluation finds that identifying analogies becomes increasingly challenging for LLMs as they move up the analogy taxonomy.
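The abstract describes identifying analogical pairs by measuring distances between text embeddings, but it does not name the three distance measures. The sketch below is illustrative only and is not the authors' code. It assumes two common choices, cosine distance and Euclidean distance, and ranks hypothetical candidate embeddings by how close each lies to an anchor text. The function names (`cosine_distance`, `euclidean_distance`, `rank_candidates`) and the toy vectors are made up for illustration. In a real evaluation, the vectors would come from an LLM's embedding of each text.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def _check_same_length(u: Vector, v: Vector) -> None:
    # zip() would silently truncate mismatched vectors, so fail loudly instead.
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity: 0 for the same direction, up to 2 for opposite."""
    _check_same_length(u, v)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u: Vector, v: Vector) -> float:
    """Straight-line (L2) distance between two embeddings."""
    _check_same_length(u, v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Assumed metrics for this sketch; the paper's exact set is not given in the abstract.
METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "cosine": cosine_distance,
    "euclidean": euclidean_distance,
}


def rank_candidates(anchor: Vector, candidates: List[Vector],
                    metric: str = "cosine") -> List[int]:
    """Return candidate indices ordered from closest to farthest from the anchor."""
    dist = METRICS[metric]
    return sorted(range(len(candidates)), key=lambda i: dist(anchor, candidates[i]))


if __name__ == "__main__":
    # Hypothetical toy embeddings: index 0 plays the analogous text, index 1 a distractor.
    anchor = [1.0, 0.0, 1.0]
    candidates = [[0.9, 0.1, 1.1], [-1.0, 0.5, 0.0]]
    print(rank_candidates(anchor, candidates, "cosine"))
    print(rank_candidates(anchor, candidates, "euclidean"))
```

Under this framing, a model "identifies" an analogy when the analogous text ranks first. Different metrics can disagree on that ranking, which is one plausible reason to report results under several distance measures.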

Authors (9)
  1. Thilini Wijesiriwardene (7 papers)
  2. Ruwan Wickramarachchi (12 papers)
  3. Bimal G. Gajera (1 paper)
  4. Shreeyash Mukul Gowaikar (1 paper)
  5. Chandan Gupta (6 papers)
  6. Aman Chadha (110 papers)
  7. Aishwarya Naresh Reganti (4 papers)
  8. Amit Sheth (127 papers)
  9. Amitava Das (45 papers)