
COMBO: A Complete Benchmark for Open KG Canonicalization (2302.03905v1)

Published 8 Feb 2023 in cs.CL and cs.AI

Abstract: An open knowledge graph (KG) consists of (subject, relation, object) triples extracted from millions of raw texts. The subject and object noun phrases and the relation phrases in an open KG have severe redundancy and ambiguity and need to be canonicalized. Existing datasets for open KG canonicalization only provide gold entity-level canonicalization for noun phrases. In this paper, we present COMBO, a Complete Benchmark for Open KG canonicalization. Compared with existing datasets, we additionally provide gold canonicalization for relation phrases, gold ontology-level canonicalization for noun phrases, as well as the source sentences from which triples are extracted. We also propose metrics for evaluating each type of canonicalization. On the COMBO dataset, we empirically compare previously proposed canonicalization methods as well as a few simple baseline methods based on pretrained LLMs. We find that properly encoding the phrases in a triple using pretrained LLMs results in better relation canonicalization and ontology-level canonicalization of the noun phrase. We release our dataset, baselines, and evaluation scripts at https://github.com/jeffchy/COMBO/tree/main.
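
To make the baseline idea concrete, below is a minimal, hypothetical sketch of one way to canonicalize relation phrases in the spirit described by the abstract: encode each phrase with a pretrained language model and cluster the embeddings so that phrases in the same cluster share one canonical relation. The model name (bert-base-uncased), mean pooling, example phrases, and the clustering threshold are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical relation-canonicalization sketch: embed phrases with a pretrained
# encoder, then group them by agglomerative clustering on cosine distance.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import AgglomerativeClustering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(phrases):
    """Mean-pool the last hidden states to get one vector per phrase."""
    batch = tokenizer(phrases, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)            # (B, T, 1)
    summed = (hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return (summed / counts).numpy()

# Toy relation phrases; a real open KG would have many thousands of these.
relation_phrases = ["was born in", "is born in", "works at", "is employed by"]
vectors = embed(relation_phrases)

# Each resulting cluster is treated as one canonical relation.
# The 0.3 cosine-distance threshold is an arbitrary illustrative value.
clusterer = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.3, metric="cosine", linkage="average"
)
labels = clusterer.fit_predict(vectors)
for phrase, label in zip(relation_phrases, labels):
    print(label, phrase)
```

The same encode-then-cluster pattern could be applied to subject and object noun phrases for entity-level canonicalization; ontology-level canonicalization would additionally require mapping each cluster to a type in a reference ontology, which this sketch does not attempt.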

Authors (6)
  1. Chengyue Jiang (11 papers)
  2. Yong Jiang (194 papers)
  3. Weiqi Wu (11 papers)
  4. Yuting Zheng (4 papers)
  5. Pengjun Xie (85 papers)
  6. Kewei Tu (74 papers)
Citations (1)
