Mapping and Cleaning Open Commonsense Knowledge Bases with Generative Translation (2306.12766v1)

Published 22 Jun 2023 in cs.CL

Abstract: Structured knowledge bases (KBs) are the backbone of many knowledge-intensive applications, and their automated construction has received considerable attention. In particular, open information extraction (OpenIE) is often used to induce structure from text. However, although it allows high recall, the extracted knowledge tends to inherit noise from the sources and the OpenIE algorithm. In addition, OpenIE tuples contain an open-ended, non-canonicalized set of relations, which makes downstream exploitation of the extracted knowledge harder. In this paper, we study the problem of mapping an open KB into the fixed schema of an existing KB, specifically for the case of commonsense knowledge. We propose approaching the problem by generative translation, i.e., by training an LLM to generate fixed-schema assertions from open ones. Experiments show that this approach occupies a sweet spot between traditional manual, rule-based, or classification-based canonicalization and purely generative KB construction like COMET. Moreover, it produces higher mapping accuracy than the former while avoiding the association-based noise of the latter.
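As a concrete illustration of the generative-translation idea, the sketch below frames the mapping of an open OpenIE-style tuple into a fixed-schema assertion as a text-to-text generation task. This is a minimal sketch only, assuming a seq2seq model from the Hugging Face transformers library; the model name (t5-small), the prompt format, the example tuple, and the target relation HasFear are illustrative assumptions, not the paper's actual training setup.

```python
# Sketch: treat open-KB -> fixed-schema mapping as text-to-text generation.
# Assumptions (not from the paper): model choice, prompt format, target schema.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # any seq2seq LM could stand in here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# An open (subject, relation, object) tuple from OpenIE, serialized as text.
open_tuple = "elephant | is afraid of | mice"
prompt = f"map to fixed schema: {open_tuple}"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)

# After fine-tuning on (open tuple, canonical assertion) pairs, the decoded
# output would be a fixed-schema assertion, e.g. "elephant HasFear mice".
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In such a setup, the model would first be fine-tuned on pairs of open assertions and their fixed-schema counterparts, so that a single generation step can both canonicalize the relation and clean the assertion, which is the sense in which the paper describes mapping and cleaning as generative translation.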

References (50)
  1. DBpedia: A Nucleus for a Web of Open Data. In ISWC.
  2. GenericsKB: A Knowledge Base of Generic Statements. arXiv preprint (2020).
  3. Language Models are Few-Shot Learners. NeurIPS (2020).
  4. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL.
  5. Ontology matching: A machine learning approach. In Handbook on ontologies. Springer.
  6. Semantifying Triples from Open Information Extraction Systems. In STAIRS.
  7. T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples. In LREC 2018.
  8. Ontology Alignment Evaluation Initiative: Six Years of Experience. Springer, 158–192. https://doi.org/10.1007/978-3-642-22630-4_6
  9. Ontology matching. Springer.
  10. Identifying relations for open information extraction. In EMNLP.
  11. Scalable multi-hop relational reasoning for knowledge-aware question answering. EMNLP (2020).
  12. Canonicalizing open knowledge bases. In CIKM.
  13. On Aligning OpenIE Extractions with Knowledge Bases: A Case Study. In Eval4NLP.
  14. OPIEC: an open information extraction corpus. AKBC (2019).
  15. (Comet-)Atomic 2020: On Symbolic and Neural Commonsense Knowledge Graphs. In AAAI.
  16. OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction. arXiv preprint arXiv:2010.03147 (2020).
  17. Fast and exact rule mining with AMIE 3. In ESWC.
  18. KBPearl: a knowledge base population system supported by joint entity and relation linking. VLDB (2020).
  19. The Stanford CoreNLP Natural Language Processing Toolkit. In ACL System Demonstrations (2014), 55–60.
  20. Domain-targeted, high precision knowledge extraction. TACL (2017).
  21. Refined Commonsense Knowledge from Large-Scale Web Contents. arXiv (2021).
  22. Advanced semantics for commonsense knowledge extraction. In WWW.
  23. Inside ASCENT: Exploring a Deep Commonsense Knowledge Base and its Usage in Question Answering. ACL (2021).
  24. Scikit-learn: Machine learning in Python. JMLR (2011).
  25. Language Models as Knowledge Bases?. In EMNLP.
  26. Aligning OpenIE relations and KB relations using a Siamese network based on word embedding. In IWCS.
  27. Language models are unsupervised multitask learners. OpenAI blog (2019).
  28. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 140 (2020), 1–67.
  29. Inside Quasimodo: Exploring Construction and Usage of Commonsense Knowledge. In CIKM. https://doi.org/10.1145/3340531.3417416
  30. Commonsense Properties from Query Logs and Question Answering Forums. In CIKM.
  31. Open Information Extraction to KBP Relations in 3 Hours. In TAC.
  32. Adapting open information extraction to domain-specific relations. AI magazine (2010).
  33. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In AAAI.
  34. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. NAACL (2019).
  35. Yago 4: A reason-able knowledge base. In ESWC.
  36. Stanford Alpaca: An Instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca.
  37. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971 (2023).
  38. Wikidata: A Free Collaborative Knowledgebase. Commun. ACM (2014).
  39. Zero-shot information extraction as a unified text-to-triple translation. arXiv preprint arXiv:2109.11171 (2021).
  40. DEEPSTRUCT: Pretraining of Language Models for Structure Prediction. arXiv preprint arXiv:2205.10475 (2022).
  41. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-trained Transformers. NeurIPS (2020).
  42. Multi-Label Classification with Label Graph Superimposing. In AAAI.
  43. Integrating Lexical Information into Entity Neighbourhood Representations for Relation Prediction. In NAACL.
  44. QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering. In NAACL.
  45. A Survey of Knowledge-Enhanced Text Generation. ACM Comput. Surv. (2022).
  46. OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference. NAACL (2019).
  47. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. arXiv preprint arXiv:2303.16199 (2023).
  48. Knowledge-Enriched Transformer for Emotion Detection in Textual Conversations. In EMNLP. Hong Kong, China.
  49. Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion. In KDD.
  50. A Survey on Neural Open Information Extraction: Current Status and Future Directions. arXiv preprint arXiv:2205.11725 (2022).
Authors (2)
  1. Julien Romero (7 papers)
  2. Simon Razniewski (49 papers)
Citations (1)