Agent-OM: Leveraging LLM Agents for Ontology Matching

Published 1 Dec 2023 in cs.AI, cs.CL, and cs.IR | (2312.00326v10)

Abstract: Ontology matching (OM) enables semantic interoperability between different ontologies and resolves their conceptual heterogeneity by aligning related entities. OM systems currently have two prevailing design paradigms: conventional knowledge-based expert systems and newer machine learning-based predictive systems. While LLMs and LLM agents have revolutionised data engineering and have been applied creatively in many domains, their potential for OM remains underexplored. This study introduces a novel agent-powered LLM-based design paradigm for OM systems. With consideration of several specific challenges in leveraging LLM agents for OM, we propose a generic framework, namely Agent-OM (Agent for Ontology Matching), consisting of two Siamese agents for retrieval and matching, with a set of OM tools. Our framework is implemented in a proof-of-concept system. Evaluations of three Ontology Alignment Evaluation Initiative (OAEI) tracks over state-of-the-art OM systems show that our system can achieve results very close to the long-standing best performance on simple OM tasks and can significantly improve the performance on complex and few-shot OM tasks.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a consolidated list of what remains missing, uncertain, or unexplored in the paper, phrased to guide actionable follow-up research.

  • Lack of support for complex OWL axioms: the framework only verbalizes triple-based relations; integration of axiom-level verbalization (e.g., class expressions, restrictions) and its impact on matching performance remains unstudied.
  • Relation types beyond equivalence are not handled: the system focuses on equivalence; subsumption, part-of, disjointness, and other alignment relations are not modeled, detected, or evaluated.
  • No global alignment coherence/repair: there is no logical consistency checking or repair of alignments using OWL reasoning; effects of adding repair on precision/recall are unknown.
  • Individuals and property alignment coverage unclear: handling of individuals, object vs datatype properties, and property reification is not detailed; coverage and accuracy on these element types are unassessed.
  • Mapping cardinality constraints not addressed: one-to-one vs many-to-many alignment strategies and conflict resolution are absent; the current “intersection-only” merging may depress recall, but this effect is not analyzed.
  • Dependence on prompt-based tools: all tools are prompt-driven; the benefits and drawbacks of replacing them with programmatic/algorithmic extractors (e.g., robust RDF parsers, reasoners, rule-based modules) are not evaluated.
  • LLM-based “graphical” verbalization quality unvalidated: the faithfulness and consistency of natural language verbalizations of triples (and their effect on downstream matching) are not measured.
  • Naming convention normalization assumptions: the approach relies on high-quality rdfs:label/comment to replace codes/URIs; behavior when labels are missing, noisy, ambiguous, or multilingual is not addressed.
  • Automatic context determination: the system relies on a manually specified context (e.g., “conference”); how to infer or learn context automatically from ontologies or metadata is not explored.
  • Sensitivity to hyperparameters is untested: similarity threshold (e.g., 0.8), top-k (e.g., 5), and RRF settings (e.g., choice of k) lack justification and sensitivity/robustness analysis.
  • Embedding/model dependence: only OpenAI embeddings (1536-d) and GPT-3.5 are used; comparisons with alternative embedding models (e.g., SBERT, E5, VertexAI), cross-encoders, and LLMs (GPT-4, Llama, Claude) are not reported.
  • No ablation studies: contributions of planning (CoT), shared memory, individual matchers (initial/lexical/graphical), RRF fusion, and validator are not disentangled via ablations.
  • Scalability and efficiency unquantified: latency, throughput, and memory footprint for large ontologies (e.g., with millions of entities) are not measured; no use or evaluation of ANN indexes (e.g., HNSW) for vector search.
  • Cost analysis absent: API call counts, token usage, embedding costs, and overall monetary cost per alignment are not documented; cost–accuracy trade-offs remain unclear.
  • Reliability of the LLM validator is unknown: the yes/no equivalence check lacks calibration, confidence estimation, majority voting, or entailment-based verification; false positive/negative rates are not analyzed.
  • Handling of ontology evolution: incremental updates, re-embedding strategies, and memory/database maintenance for changing ontologies are not addressed.
  • Robustness to sparse or noisy ontologies: performance when ontologies lack descriptive labels/comments or contain inconsistent annotations is not evaluated.
  • Generalization beyond three OAEI tracks: domain transferability (e.g., to biomedical, geospatial, or industrial ontologies) and performance on larger or more heterogeneous benchmarks are not demonstrated.
  • Cross-lingual OM not considered: matching across different natural languages and integration of multilingual embeddings/translation pipelines remain open issues.
  • Explainability and user trust: how to present match rationales, enable human verification, or support interactive correction is not discussed.
  • Privacy and compliance: reliance on proprietary APIs (OpenAI for LLMs and embeddings) raises data governance questions; on-prem or open-source alternatives and their performance are not studied.
  • Reproducibility and artifacts: complete prompt templates, agent configurations, and code/data release are unclear (artifact URL placeholder); reproducibility across LLM versions/providers is not assessed.
  • Parameterized candidate generation strategy: the impact of different candidate retrieval strategies (e.g., union vs intersection merging, dual-encoder vs cross-encoder reranking) on recall/precision is not explored.
  • Mathematical/formal completeness: the cosine similarity and RRF formulations are not rigorously parameterized or justified in the OM context; comparative evaluation against other fusion and similarity measures is missing.
  • Integration with symbolic reasoning: combining LLM-driven retrieval with OWL reasoners, rule systems, or graph algorithms for candidate generation/validation remains an open design and evaluation question.
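
Several of the gaps above concern rank fusion and candidate merging (RRF settings, union vs intersection merging). As a rough illustration, and not the paper's implementation, the sketch below uses the standard RRF score, i.e., the sum over ranked lists of 1 / (k + rank), with the commonly used default k = 60. The matcher names and candidate lists are hypothetical, chosen only to show how the fused ranking, the union, and the intersection can differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists with standard RRF.

    Each item receives sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. The constant k is a tunable hyperparameter.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical candidate lists for one source entity, one per matcher.
lexical = ["Paper", "Article", "Document"]
semantic = ["Article", "Paper", "Review"]
graphical = ["Article", "Document", "Paper"]

fused = reciprocal_rank_fusion([lexical, semantic, graphical], k=60)
fused_names = [name for name, _ in fused]

# Two simple merging strategies over the same candidate lists.
union = set(lexical) | set(semantic) | set(graphical)
intersection = set(lexical) & set(semantic) & set(graphical)

print(fused_names)
print(sorted(union), sorted(intersection))
```

Re-running the fusion with different values of k and comparing the resulting orderings is one simple starting point for the sensitivity analysis the list calls for. Comparing the sizes of `union` and `intersection` shows how many candidates an intersection-only merge would discard.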

Glossary

  • AgreementMakerLight (AML): A knowledge-based ontology matching system that automates aligning entities across ontologies. "AgreementMakerLight (AML)"
  • Agent-OM: The paper’s proposed agent-powered, LLM-based framework for ontology matching, using retrieval and matching agents with shared memory and tools. "Agent-OM"
  • BERT: A transformer-based language model often used to generate text embeddings for downstream tasks. "BERT"
  • BERTMap: An ontology matching system that leverages BERT to improve matching performance. "BERTMap"
  • chain of thought (CoT): A prompting technique that guides LLMs to plan or reason via step-by-step decomposition. "chain of thought (CoT)"
  • cosine similarity: A metric that measures the similarity between two vectors based on the cosine of the angle between them. "cosine similarity"
  • CRUD: The basic database operations—create, read, update, and delete—used for managing stored data. "CRUD (create, read, update, and delete)"
  • DeepOnto: A toolkit supporting ontology-related tasks, including verbalisation of logical axioms. "DeepOnto"
  • embedding model: A model that maps text into vector representations to enable similarity computations. "embedding model"
  • fine-tuning: Adapting a pre-trained model to a specific task or domain by further training on labeled examples. "fine-tuning"
  • few-shot OM tasks: Ontology matching scenarios with very few labeled examples for learning. "few-shot OM tasks"
  • hybrid database: A data storage design that combines a relational database with a vector database for metadata and embeddings. "hybrid database"
  • in-context learning (ICL): Supplying examples or context in the prompt so an LLM can perform a task without parameter updates. "in-context learning (ICL)"
  • knowledge bases (KBs): Structured repositories of facts used to provide general or domain-specific information. "knowledge bases (KBs)"
  • knowledge graph (KG): A graph-structured representation of entities and their relationships, often used for reasoning and retrieval. "knowledge graph (KG)"
  • LangChain: A library for building LLM-driven agents with planning, memory, and tool use. "LangChain"
  • LLMs: Large language models, i.e., neural language models pre-trained on massive corpora and used for generation and reasoning. "LLMs"
  • LLM agents: Systems that use an LLM as a controller to plan, use tools, and manage memory for complex tasks. "LLM agents"
  • LogMap: A traditional knowledge-based ontology matching system known for precision and effectiveness. "LogMap"
  • LogMap-ML: A machine learning-augmented variant of LogMap that integrates predictive techniques. "LogMap-ML"
  • Matching EvaLuation Toolkit (MELT): A toolkit for evaluating and benchmarking ontology matching methods. "Matching EvaLuation Toolkit (MELT)"
  • Model as a Service: The paradigm of invoking a model as an external service rather than integrating or retraining it. "Model as a Service"
  • Ontology Alignment Evaluation Initiative (OAEI): A community benchmark and set of tracks for evaluating ontology matching systems. "Ontology Alignment Evaluation Initiative (OAEI)"
  • ontology engineering: The process of designing, building, and maintaining ontologies. "ontology engineering"
  • Ontology matching (OM): The task of finding correspondences between entities in different ontologies to achieve semantic interoperability. "Ontology matching (OM)"
  • OWL Verbaliser: A tool that converts OWL axioms into natural language descriptions. "OWL Verbaliser"
  • pgvector: A PostgreSQL extension that enables storage and similarity search over vector embeddings. "pgvector"
  • Prefix:Name: An ontology entity naming convention using a namespace prefix and a local name. "Prefix:Name"
  • reciprocal rank fusion (RRF): A method to combine multiple ranked lists by summing reciprocal ranks to produce a fused ranking. "reciprocal rank fusion (RRF)"
  • retrieval-augmented generation (RAG): A technique that augments LLM prompts with retrieved documents to ground responses. "retrieval-augmented generation (RAG)"
  • rdfs:comment: An RDF Schema property used to attach human-readable descriptions to resources. "rdfs:comment"
  • rdfs:label: An RDF Schema property used to attach a human-readable name to a resource. "rdfs:label"
  • Sentence-BERT: A model that produces sentence-level embeddings suitable for semantic similarity tasks. "Sentence-BERT"
  • Self-RAG: A method where an LLM self-evaluates and refines retrieval-augmented outputs to reduce hallucinations. "Self-RAG"
  • SelfCheckGPT: An approach where an LLM checks its own outputs for consistency to detect hallucinations. "SelfCheckGPT"
  • Siamese agents: A pair of coordinated agents that operate separately (e.g., retrieval and matching) but share memory. "Siamese agents"
  • similarity search: Retrieving items most similar to a query in embedding space, typically via vector distance measures. "similarity search"
  • Sydney OWL Syntax: A controlled natural language/syntax for representing OWL axioms in a readable form. "Sydney OWL Syntax"
  • URI#Name: An ontology entity naming pattern using a full URI with a fragment identifier. "URI#Name"
  • vector database: A database that stores and indexes vector embeddings to support efficient similarity queries. "vector database"
  • VersaMatch: A machine learning-based ontology matching system. "VersaMatch"
  • Wikidata: A collaboratively edited knowledge base often used as an external source for general meanings. "Wikidata"
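
To make the glossary entries for cosine similarity and similarity search concrete, here is a minimal sketch of a thresholded top-k search over embeddings. It is a plain-Python illustration under stated assumptions, not the pgvector-backed search described in the paper: the three-dimensional vectors and entity names are invented, and the defaults (`k=5`, `threshold=0.8`) simply mirror the example values quoted in the knowledge-gaps list.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_matches(query, candidates, k=5, threshold=0.8):
    """Return up to k (name, score) pairs with score >= threshold, best first."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:k]


# Hypothetical toy embeddings; real embeddings would have far more dimensions.
candidates = {
    "Author": [1.0, 0.0, 0.0],
    "Writer": [0.9, 0.1, 0.0],
    "Venue": [0.0, 1.0, 0.0],
}
query = [1.0, 0.0, 0.0]

matches = top_k_matches(query, candidates)
print(matches)
```

An exhaustive scan like this is linear in the number of candidates; for large ontologies, a vector database with an approximate index would typically replace the loop.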
