
Linked Open Terms (LOT) Methodology

Updated 17 March 2026
  • Linked Open Terms (LOT) is a systematic approach for extracting, modeling, publishing, and interlinking domain-specific terms to enhance discovery and machine-readability.
  • It employs entity linking, exclusion filtering, and graph traversal techniques to construct hierarchical ontologies from large-scale linked data resources.
  • The methodology has been applied in domains like polymer materials and mathematical knowledge, optimizing recall-precision trade-offs and enabling robust data integration.

The Linked Open Terms (LOT) methodology encompasses systematic practices for extracting, modeling, publishing, and interlinking domain concepts and formal terms within large-scale, distributed linked data environments. Across both general Linked Open Data (LOD) and specialized mathematical knowledge bases (e.g., OpenMath Content Dictionaries), the LOT paradigm advances the integration and machine-readability of field-specific terminologies, ensuring that domain ontologies and controlled vocabularies are discoverable, interoperable, and computationally actionable in the broader Web of Data (Kume et al., 2021; Lange, 2010).

1. Core Definitions and Concept Structures

At the heart of the LOT methodology is the connection between technical vocabularies and structured data representations. A search entity is an LOD resource whose label (rdfs:label or skos:altLabel) exactly matches a domain term after lexical normalization and domain-specific exclusion filtering. For a set of entities $E$ and vocabulary $V$, the set of search entities is

$SE = \{s \in E \mid \text{label}(s) \in V \wedge \text{passesExclusion}(s)\}$

An upper-level concept for $s$ is any node reachable by traversing one or more “is-a” relations (specifically, subClassOf/P279 or instanceOf/P31 for the first hop, then subClassOf/P279 only). The ancestry set $\mathit{Anc}(s)$ forms the base for identifying conceptual hierarchies.

A common upper-level entity (CU) is present as an ancestor of at least two distinct search entities within the integrated ancestor graph $G = (V_g, E_g)$; formally, a node $u$ with in-degree $\geq 2$. Chain-of-path relationships describe directed sequences of CUs representing shared super-conceptual paths.

The expanded CU (ECU) serves as a root for downward exploration, with the number of expansion steps (NES) for ECU $x$ defined as the maximal shortest-path distance from $x$ to its subordinate search entities:

$NES_x = \max\{d(x, s) \mid s \text{ under } x\}$
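The definitions above can be sketched on a toy in-memory graph; the entity names and the IS_A map below are purely illustrative, not real Wikidata content:

```python
from collections import deque

# Toy "is-a" graph: child -> list of parents (subClassOf/P279-style edges).
# All entity names are illustrative examples, not actual LOD resources.
IS_A = {
    "polyethylene": ["polymer"],
    "polystyrene": ["polymer"],
    "polymer": ["chemical compound"],
    "chemical compound": ["chemical substance"],
}

SEARCH_ENTITIES = ["polyethylene", "polystyrene"]

def ancestors(s):
    """Upward BFS along is-a edges: the ancestry set Anc(s)."""
    seen, queue = set(), deque([s])
    while queue:
        node = queue.popleft()
        for parent in IS_A.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# Support of node u = number of search entities having u as an ancestor;
# common upper-level entities (CUs) are those with support >= 2.
anc = {s: ancestors(s) for s in SEARCH_ENTITIES}
support = {}
for s in SEARCH_ENTITIES:
    for u in anc[s]:
        support[u] = support.get(u, 0) + 1
cus = {u for u, k in support.items() if k >= 2}

def nes(x):
    """NES_x: maximal shortest-path distance from x down to its search entities."""
    children = {}
    for child, parents in IS_A.items():
        for p in parents:
            children.setdefault(p, []).append(child)
    dist, best = {x: 0}, 0
    queue = deque([x])
    while queue:
        node = queue.popleft()
        if node in SEARCH_ENTITIES:
            best = max(best, dist[node])
        for c in children.get(node, []):
            if c not in dist:
                dist[c] = dist[node] + 1
                queue.append(c)
    return best
```

On this graph, "polymer" has NES 1 (its direct children are search entities) while "chemical compound" has NES 2, mirroring how deeper ECUs subsume broader subtrees.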

2. Stepwise Extraction Workflow for Domain-specific Concepts

The canonical LOT extraction process for constructing a field-specific ontology from LOD comprises six sequential steps (Kume et al., 2021):

  1. Domain Vocabulary Construction: Compile $V$ from curated documents, technical dictionaries, or NLP-driven extractions; apply compound detection and part-of-speech filtering as required.
  2. Entity Linking and Exclusion Filtering: Map each term $v \in V$ to LOD entities via exact label/alias matching, with explicit exclusion rules (e.g., filtering near blacklisted Q-IDs or disqualifying by certain properties, such as administrative division/gender).
  3. Upper-level Concept Retrieval and Integrated Graph Construction: For each $s \in SE$, perform upward breadth-first traversal along “is-a” edges, constructing $G$. Support for each node $u$ is computed as $|\{s \in SE \mid u \in \mathit{Anc}(s)\}|$; the CU set is identified accordingly.
  4. Extraction of Common Paths, Partitioning, and NES Calculation:
    • Induce the CU subgraph $G_{CU}$.
    • Identify all directed CU-CU chains (common paths).
    • Remove these from $G$ to produce partitioned components $\{G_i\}$.
    • For each component, select ECUs ($u$ with support $\geq 2$) and compute $NES_u$ as above.
  5. Downward Expansion for Lower-level Concept Retrieval: For each ECU, retrieve all reachable entities via downward property paths (varying by NES), collecting all candidate concepts.
  6. Coverage and Precision Evaluation: Compare the candidate terms against a trusted dictionary index $D$. The matched set is $M = \{\text{label}(y) \mid y \in \text{CandidateConcepts}\} \cap D$. Coverage metrics:
    • $Recall(N) = |M| / |D|$
    • $Precision(N) = |M| / |\text{CandidateConcepts}|$
    • $F_1(N) = 2 \cdot \frac{Recall \cdot Precision}{Recall + Precision}$ (optional)

Varying the NES cutoff $N$ traces the recall-precision curve, allowing the trade-off between extraction depth and noise to be optimized.
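The evaluation in step 6 can be sketched as a small function over NES cutoffs; the candidate sets and dictionary below are invented toy data, not figures from the paper:

```python
def coverage_metrics(candidates_by_nes, dictionary, n_cut):
    """Recall/precision at NES cutoff N: pool candidates retrieved at
    depth <= N and compare their labels against the trusted dictionary D."""
    candidates = set()
    for depth, labels in candidates_by_nes.items():
        if depth <= n_cut:
            candidates |= labels
    matched = candidates & dictionary
    recall = len(matched) / len(dictionary) if dictionary else 0.0
    precision = len(matched) / len(candidates) if candidates else 0.0
    f1 = (2 * recall * precision / (recall + precision)) if (recall + precision) else 0.0
    return recall, precision, f1

# Toy data: deeper cutoffs add true matches but also more noise,
# which is exactly the recall-precision trade-off being traced.
candidates_by_nes = {1: {"polymer"}, 2: {"monomer", "gene"}, 3: {"resin", "city"}}
dictionary = {"polymer", "monomer", "resin", "elastomer"}
for n in (1, 2, 3):
    recall, precision, f1 = coverage_metrics(candidates_by_nes, dictionary, n)
```

Sweeping the cutoff this way is how the recall-precision curve in the empirical study below would be generated in practice.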

3. LOT Implementation in Mathematical Knowledge (OpenMath CDs)

In mathematical domains, the LOT methodology addresses the limitations of conventional OpenMath Content Dictionaries (CDs) for Web integration (Lange, 2010). Key prescriptions are:

  • Stable, dereferenceable HTTP URIs for every CD and every symbol, enabling independent ownership and versioning.
  • Content negotiation to supply representations in HTML, XML (with “application/openmath+xml”), or RDF.
  • RDF vocabularies for symbol, CD, containment, definitional mapping (e.g., om:inCD, om:hasDefinition).
  • Role-disambiguated formal mathematical properties (FMPs), with explicit marking for definitional FMPs (<FMP role="definition">) and computational FMPs.
  • Provenance integration using FOAF and Dublin Core, and inter-CD and external linking (e.g., DBpedia, DLMF) via RDFa, seeAlso, and skos:exactMatch.

This systematic approach converts mathematical symbol repositories into first-class, machine-readable segments of the overall Web of Data, enabling clients to retrieve and reason over definitional content, invoke computation, or integrate mathematical semantics into external datasets.
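The content-negotiation prescription can be sketched as a small dispatch function; the media-type-to-representation mapping follows the bullets above, but the dispatch logic itself is an illustrative assumption, not Lange's implementation:

```python
def negotiate(accept_header):
    """Map an HTTP Accept header to a representation for a CD/symbol URI.

    Media types follow the prescriptions above ("application/openmath+xml"
    for OpenMath XML); the routing itself is a sketch, honoring the
    client's listed preference order.
    """
    preferences = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in preferences:
        if media_type == "application/openmath+xml":
            return "openmath-xml"
        if media_type in ("application/rdf+xml", "text/turtle"):
            return "rdf"
        if media_type in ("text/html", "application/xhtml+xml"):
            return "html"
    return "html"  # human-readable fallback for unrecognized types
```

A machine client asking for text/turtle thus receives the RDF description of the symbol, while a browser falls through to HTML, keeping one stable URI per resource.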

4. Empirical Application: Polymer Materials Ontology Extraction

The LOT methodology was empirically demonstrated on Wikidata for the domain of polymer materials:

  • Vocabulary: 510 Japanese domain terms from PoLyInfo.
  • Entity linkage: 199 Wikidata search entities identified.
  • Graph expansion: Upwards BFS yielded a 763-node, 1,292-edge ancestor graph; 346 CUs (e.g., "chemical compound"/wd:Q11173).
  • Partitioning: 172 ECU roots, NES values from 1–7 (e.g., "polymer" NES=1, "chemical process" NES=2).
  • Downward expansion: Initial retrieval produced ~68M unique concepts; after subtree trimming, ~16M remained.
  • Coverage metrics: Recall quickly rises to ~0.65 at NES=5, then plateaus; precision declines as NES increases. NES=5 cut-off yields recall ≈0.67, precision ≈0.03, using a ground-truth dictionary of ~6,700 Japanese terms and 2,054 mapped Wikidata entities (Kume et al., 2021).

5. Generalization and Adaptation to Arbitrary Domains

The LOT methodology is domain-agnostic:

  • Vocabulary sources: Any technical lexicon, dictionary, or NLP/NER output.
  • Entity linking: Adaptable via normalization, embeddings, and alias mapping.
  • Graph traversal: Applies to any RDF-based resource with hierarchical class links; property IDs are remapped as appropriate (e.g., to DBpedia, ChEBI, MeSH RDF).
  • Exclusion/pruning: Tailored via blacklists and property-based rules; semantic ranking and domain frequency weighting are possible enhancements.
  • NES tuning: Ground-truth subset or expert review calibrate optimal expansion depth.

A plausible implication is that LOT serves as a generic, scalable pipeline for ontology bootstrapping from open-data knowledge graphs, with minor adaptation required for dataset particulars and language specifics.
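Adapting the traversal step to another knowledge graph mostly amounts to remapping the “is-a” property identifiers; the table below is a hypothetical illustration (the non-Wikidata targets are plausible choices, not a normative list):

```python
# Per-dataset remapping of "is-a" property identifiers. The Wikidata row
# matches the methodology (P279/P31); the other rows are illustrative
# assumptions about how one might adapt the traversal.
IS_A_PROPERTIES = {
    "wikidata": {"subclass": "P279", "instance": "P31"},
    "dbpedia": {"subclass": "rdfs:subClassOf", "instance": "rdf:type"},
    "skos": {"subclass": "skos:broader", "instance": "skos:broader"},
}

def is_a_edges(dataset, first_hop=False):
    """Return the property IDs to traverse for one BFS hop: both relations
    on the first hop, subclass-only afterwards (the traversal rule from
    Section 1)."""
    props = IS_A_PROPERTIES[dataset]
    if first_hop:
        return [props["subclass"], props["instance"]]
    return [props["subclass"]]
```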

6. Alignment with Linked Data Best Practices and SPARQL Integration

The LOT approach rigorously instantiates Linked Data principles:

  • Dereferenceable URIs and content negotiation: All resources are accessible via HTTP in multiple negotiated formats, supporting both human and machine clients.
  • Standard metadata integration: Utilization of DC, FOAF, SKOS, and OWL for provenance, semantics, and interlinking.
  • SPARQL workflows: Clients may federate queries across statistical and mathematical endpoints, retrieve term definitions, and programmatically traverse or expand definitions (e.g., via om:hasDefinition fields with embedded OpenMath XML), providing dynamic symbol resolution and on-demand computation (Lange, 2010).
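A definition-retrieval query of the kind described above can be sketched as follows; the om: namespace URI and the example symbol URI are assumptions for illustration, not confirmed identifiers from the cited work:

```python
# Hypothetical namespace; the om:hasDefinition term follows Section 3,
# but the exact ontology URI is an assumption.
OM = "http://www.openmath.org/ontology#"

def definition_query(symbol_uri):
    """Build a SPARQL query retrieving the definitional FMP of a symbol
    via om:hasDefinition; any SPARQL client could send this to the
    publishing endpoint."""
    return (
        f"PREFIX om: <{OM}>\n"
        "SELECT ?definition WHERE {\n"
        f"  <{symbol_uri}> om:hasDefinition ?definition .\n"
        "}\n"
    )

# Example symbol URI (illustrative):
query = definition_query("http://www.openmath.org/cd/arith1#plus")
```

The returned ?definition would carry the embedded OpenMath XML of the definitional FMP, which the client can then render or feed to a computation service.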

7. Conceptual Advances and Limitations

The LOT methodology resolves long-standing issues in domain ontology extraction and mathematical knowledge representation:

  • Resolution of OpenMath 2’s limitations: Overcomes static CDBase URI reuse, non-dereferenceable CDs, ambiguous FMP semantics, and lack of external linking.
  • Modularity and distribution: Establishes a true Web of distributed, machine-understandable scientific vocabulary.
  • Interoperability: Facilitates integration between disparate datasets, e.g., allowing datasets to reference OpenMath terms for provenance or computational traceability.

A suggested direction for further evolution involves automating disambiguation and synonym resolution using contextual embeddings and NLP pipelines, as well as extending provenance and semantic annotations for broader scientific reproducibility and knowledge discovery.


Key references:

Kume et al. (2021), “Extracting Domain-specific Concepts from Large-scale Linked Open Data.”
Lange (2010), “Towards OpenMath Content Dictionaries as Linked Data.”
