
Expand Bar: Efficient RDF Graph Exploration

Updated 7 February 2026
  • Expand Bar is a formal mechanism in eLinda that partitions RDF nodes based on semantic labels using subclass, property, or object expansions.
  • It employs rigorous, set-theoretic definitions and SPARQL-expressible algorithms to facilitate efficient visual exploration of large linked data sets.
  • Optimizations like incremental evaluation, caching, and specialized SQL indexes ensure sub-second response times even on datasets with hundreds of millions of triples.

The Expand Bar operation is a formal interactive mechanism in the eLinda system for visual exploration of linked data, specifically large RDF graphs. At each step, the user selects a "bar" representing a set of nodes and a semantic label, and the system expands this bar to a new bar chart along one of several supported axes (e.g., subclass, property, or object type). This operation is rigorously defined, algorithmically characterized, and engineered for low-latency usability even on very large datasets (Mishali et al., 2017).

1. Formal Structure and Definition

Each bar in eLinda is defined as a triple $B = \langle S, \lambda, t \rangle$, with $S \subseteq U(G)$ (a set of subject URIs from RDF graph $G$), $\lambda \in U(G)$ (the bar's label), and $t \in \{\text{class}, \text{property}\}$ (indicating semantic type). The Expand Bar functor $\eta$ operates on $B$ to produce a new bar chart, i.e., a partition of $S$ by new labels and types. Supported expansion kinds are:

  • Subclass expansion (only when $t = \text{class}$): Computes the distribution over direct subclasses of $\lambda$ present among $S$.
  • Property expansion (only when $t = \text{class}$): Partitions $S$ by outgoing RDF properties used.
  • Object expansion (only when $t = \text{property}$): Partitions according to the rdf:type of objects connected by $\lambda$ from $S$.

Each expansion has precise set-theoretic and SPARQL-expressible semantics, accompanied by explicit histogram formulas.
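The bar triple and the rule tying each expansion kind to the bar's type can be sketched as a small data structure. This is an illustrative sketch only: the names `Bar`, `EXPANSIONS_BY_TYPE`, and `allowed_expansions` are hypothetical and are not taken from eLinda's code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical mapping from bar type t to the expansion kinds it admits,
# following the three rules listed above.
EXPANSIONS_BY_TYPE = {
    "class": ("subclass", "property"),
    "property": ("object",),
}


@dataclass(frozen=True)
class Bar:
    """A bar B = <S, lambda, t>."""
    nodes: FrozenSet[str]  # S: a subset of U(G)
    label: str             # lambda: the bar's label
    kind: str              # t: "class" or "property"

    def __post_init__(self) -> None:
        if self.kind not in EXPANSIONS_BY_TYPE:
            raise ValueError(f"unknown bar type: {self.kind!r}")

    def allowed_expansions(self) -> Tuple[str, ...]:
        """Return the expansion kinds that are defined for this bar."""
        return EXPANSIONS_BY_TYPE[self.kind]


person_bar = Bar(frozenset({"John", "Jane"}), "Person", "class")
influenced_bar = Bar(frozenset({"John", "Jane"}), "influencedBy", "property")
print(person_bar.allowed_expansions())
print(influenced_bar.allowed_expansions())
```

Making the bar immutable reflects that an expansion produces new bars rather than modifying the selected one.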

2. Algorithmic Expansions and LaTeX Formulations

The three core expansion algorithms are specified as follows:

2.1 Subclass Expansion ($\eta_{\mathrm{sub}}$):

Given $B = \langle S, \lambda, \text{class} \rangle$:

  • Compute

$$\text{labels}(C) = \{\tau \in U(G) : (\tau, \text{rdfs:subClassOf}, \lambda) \in G\}$$

  • For each $\tau$,

$$S_\tau = \{ s \in S : (s, \text{rdf:type}, \tau) \in G \}$$

  • Output histogram:

$$H_{\mathrm{sub}}(S;\lambda) = \left\{ \left(\tau, \left| \left\{ s \in S \mid (s,\mathrm{rdf:type},\tau)\in G \ \wedge\ (\tau,\mathrm{rdfs:subClassOf},\lambda)\in G \right\}\right| \right) \right\}$$
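The subclass expansion can be sketched over an in-memory set of triples. This is a minimal illustration, not eLinda's implementation: the function name `subclass_expansion`, the tuple representation of triples, and the toy graph are assumptions. Dropping subclasses with no members in $S$ follows the "present among $S$" wording in Section 1.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def subclass_expansion(graph: Set[Triple], nodes: Set[str], label: str) -> Dict[str, int]:
    """Count, per direct subclass tau of `label`, the nodes of S typed as tau."""
    # labels(C): every tau with (tau, rdfs:subClassOf, label) in G
    subclasses = {s for (s, p, o) in graph if p == SUBCLASS_OF and o == label}
    histogram: Dict[str, int] = {}
    for tau in subclasses:
        # S_tau: members of S that carry rdf:type tau
        s_tau = {s for s in nodes if (s, RDF_TYPE, tau) in graph}
        if s_tau:  # assumption: empty bars are not shown
            histogram[tau] = len(s_tau)
    return histogram


# Toy graph (hypothetical data for illustration).
toy_graph: Set[Triple] = {
    ("Person", SUBCLASS_OF, "owl:Thing"),
    ("Company", SUBCLASS_OF, "owl:Thing"),
    ("Robot", SUBCLASS_OF, "owl:Thing"),  # no instances in S
    ("John", RDF_TYPE, "Person"),
    ("Jane", RDF_TYPE, "Person"),
    ("IBM", RDF_TYPE, "Company"),
}
sub_chart = subclass_expansion(toy_graph, {"John", "Jane", "IBM"}, "owl:Thing")
print(sorted(sub_chart.items()))
```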

2.2 Property Expansion ($\eta_{\mathrm{prop}}$):

Given $B = \langle S, \lambda, \text{class} \rangle$:

  • Compute

$$\text{labels}(P) = \{ p \in U(G) : \exists s\in S, o\ ((s,p,o)\in G) \}$$

  • For each $p$,

$$S_p = \{ s\in S : \exists o\ ((s,p,o)\in G) \}$$

  • Output histogram:

$$H_{\mathrm{prop}}(S) = \left\{ \left(p, |\{ s\in S : \exists o\, (s,p,o)\in G\}| \right) \right\}$$
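A sketch of the property expansion over an in-memory triple set follows. The function name `property_expansion` and the toy data are hypothetical. The point it illustrates is that the histogram counts distinct subjects per property, not triples, so a subject with two values for the same property contributes once.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def property_expansion(graph: Set[Triple], nodes: Set[str]) -> Dict[str, int]:
    """For each property p, count the subjects in S with at least one (s, p, o) triple."""
    subjects_by_property = defaultdict(set)
    for s, p, _o in graph:
        if s in nodes:
            subjects_by_property[p].add(s)  # S_p grows by distinct subject
    return {p: len(subjects) for p, subjects in subjects_by_property.items()}


# Toy graph (hypothetical data for illustration).
prop_graph: Set[Triple] = {
    ("John", "birthPlace", "Vienna"),
    ("John", "birthPlace", "Graz"),    # second value, same subject
    ("John", "influencedBy", "Plato"),
    ("Jane", "birthPlace", "Berlin"),
    ("IBM", "foundedIn", "1911"),      # subject outside S
}
prop_chart = property_expansion(prop_graph, {"John", "Jane"})
print(sorted(prop_chart.items()))
```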

2.3 Object Expansion ($\eta_{\mathrm{obj}}$):

Given $B = \langle S, p, \text{property} \rangle$:

  • Compute

$$\text{labels}(O) = \{ \tau \in U(G) : \exists s\in S,o\ ((s,p,o)\in G \wedge (o,\text{rdf:type},\tau)\in G ) \}$$

  • For each $\tau$,

$$S_\tau = \{ o \in U(G) : \exists s\in S, (s,p,o)\in G, (o, \text{rdf:type}, \tau)\in G \}$$

  • Output histogram:

$$H_{\mathrm{obj}}(S;p) = \left\{ \left( \tau, |\{ o : \exists s\in S, (s,p,o)\in G \wedge (o, \mathrm{rdf:type}, \tau)\in G \}| \right) \right\}$$
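The object expansion can be sketched in the same style. The function name `object_expansion` and the toy data are assumptions made for illustration. Note that the bars now contain objects rather than subjects, and each object is counted once per type even when several subjects in $S$ point to it.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"


def object_expansion(graph: Set[Triple], nodes: Set[str], prop: str) -> Dict[str, int]:
    """Group the objects reached from S via `prop` by their rdf:type and count them."""
    # Objects o with (s, prop, o) in G for some s in S
    objects = {o for (s, p, o) in graph if p == prop and s in nodes}
    objects_by_type = defaultdict(set)
    for o, p, tau in graph:
        if p == RDF_TYPE and o in objects:
            objects_by_type[tau].add(o)  # S_tau: distinct objects of type tau
    return {tau: len(objs) for tau, objs in objects_by_type.items()}


# Toy graph (hypothetical data for illustration).
obj_graph: Set[Triple] = {
    ("John", "influencedBy", "Plato"),
    ("Jane", "influencedBy", "Plato"),   # same object reached twice
    ("Jane", "influencedBy", "Mozart"),
    ("Plato", RDF_TYPE, "Philosopher"),
    ("Mozart", RDF_TYPE, "Composer"),
    ("Kant", RDF_TYPE, "Philosopher"),   # not reachable from S
}
obj_chart = object_expansion(obj_graph, {"John", "Jane"}, "influencedBy")
print(sorted(obj_chart.items()))
```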

3. Indexing, Caching, and Performance

For scalability, eLinda combines three strategies to keep interactive latency low:

  • Incremental Evaluation: For operations that may require full graph scans (e.g., initial expansion), SPARQL GROUP BY queries are paginated with LIMIT/OFFSET. Partial aggregates are merged by the frontend, enabling immediate UI feedback.
  • Heavy-Query Store (HVS): Any expansion query exceeding a latency threshold (e.g., 1s) is stored in a local key–value cache keyed by query hash, enabling $O(1)$ lookup for subsequent identical expansions. The cache is invalidated on mirror graph updates.
  • Decomposer with Specialized Indexes: Frequently-used charts are supported by SQL summary tables (triple_sp, triple_po) with B-tree indexes. This eliminates global joins and provides $O(\text{distinct labels} \cdot \log|G| + \text{chart size})$ complexity for expansions, yielding near-interactive performance on graphs with $|G|$ in the hundreds of millions.
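The heavy-query caching idea can be sketched as follows. This is a simplified illustration under stated assumptions: the class name `HeavyQueryStore`, the SHA-256 key, the in-process dictionary, and the `execute` callback are all hypothetical stand-ins for whatever store and endpoint eLinda actually uses.

```python
import hashlib
import time
from typing import Callable, Dict


class HeavyQueryStore:
    """Cache results of queries whose execution time exceeds a threshold."""

    def __init__(self, threshold_seconds: float = 1.0) -> None:
        self.threshold_seconds = threshold_seconds
        self._store: Dict[str, dict] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Key the cache by a hash of the query text.
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def run(self, query: str, execute: Callable[[str], dict]) -> dict:
        key = self._key(query)
        if key in self._store:          # O(1) expected-time lookup
            return self._store[key]
        start = time.perf_counter()
        result = execute(query)
        elapsed = time.perf_counter() - start
        if elapsed > self.threshold_seconds:
            self._store[key] = result   # only heavy queries are kept
        return result

    def invalidate(self) -> None:
        """Drop all entries, e.g. after the mirrored graph is updated."""
        self._store.clear()


calls = []


def slow_endpoint(query: str) -> dict:
    """Stand-in for a slow SPARQL endpoint (hypothetical)."""
    calls.append(query)
    time.sleep(0.05)
    return {"Person": 3}


hvs = HeavyQueryStore(threshold_seconds=0.01)
first = hvs.run("SELECT ...", slow_endpoint)
second = hvs.run("SELECT ...", slow_endpoint)  # served from the cache
print(len(calls))
```

Only the second call benefits; the first identical expansion still pays the full query cost.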

A performance summary table:

| Technique | Complexity | Typical latency |
| --- | --- | --- |
| SPARQL GROUP BY | $O(\lvert G \rvert)$ | Minutes |
| Incremental (N rows) | $O(N)$ | Sub-second (per page) |
| HVS cache | $O(1)$ | $\sim$50 ms |
| Decomposer+indexes | $O(D \log\lvert G \rvert + R)$ | 1–2 seconds |

with $D$ = distinct labels, $R$ = chart size (Mishali et al., 2017).
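To make the summary-table idea concrete, the sketch below builds a subject-property table in an in-memory SQLite database and answers a property expansion from it. The table name `triple_sp` comes from the text above, but its schema, the `triples` base table, the index name, and the helper `property_chart` are assumptions; the source does not specify them.

```python
import sqlite3
from typing import Dict, List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("John", "birthPlace", "Vienna"),
        ("John", "birthPlace", "Graz"),
        ("John", "influencedBy", "Plato"),
        ("Jane", "birthPlace", "Berlin"),
        ("IBM", "rdf:type", "Company"),
    ],
)

# Assumed schema: one row per distinct (subject, property) pair.
conn.execute("CREATE TABLE triple_sp AS SELECT DISTINCT s, p FROM triples")
# SQLite indexes are B-trees, matching the indexing described above.
conn.execute("CREATE INDEX idx_triple_sp ON triple_sp (s, p)")


def property_chart(db: sqlite3.Connection, nodes: List[str]) -> Dict[str, int]:
    """Property expansion answered from the summary table, without scanning triples."""
    placeholders = ",".join("?" for _ in nodes)
    rows = db.execute(
        f"SELECT p, COUNT(*) FROM triple_sp WHERE s IN ({placeholders}) GROUP BY p",
        nodes,
    ).fetchall()
    return dict(rows)


sql_chart = property_chart(conn, ["John", "Jane"])
print(sorted(sql_chart.items()))
```

Because the summary table already holds distinct pairs, `COUNT(*)` per property equals the number of distinct subjects, which is the quantity the property histogram requires.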

4. Worked Example

Consider the RDF graph $G$ described by the following five subject entries (12 triples, written in Turtle-style shorthand):

  1. <John> rdf:type Person ; birthPlace <Vienna> ; influencedBy <Plato>
  2. <Jane> rdf:type Person ; birthPlace <Berlin> ; influencedBy <Socrates>
  3. <Beethoven> rdf:type Person ; birthPlace <Bonn> ; influencedBy <Mozart>
  4. <IBM> rdf:type Company
  5. <MonaLisa> rdf:type Artwork ; creator <DaVinci>

  • Step 1 – Subclass Expansion: Expanding $\langle S_0, \mathrm{owl{:}Thing}, \text{class} \rangle$, with $S_0$ the set of all five subjects and assuming Person, Company, and Artwork are declared direct subclasses of owl:Thing in $G$, yields the chart $\{ (\text{Person}, 3), (\text{Company}, 1), (\text{Artwork}, 1) \}$.
  • Step 2 – Select Person bar: Yields $S_1 = \{\text{John}, \text{Jane}, \text{Beethoven}\}$.
  • Step 3 – Property Expansion: For $S_1$, both "birthPlace" and "influencedBy" yield counts of 3 each. Under the definition in 2.2, rdf:type is also an outgoing property of all three subjects, so it would appear with count 3 as well.
  • Step 4 – Select influencedBy bar: Yields the bar $\langle S_1, \text{influencedBy}, \text{property} \rangle$, whose objects are $\{\text{Plato}, \text{Socrates}, \text{Mozart}\}$.
  • Step 5 – Object Expansion: If $(\text{Plato},\text{rdf:type},\text{Philosopher})$, $(\text{Socrates},\text{rdf:type},\text{Philosopher})$, and $(\text{Mozart},\text{rdf:type},\text{Composer})$ are in $G$, the histogram is $\{(\text{Philosopher}, 2), (\text{Composer}, 1)\}$.

This example illustrates the exact semantics, data flow, and resulting partitions/labels for each expansion type (Mishali et al., 2017).
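The walkthrough can be reproduced with a short script. This is a sketch under explicit assumptions: the three rdfs:subClassOf triples and the three typing triples for Plato, Socrates, and Mozart are added to the graph so that Steps 1 and 5 have data to work with, and all function names are hypothetical. Property expansion here applies the formal definition literally, so rdf:type shows up alongside birthPlace and influencedBy.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUB = "rdf:type", "rdfs:subClassOf"

G: Set[Triple] = {
    ("John", TYPE, "Person"), ("John", "birthPlace", "Vienna"), ("John", "influencedBy", "Plato"),
    ("Jane", TYPE, "Person"), ("Jane", "birthPlace", "Berlin"), ("Jane", "influencedBy", "Socrates"),
    ("Beethoven", TYPE, "Person"), ("Beethoven", "birthPlace", "Bonn"),
    ("Beethoven", "influencedBy", "Mozart"),
    ("IBM", TYPE, "Company"),
    ("MonaLisa", TYPE, "Artwork"), ("MonaLisa", "creator", "DaVinci"),
    # Assumed schema triples needed for Step 1.
    ("Person", SUB, "owl:Thing"), ("Company", SUB, "owl:Thing"), ("Artwork", SUB, "owl:Thing"),
    # Assumed typing triples from Step 5.
    ("Plato", TYPE, "Philosopher"), ("Socrates", TYPE, "Philosopher"), ("Mozart", TYPE, "Composer"),
}


def expand_subclass(nodes: Set[str], label: str) -> Dict[str, int]:
    chart = {}
    for tau, p, o in G:
        if p == SUB and o == label:
            count = sum(1 for s in nodes if (s, TYPE, tau) in G)
            if count:
                chart[tau] = count
    return chart


def expand_property(nodes: Set[str]) -> Dict[str, int]:
    subjects = defaultdict(set)
    for s, p, _o in G:
        if s in nodes:
            subjects[p].add(s)
    return {p: len(members) for p, members in subjects.items()}


def expand_object(nodes: Set[str], prop: str) -> Dict[str, int]:
    objects = {o for (s, p, o) in G if p == prop and s in nodes}
    typed = defaultdict(set)
    for o, p, tau in G:
        if p == TYPE and o in objects:
            typed[tau].add(o)
    return {tau: len(members) for tau, members in typed.items()}


S0 = {"John", "Jane", "Beethoven", "IBM", "MonaLisa"}  # the five listed subjects
step1 = expand_subclass(S0, "owl:Thing")
S1 = {s for s in S0 if (s, TYPE, "Person") in G}        # select the Person bar
step3 = expand_property(S1)
step5 = expand_object(S1, "influencedBy")               # select influencedBy, then expand
for name, chart in (("step1", step1), ("step3", step3), ("step5", step5)):
    print(name, sorted(chart.items()))
```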

5. Implementation Optimizations and Remote Access

  • Front-end merging of paged counts permits responsive visualization as aggregates load.
  • Key–value caching of heavy expansion queries accelerates repeated analytics over static datasets (useful for user sessions with repeated navigation patterns).
  • SQL summary tables (TripleSP, TriplePO) dramatically reduce join sizes, leveraging index locality for scalability.
  • SPARQL compatibility mode is retained for deployment on third-party endpoints, with expected higher response times due to lack of index optimizations.

When running against remote triple stores, only incremental and paged strategies are possible, but the system still delivers fast initial overviews suitable for large-scale knowledge-graph exploration (Mishali et al., 2017).
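The paged strategy can be illustrated with a small simulation of front-end merging. Everything here is a stand-in: `ROWS` plays the role of the endpoint's data, `fetch_page` mimics one LIMIT/OFFSET query that aggregates only its own slice, and `incremental_chart` is a hypothetical name for the merge loop. How eLinda actually structures its paged queries is not specified in the text.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple

# (subject, class) rows standing in for the remote endpoint's data (hypothetical).
ROWS: List[Tuple[str, str]] = [
    ("John", "Person"),
    ("Jane", "Person"),
    ("Beethoven", "Person"),
    ("IBM", "Company"),
    ("MonaLisa", "Artwork"),
]


def fetch_page(offset: int, limit: int) -> Dict[str, int]:
    """Stand-in for one paged query: aggregate only rows [offset, offset + limit)."""
    page = ROWS[offset:offset + limit]
    return dict(Counter(label for _subject, label in page))


def incremental_chart(page_size: int) -> Iterator[Dict[str, int]]:
    """Yield the running chart after each page so the UI can redraw immediately."""
    merged: Counter = Counter()
    offset = 0
    while True:
        partial = fetch_page(offset, page_size)
        if not partial:            # an empty page means the data is exhausted
            break
        merged.update(partial)     # sum partial counts per label
        yield dict(merged)
        offset += page_size


snapshots = list(incremental_chart(page_size=2))
for snapshot in snapshots:
    print(sorted(snapshot.items()))
```

Summing partial counts is only correct when each counted item falls in exactly one page, as in this example where every subject has a single row. Counting distinct subjects across pages would need the merge step to track identities rather than totals.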

6. Significance and Application Scenarios

The Expand Bar paradigm is foundational for interactive semantic exploration of RDF graphs. It supports:

  • Schema inference: Identifying class, property, and object type distributions visually across arbitrary URI sets.
  • Semantic faceted browsing: Chaining expansions to drill down by ontology, relation type, or attribute.
  • Data quality and curation: Rapid detection of coverage, missing types, or anomalous property usage.
  • Knowledge discovery in large graphs: Scalably navigating tens or hundreds of millions of triples with sub-second feedback.

The explicit formalization of expansion operations, their efficient implementation, and the decoupling of navigation from SPARQL-specific limitations distinguish eLinda's Expand Bar approach among linked-data explorers (Mishali et al., 2017).
