Sequence Datalog: Querying Sequence Data

Updated 8 June 2026
  • Sequence Datalog is an extension of classical Datalog for querying sequence databases, whose basic data items are strings, paths, or sequences rather than tuples of atomic values.
  • It incorporates advanced features such as negation, recursion, intermediate predicates, and path concatenation to express complex queries.
  • A systematic study of its six core features distinguishes between primitive and redundant constructs, guiding efficient design of sequence query engines.

Sequence Datalog is an extension of the classical Datalog language designed for querying sequence databases where the atomic units of information are strings, paths, or sequences, rather than tuples of atomic values. By enriching Datalog with path expressions built through concatenation and supporting features such as negation, recursion, intermediate predicates, higher arity relations, equations, and packing, Sequence Datalog provides a uniform logic-programming framework capable of expressing complex queries over sequences. The expressiveness of this language has been precisely characterized through a systematic study of its six core features, yielding a refined hierarchy of language fragments and clarifying their interplay and redundancy (Aamer et al., 2022).

1. Formal Syntax and Semantic Foundations

Let $\Sigma$ denote a countable set of atomic symbols. The definitions are as follows:

  • Values and Paths: Each $a \in \Sigma$ is a value; if $v$ is a value, then $\langle v \rangle$ is a packed value. A path is any finite concatenation $v_1 \cdot v_2 \cdot \ldots \cdot v_n$ of values, including the empty path $\varepsilon$.
  • Variables: Atomic variables (notated @x) range over atomic symbols; path variables (notated \$x) range over finite sequences.
  • Path Expressions: Built inductively as $e ::= a \mid \text{@x} \mid \$x \mid \langle e \rangle \mid e_1 \cdot e_2$, with associative concatenation and optional packing.
  • Predicates and Equations: If $R$ is an $n$-ary relation and $e_1, \ldots, e_n$ are path expressions, then $R(e_1, \ldots, e_n)$ is a predicate atom; $e_1 = e_2$ denotes an equation.
  • Literals and Rules: Literals are (possibly negated) predicate atoms or equations. Rules take the form $H \leftarrow B$, where $B$ is a set of literals and $H$ is a predicate atom. A program is a finite sequence of rule sets ("strata"), which allows stratified negation.
  • Semantics: An instance $I$ assigns to each relation name $R$ a finite relation over paths. A valuation $\nu$ maps atomic variables to atomic symbols and path variables to paths. Satisfaction $I, \nu \models L$ holds if $(\nu(e_1), \ldots, \nu(e_n)) \in I(R)$ for a predicate atom $R(e_1, \ldots, e_n)$, or if $\nu(e_1) = \nu(e_2)$ for an equation $e_1 = e_2$.
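
The definitions above can be made concrete with a small sketch. The Python below is an illustrative model, not an implementation from the source: the class names (`Const`, `AtomVar`, `PathVar`, `Pack`, `Concat`, `Packed`) and the function `evaluate` are hypothetical, and paths are assumed to be represented as Python tuples. It follows the grammar for path expressions and evaluates an expression under a valuation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Packed:
    """A packed value: a whole path wrapped so that it counts as one value."""
    path: Tuple["Value", ...]


# A value is an atomic symbol (a str) or a packed value; a path is a tuple of values.
Value = Union[str, Packed]
Path = Tuple[Value, ...]


# Path expressions, mirroring e ::= a | @x | $x | <e> | e1 . e2
@dataclass(frozen=True)
class Const:
    symbol: str


@dataclass(frozen=True)
class AtomVar:
    name: str


@dataclass(frozen=True)
class PathVar:
    name: str


@dataclass(frozen=True)
class Pack:
    inner: "Expr"


@dataclass(frozen=True)
class Concat:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, AtomVar, PathVar, Pack, Concat]


def evaluate(e: Expr, nu: Dict[str, object]) -> Path:
    """Evaluate a path expression under valuation nu (keys like '@x' and '$y')."""
    if isinstance(e, Const):
        return (e.symbol,)
    if isinstance(e, AtomVar):
        return (nu["@" + e.name],)           # one atomic symbol
    if isinstance(e, PathVar):
        return tuple(nu["$" + e.name])       # a whole path
    if isinstance(e, Pack):
        return (Packed(evaluate(e.inner, nu)),)  # packing yields a single value
    if isinstance(e, Concat):
        return evaluate(e.left, nu) + evaluate(e.right, nu)
    raise TypeError(f"not a path expression: {e!r}")


# Usage: a . @x . <$y> with @x -> b and $y -> c.d
nu = {"@x": "b", "$y": ("c", "d")}
expr = Concat(Const("a"), Concat(AtomVar("x"), Pack(PathVar("y"))))
result = evaluate(expr, nu)
print(len(result))  # the packed part counts as one value
```

Because paths are tuples, concatenation is associative by construction, and an equation $e_1 = e_2$ is satisfied under `nu` exactly when `evaluate(e1, nu) == evaluate(e2, nu)`.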

2. Orthogonal Language Features

Six language features are identified as orthogonal axes along which the expressivity and structural complexity of Sequence Datalog fragments can be analyzed:

| Feature | Notation | Description |
| --- | --- | --- |
| Negation | N | Use of stratified negation in rule bodies |
| Recursion | R | Recursive (cyclic) rule dependencies |
| Intermediate Predicates | I | More than one IDB predicate defined |
| Arity | A | Use of predicates with arity $> 1$ |
| Equations | E | Equality/inequality of path expressions |
| Packing | P | Use of the packing operator $\langle \cdot \rangle$ |

Negation and recursion correspond to classical Datalog extensions. Intermediate predicates describe programs with non-flat structure. Arity allows for non-unary relations. Equations permit direct matching or constraints on sequences. Packing enables subsequences to be treated as atomic units.
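
As a small illustration of how an equation can constrain a sequence, consider the equation $a \cdot \$x = \$x \cdot a$. It holds exactly when $\$x$ consists only of the symbol $a$ (including the empty path). This is a standard fact about words, offered here as an example of the feature; it is not necessarily the formulation used in the source. The sketch below, with hypothetical function names and Python strings standing in for paths, checks the claim exhaustively on short words.

```python
from itertools import product


def satisfies_equation(x: str) -> bool:
    """The equation a . $x = $x . a under the valuation $x -> x."""
    return "a" + x == x + "a"


def all_as(x: str) -> bool:
    """Direct definition: every symbol of x is 'a' (true for the empty word)."""
    return all(ch == "a" for ch in x)


# Compare both definitions on every word over {a, b} of length 0 to 8.
mismatches = [
    "".join(w)
    for n in range(9)
    for w in product("ab", repeat=n)
    if satisfies_equation("".join(w)) != all_as("".join(w))
]
print(mismatches)  # → []
```

The empty list confirms that, on these inputs, a single equation expresses a test that would otherwise need recursion over the positions of the sequence.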

3. Expressiveness Hierarchy: Redundancy and Primitivity Results

The analysis in (Aamer et al., 2022) determines which features are strictly required (“primitive”) and which are always or sometimes redundant in the presence of others:

  • Arity (A) is always redundant: Any use of predicates of arity $> 1$ can be simulated via unary predicates, packing, and a fresh separator symbol, with supporting equations to handle parsing of packed values.
  • Packing (P) is always redundant: Packed values can always be simulated via concatenation with delimiters and, in recursive contexts, output-undoubling constructions from J-Logic, relying on arity (already known redundant).
  • Equations (E) are redundant given both Negation (N) and Intermediate predicates (I): All uses of equality/inequality can be encoded with auxiliary predicates and stratified negation.
  • Intermediate predicates (I) are redundant absent both N and R: In positive, non-recursive, flat programs, all predicates can be inlined into the heads of rules via equations or packing.
  • Negation (N) is primitive: It fundamentally enables non-monotone queries, such as set difference, which are not expressible by positive programs alone.
  • Recursion (R) is primitive: Only recursive programs can express queries whose output length is super-linear in the length of the input.
  • Equations (E) primitive without I: Pattern-matching queries such as checking “all-a's” require equations or unbounded recursion in the absence of intermediate predicates.
  • Intermediate predicates (I) primitive with N or R: Quantifier alternation and growth phenomena show that adding I to {N} or {R} strictly increases expressive power.

The relationships among fragments induce a partial order, with 64 syntactic subsets of features collapsing to 11 expressiveness equivalence classes.

4. Structural Fragments and Lattice of Expressiveness

Fragments are defined as programs using only a subset of the six features. On flat unary instances with monadic schemas, two fragments may be equivalent in expressive power, or one may strictly dominate the other. The expressiveness lattice, considering redundancy results, is as follows (arrows denote strict containment):

    • $v$7: purely positive, nonrecursive, monadic
    • $v$8
    • $v$9
    • $\langle v \rangle$0, $\langle v \rangle$1
    • $\langle v \rangle$2
    • $\langle v \rangle$3 (the most expressive)

Packing (P) and arity (A) do not increase expressive power, and fragments distinguished only by these features collapse together. The top of the lattice is the fragment permitting all features; otherwise, the key “levers” are negation and recursion.
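
The intuition behind the redundancy of arity can be sketched in a few lines. The Python below is a simplified illustration under stated assumptions: it uses packing alone (not the separator-symbol construction mentioned in the source), represents paths as tuples, and the names `Packed`, `encode`, and `decode` are hypothetical. It shows that a binary relation over paths can be stored as a unary relation whose single component is the path $\langle p_1 \rangle \cdot \langle p_2 \rangle$, with no loss of information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packed:
    """A packed value wrapping a whole path (a tuple of values)."""
    path: tuple


def encode(*components: tuple) -> tuple:
    """Encode an n-tuple of paths as the single path <p1> . <p2> ... <pn>."""
    return tuple(Packed(p) for p in components)


def decode(path: tuple) -> tuple:
    """Recover the n-tuple of paths; assumes every top-level value is packed."""
    return tuple(value.path for value in path)


# A binary relation pairing each path with its reversal.
reversal_binary = {
    (("a", "b", "c"), ("c", "b", "a")),
    (("a", "a"), ("a", "a")),
    ((), ()),
}

# The same information as a unary relation: each tuple becomes one path.
reversal_unary = {encode(p, q) for (p, q) in reversal_binary}

# Decoding restores the original binary relation exactly.
restored = {decode(path) for path in reversal_unary}
print(restored == reversal_binary)  # → True
```

Each component is wrapped separately, so the boundary between components stays unambiguous even when a component is empty or itself contains packed values.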

5. Illustrative Patterns and Canonical Examples

Several prototypical queries illustrate the use and necessity of Sequence Datalog features:

  • NFA Acceptance (Recursion):

    $\langle v \rangle$6

  • All-a's Test (Equation):

    $\langle v \rangle$7

  • Subsequence Packing (Packing Operator):

    $\langle v \rangle$8

  • Reversal Without Arity: Encoding a pair of paths as a single bracketed (packed) value and using equations allows reversal constructions without genuinely binary predicates.

These examples underpin the theoretical results by showing which features each query exploits and why simulating them in weaker fragments fails.
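
To show how recursion drives a query such as NFA acceptance, here is a minimal sketch that evaluates recursive rules by naive fixpoint iteration. The rules in the comment are an assumed formulation written for this illustration, not the program displayed in the source, and the relation names (`Input`, `Delta`, `Final`, `State`) and the function `nfa_accepts` are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumed rules, in Sequence Datalog style:
#   State(q0, $w)    <- Input($w)
#   State(@q2, $w)   <- State(@q1, @a . $w), Delta(@q1, @a, @q2)
#   Accept           <- State(@q, epsilon), Final(@q)


def nfa_accepts(
    word: Iterable[str],
    delta: Set[Tuple[str, str, str]],
    start: str,
    finals: Set[str],
) -> bool:
    """Derive State facts (state, remaining suffix) until nothing new appears."""
    facts = {(start, tuple(word))}
    while True:
        derived = {
            (q2, rest[1:])
            for (q, rest) in facts
            if rest                      # body needs a path of the form @a . $w
            for (q1, a, q2) in delta
            if q1 == q and a == rest[0]
        }
        if derived <= facts:             # fixpoint reached
            break
        facts |= derived
    return any(q in finals and rest == () for (q, rest) in facts)


# Example NFA over {a, b} accepting words that end in "ab".
delta = {("s0", "a", "s0"), ("s0", "b", "s0"), ("s0", "a", "s1"), ("s1", "b", "s2")}
print(nfa_accepts("aab", delta, "s0", {"s2"}))  # → True
print(nfa_accepts("aba", delta, "s0", {"s2"}))  # → False
```

The loop terminates because every derived fact pairs a state with a suffix of the input, and there are only finitely many such pairs. The number of rule applications grows with the input length, which is the behavior a non-recursive program cannot reproduce.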

6. Design Implications and Practical Considerations

The expressiveness characterization directly informs the design of sequence query engines:

  • Arity and packing can be omitted from implementations without loss of generality, as their apparent expressive contributions are always redundant.
  • Equations are essential for concise expression of pattern tests but offer no gain if both negation and intermediate predicates are already present.
  • Intermediate predicates can be excluded in positive, non-recursive settings (flat programs) but are indispensable with negation or recursion.
  • Recursion and stratified negation are the primary sources of increased expressive power, fundamentally enlarging the class of queries that can be defined.

A plausible implication is that practical systems can simplify implementation by concentrating support on recursion and stratified negation while minimizing or syntactically eliminating features like arity and packing.

7. Research Significance and Outlook

The systematic analysis of Sequence Datalog features provides a comprehensive map of all potential language fragments, their expressive capabilities, and mutual simulations or strict separations. The results clarify longstanding questions about the necessity of various logic-programming extensions for sequence data and identify where expressive “jumps” actually occur. For sequence-centric applications, including process mining, information extraction, and modern graph/path query tasks, these insights enable meaningful language design choices and guide the implementation of efficient query engines attuned to the true requirements of their domains (Aamer et al., 2022).
