Sequence Datalog: Querying Sequence Data

Updated 8 June 2026

Sequence Datalog is an extension of classical Datalog designed for querying sequence databases using strings, paths, or sequences instead of atomic tuples.
It incorporates advanced features such as negation, recursion, intermediate predicates, and path concatenation to express complex queries.
A systematic study of its six core features distinguishes between primitive and redundant constructs, guiding efficient design of sequence query engines.

Sequence Datalog is an extension of the classical Datalog language designed for querying sequence databases where the atomic units of information are strings, paths, or sequences, rather than tuples of atomic values. By enriching Datalog with path expressions built through concatenation and supporting features such as negation, recursion, intermediate predicates, higher arity relations, equations, and packing, Sequence Datalog provides a uniform logic-programming framework capable of expressing complex queries over sequences. The expressiveness of this language has been precisely characterized through a systematic study of its six core features, yielding a refined hierarchy of language fragments and clarifying their interplay and redundancy (Aamer et al., 2022).

1. Formal Syntax and Semantic Foundations

Let $\Sigma$ denote a countable set of atomic symbols. The definitions are as follows:

Values and Paths: Each $a \in \Sigma$ is a value; if $v$ is a value, then $\langle v \rangle$ is a packed value. A path is any finite concatenation $v_1 \cdot v_2 \cdot \ldots \cdot v_n$ of values, including the empty path $\varepsilon$.
Variables: Atomic variables (written $\text{@x}$) range over atomic symbols; path variables (written $\$x$) range over finite sequences.
Path Expressions: Built inductively as $e ::= a \mid \text{@x} \mid \$x \mid \langle e \rangle \mid e_1 \cdot e_2$, with associative concatenation and optional packing.
Predicates and Equations: If $R$ is an $n$-ary relation and $e_1, \ldots, e_n$ are path expressions, then $R(e_1, \ldots, e_n)$ is a predicate atom; $e_1 = e_2$ denotes an equation.
Literals and Rules: Literals are (possibly negated) predicate atoms or equations. Rules take the form $H \leftarrow B$, where $B$ is a set of literals and $H$ is a predicate atom. A program is a finite sequence of stratified rule sets (“strata”), allowing stratified negation.
Semantics: An instance $I$ assigns to each relation name $R$ a finite relation over paths. Valuations $\nu$ map variables to values or paths. Satisfaction $I, \nu \models L$ holds if $(\nu(e_1), \ldots, \nu(e_n)) \in I(R)$ for predicate atoms or $\nu(e_1) = \nu(e_2)$ for equations.
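
The definitions above can be made concrete with a small evaluator. The following is a minimal sketch, not taken from the source: it assumes paths are encoded as Python tuples, that packing wraps a whole path (as the grammar for path expressions allows), and that valuations are dictionaries keyed by variable names including their sigil. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packed:
    """A packed value <p>: a whole path treated as one value (assumed encoding)."""
    path: tuple


# Path expressions: e ::= a | @x | $x | <e> | e1 . e2
@dataclass(frozen=True)
class Atom:
    symbol: str


@dataclass(frozen=True)
class AtomVar:  # @x, ranges over atomic symbols
    name: str


@dataclass(frozen=True)
class PathVar:  # $x, ranges over paths
    name: str


@dataclass(frozen=True)
class Pack:  # <e>
    inner: object


@dataclass(frozen=True)
class Concat:  # e1 . e2
    left: object
    right: object


def evaluate(expr, valuation):
    """Return the path (a tuple of values) denoted by expr under the valuation."""
    if isinstance(expr, Atom):
        return (expr.symbol,)
    if isinstance(expr, AtomVar):
        return (valuation[expr.name],)        # bound to a single atomic symbol
    if isinstance(expr, PathVar):
        return tuple(valuation[expr.name])    # bound to a whole path
    if isinstance(expr, Pack):
        return (Packed(evaluate(expr.inner, valuation)),)
    if isinstance(expr, Concat):
        return evaluate(expr.left, valuation) + evaluate(expr.right, valuation)
    raise TypeError(f"unknown path expression: {expr!r}")


def satisfies_equation(lhs, rhs, valuation):
    """An equation holds when both sides denote the same path."""
    return evaluate(lhs, valuation) == evaluate(rhs, valuation)


def satisfies_atom(instance, relation, exprs, valuation):
    """A predicate atom holds when the evaluated tuple of paths is in the relation."""
    return tuple(evaluate(e, valuation) for e in exprs) in instance[relation]


nu = {"@x": "a", "$y": ("b", "c")}
expr = Concat(AtomVar("@x"), Pack(PathVar("$y")))
print(evaluate(expr, nu))
```

Because concatenation is implemented as tuple concatenation, associativity and the empty path (the empty tuple) come for free in this encoding.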

2. Orthogonal Language Features

Six language features are identified as orthogonal axes along which the expressivity and structural complexity of Sequence Datalog fragments can be analyzed:

Feature	Notation	Description
Negation	N	Use of stratified negation in rule bodies
Recursion	R	Recursive (cyclic) rule dependencies
Intermediate Predicates	I	More than one IDB predicate defined
Arity	A	Use of predicates with arity $> 1$
Equations	E	Equality/inequality of path expressions
Packing	P	Use of the packing operator $\langle \cdot \rangle$

Negation and recursion correspond to classical Datalog extensions. Intermediate predicates describe programs with non-flat structure. Arity allows for non-unary relations. Equations permit direct matching or constraints on sequences. Packing enables subsequences to be treated as atomic units.
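
To illustrate how the six features can be read off a program, here is a rough sketch that computes the feature set of a list of rules. The `Rule` record is a hypothetical, heavily simplified representation (it only stores predicate names, arities, negation flags, and two booleans); recursion is detected as a cycle in the dependency graph of the defined (IDB) predicates.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Rule:
    """Hypothetical rule summary: just enough structure to detect features."""
    head: str
    head_arity: int = 1
    # Each body literal: (predicate name, arity, is_negated)
    body: List[Tuple[str, int, bool]] = field(default_factory=list)
    has_equation: bool = False
    uses_packing: bool = False


def features(rules: List[Rule]) -> Set[str]:
    idb = {r.head for r in rules}  # predicates defined by some rule
    used: Set[str] = set()

    if any(neg for r in rules for (_, _, neg) in r.body):
        used.add("N")
    if len(idb) > 1:
        used.add("I")
    if any(r.head_arity > 1 or any(a > 1 for (_, a, _) in r.body) for r in rules):
        used.add("A")
    if any(r.has_equation for r in rules):
        used.add("E")
    if any(r.uses_packing for r in rules):
        used.add("P")

    # Dependency graph restricted to IDB predicates: head -> body predicates.
    deps = {p: set() for p in idb}
    for r in rules:
        for (q, _, _) in r.body:
            if q in idb:
                deps[r.head].add(q)

    def reaches(start: str, target: str, seen: Set[str]) -> bool:
        # Depth-first search for a path of length >= 1 from start to target.
        for q in deps[start]:
            if q == target:
                return True
            if q not in seen:
                seen.add(q)
                if reaches(q, target, seen):
                    return True
        return False

    if any(reaches(p, p, set()) for p in idb):
        used.add("R")
    return used


# T($x) <- R($x)   and   T($x.$y) <- T($x), R($y)
closure = [Rule("T", body=[("R", 1, False)]),
           Rule("T", body=[("T", 1, False), ("R", 1, False)])]
print(features(closure))
```

The example program defines a single unary predicate that depends on itself, so only the recursion feature is reported.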

3. Expressiveness Hierarchy: Redundancy and Primitivity Results

The rigorous analysis in (Aamer et al., 2022) demonstrates which features are strictly required (“primitive”) and which are always or sometimes redundant in the presence of others:

Arity (A) is always redundant: Any use of predicates of arity $> 1$ can be simulated via unary predicates, packing, and a fresh separator symbol, with supporting equations to handle parsing of packed values.
Packing (P) is always redundant: Packed values can always be simulated via concatenation with delimiters and, in recursive contexts, output-undoubling constructions from J-Logic, relying on arity (already known redundant).
Equations (E) are redundant given both Negation (N) and Intermediate predicates (I): All uses of equality/inequality can be encoded with auxiliary predicates and stratified negation.
Intermediate predicates (I) are redundant absent both N and R: In positive, non-recursive, flat programs, all predicates can be inlined into the heads of rules via equations or packing.
Negation (N) is primitive: It fundamentally enables non-monotone queries, such as set difference, which are not expressible by positive programs alone.
Recursion (R) is primitive: Only recursive programs can express queries whose output length is super-linear in the length of the input.
Equations (E) are primitive without I: Pattern-matching queries such as the “all-a’s” test require equations or unbounded recursion in the absence of intermediate predicates.
Intermediate predicates (I) are primitive with N or R: Quantifier alternation and growth phenomena show that adding I to {N} or {R} strictly increases expressive power.

The relationships among fragments induce a partial order, with the 64 syntactic subsets of features collapsing to 11 expressiveness equivalence classes.

4. Structural Fragments and Lattice of Expressiveness

Fragments are defined as programs using only a subset $F \subseteq \{N, R, I, A, E, P\}$ of the features. On flat unary instances with monadic schemas, two fragments may be equivalent in expressive power ($F_1 \equiv F_2$), or one may strictly dominate the other. The expressiveness lattice, considering redundancy results, is as follows (ordered by strict containment):
- $\emptyset$: purely positive, nonrecursive, monadic (the least expressive)
- intermediate classes, distinguished by which of E, I, N, and R are present
- $\{N, R, I, A, E, P\}$: all features (the most expressive)
Packing (P) and arity (A) do not increase expressive power, and fragments distinguished only by these features collapse together. The top of the lattice is the fragment permitting all features; otherwise, the key “levers” are negation and recursion.
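
The counting behind these statements can be checked mechanically. The sketch below enumerates all subsets of the six features and then identifies fragments that differ only in A or P, which the redundancy results say never matter. It deliberately does not model the conditional results for E and I, so it stops at the intermediate count rather than reproducing the 11 classes reported above.

```python
from itertools import combinations

FEATURES = ("N", "R", "I", "A", "E", "P")
ALWAYS_REDUNDANT = frozenset({"A", "P"})  # arity and packing


def all_fragments(features):
    """Every subset of the feature set, as frozensets."""
    return [frozenset(c)
            for k in range(len(features) + 1)
            for c in combinations(features, k)]


def normalize(fragment):
    """Drop the features that never add expressive power."""
    return frozenset(fragment) - ALWAYS_REDUNDANT


fragments = all_fragments(FEATURES)
normalized = {normalize(f) for f in fragments}

print(len(fragments))   # 64
print(len(normalized))  # 16
```

The remaining reduction from 16 candidates to 11 classes depends on the conditional redundancy results, which are not encoded in this sketch.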

5. Illustrative Patterns and Canonical Examples

Several prototypical queries illustrate the use and necessity of Sequence Datalog features:
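- NFA Acceptance (Recursion): a recursive rule consumes the input path one symbol at a time while tracking the current automaton state.
- All-a’s Test (Equation): a path $\$x$ consists only of a’s exactly when $a \cdot \$x = \$x \cdot a$.
- Subsequence Packing (Packing Operator): a subsequence is wrapped as a packed value $\langle \$x \rangle$ so that it can be handled as a single unit.
- Reversal Without Arity: Encoding a pair of paths as a single bracketed value and using equations allows reversal constructions without genuinely binary predicates.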
These examples underpin the theoretical results by showing which features are exploited and how simulating them with weaker fragments fails.
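
Two of these patterns can be imitated directly in Python. The sketch below is an illustration under stated assumptions, not the programs from the cited paper: paths are tuples of symbols, the all-a’s test uses the commutation equation $a \cdot x = x \cdot a$, and NFA acceptance is computed as a naive fixpoint over hypothetical facts `(state, remaining_suffix)`, mirroring how a recursive rule would derive new facts until nothing changes.

```python
def all_as(path, a="a"):
    """True iff every symbol of path equals a, via the equation a.x = x.a."""
    return (a,) + tuple(path) == tuple(path) + (a,)


def nfa_accepts(word, delta, start, finals):
    """Naive fixpoint for the recursive rule
         Reach(q2, rest) <- Reach(q1, sym.rest), Delta(q1, sym, q2)
    starting from the fact Reach(start, word)."""
    reach = {(start, tuple(word))}
    changed = True
    while changed:
        changed = False
        for (q1, suffix) in list(reach):
            if not suffix:
                continue  # nothing left to consume
            sym, rest = suffix[0], suffix[1:]
            for q2 in delta.get((q1, sym), ()):
                if (q2, rest) not in reach:
                    reach.add((q2, rest))
                    changed = True
    # Accept if some final state is reached with the whole word consumed.
    return any(q in finals and suffix == () for (q, suffix) in reach)


# Hypothetical NFA over {a, b} accepting words that end in "ab".
delta = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}

print(all_as(("a", "a", "a")), all_as(("a", "b", "a")))
print(nfa_accepts("aab", delta, "q0", {"q2"}), nfa_accepts("aba", delta, "q0", {"q2"}))
```

The fixpoint terminates because every derived fact pairs a state with a suffix of the input word, so only finitely many facts exist.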

6. Design Implications and Practical Considerations

The expressiveness characterization directly informs the design of sequence query engines:
- Arity and packing can be omitted from implementations without loss of generality, as their apparent expressive contributions are always redundant.
- Equations are essential for concise expression of pattern tests but offer no gain if both negation and intermediate predicates are already present.
- Intermediate predicates can be excluded in positive, non-recursive settings (flat programs) but are indispensable with negation or recursion.
- Recursion and stratified negation are the primary sources of increased expressive power, fundamentally enlarging the class of queries that can be defined.
A plausible implication is that practical systems can simplify their implementation by concentrating support on recursion and stratified negation while minimizing or syntactically eliminating features such as arity and packing.
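
As a rough illustration of why arity can be compiled away, the sketch below stores each tuple of a higher-arity relation as one path of packed components, turning the relation into a unary one. This is a simplification under stated assumptions: the `Packed` wrapper and function names are hypothetical, and the fresh separator symbol mentioned in the redundancy result is omitted because the packed components already mark the boundaries in this encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packed:
    """A packed value: a whole path treated as a single value (assumed encoding)."""
    path: tuple


def encode_tuple(components):
    """Encode (p1, ..., pn) as the single path <p1> . ... . <pn>."""
    return tuple(Packed(tuple(p)) for p in components)


def decode_tuple(path):
    """Recover the original tuple of paths from its packed encoding."""
    if not all(isinstance(v, Packed) for v in path):
        raise ValueError("not an encoded tuple")
    return tuple(v.path for v in path)


def to_unary(relation):
    """Turn an n-ary relation over paths into a unary relation over paths."""
    return {encode_tuple(t) for t in relation}


binary = {(("a", "b"), ("c",)), (("a",), ("b", "c")), ((), ("a",))}
unary = to_unary(binary)
print(len(unary))
print({decode_tuple(p) for p in unary} == binary)
```

Packing keeps component boundaries explicit, so the tuples `(ab, c)` and `(a, bc)` remain distinct after encoding even though their plain concatenations coincide.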

7. Research Significance and Outlook

The systematic analysis of Sequence Datalog features provides a comprehensive map of all potential language fragments, their expressive capabilities, and mutual simulations or strict separations. The results clarify longstanding questions about the necessity of various logic-programming extensions for sequence data and identify where expressive “jumps” actually occur. For sequence-centric applications—including process mining, information extraction, and modern graph/path query tasks—these insights enable meaningful language design choices and guide the implementation of efficient query engines attuned to the true requirements of their domains (Aamer et al., 2022).

References

1. Aamer et al. (2022). Expressiveness within Sequence Datalog.
