Graph Extension Grammars (GEGs)

Updated 1 April 2026

Graph Extension Grammars are a formalism using graph algebras and regular tree grammars to specify and parse labeled, directed graphs with non-structural reentrancies.
They combine binary union and unary extension operations to enable arbitrary node-sharing, crucial for modeling phenomena in semantic representations such as AMR.
GEGs offer a tractable, polynomial-time parsing algorithm that employs memoization and profile matching to efficiently fuse graph components during derivation.

Graph Extension Grammars (GEGs) are a formalism introduced to address the expressive and computational requirements of generating and parsing languages of directed, node- and edge-labelled graphs, particularly supporting non-structural reentrancies as found in semantic representations such as Abstract Meaning Representation (AMR). A GEG consists of an algebra over graphs—incorporating specific graph operations—and a regular tree grammar that generates expressions over these operations. The resulting framework allows specification of sets of graphs with both rich structural constraints and tractable polynomial-time parsing (Björklund et al., 2021).

1. Formal Structure of Graph Extension Grammars

A GEG is fundamentally composed of two interlocked components: (i) a graph-extension algebra specifying operations over families of typed, labelled graphs, and (ii) a regular tree grammar generating terms over those operations. The graph domains are defined as follows:

Let $\Sigma_n$ be a finite set of node labels and $\Sigma_e$ a finite set of edge labels. For each nonnegative integer $\tau$, $\mathbb{G}_\tau$ is the set of all finite, directed, node- and edge-labelled graphs $G = (V, E, \ell, port)$ with $|port| = \tau$, where:

$V$ is a finite set of nodes,
$E \subseteq V \times \Sigma_e \times V$ is the set of labelled edges,
$\ell \colon V \to \Sigma_n$ assigns a label to each node,
$port \in V^\tau$ is a sequence of $\tau$ (possibly repeating) nodes called the ports.

The union $\mathbb{G} = \bigcup_{\tau \geq 0} \mathbb{G}_\tau$ collects all such graphs, parameterized by type $\tau$.
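The definition above can be mirrored by a small data structure. The following is a minimal sketch, assuming Python sets and dicts as the encoding; the class name `PortGraph` and its field names are illustrative and do not come from the reference.

```python
from dataclasses import dataclass


@dataclass
class PortGraph:
    """A graph G = (V, E, l, port); class and field names are illustrative."""
    nodes: set      # V
    edges: set      # E, triples (source, edge_label, target)
    labels: dict    # l, maps every node to its node label
    ports: tuple    # port, a sequence of nodes (repeats allowed)

    def __post_init__(self):
        # E must be a subset of V x Sigma_e x V.
        for source, _, target in self.edges:
            if source not in self.nodes or target not in self.nodes:
                raise ValueError("edge endpoint is not a node")
        # l must be defined on exactly the nodes in V.
        if set(self.labels) != set(self.nodes):
            raise ValueError("every node needs exactly one label")
        # port must be a sequence over V.
        if any(p not in self.nodes for p in self.ports):
            raise ValueError("port is not a node")

    @property
    def type(self) -> int:
        """The type tau = |port|."""
        return len(self.ports)


g = PortGraph(
    nodes={"u", "v"},
    edges={("u", "arg0", "v")},
    labels={"u": "b", "v": "a"},
    ports=("u",),
)
print(g.type)  # 1
```

Storing the ports as a tuple keeps their order and permits repetitions, both of which the definition allows.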

Operations in the Signature

The signature comprises two key forms of operation:

Binary union operations $\uplus_{\tau\tau'} \colon \mathbb{G}_\tau \times \mathbb{G}_{\tau'} \to \mathbb{G}_{\tau+\tau'}$, which yield disjoint unions (up to renaming) and concatenate ports.
Unary extension operations $\Phi$ specified by tuples

$\Phi = (\underline{\Phi}, dock_\Phi, C_\Phi)$

where $\underline{\Phi} = (V_\Phi, E_\Phi, \ell_\Phi, port_\Phi)$ is a graph of type $\tau'$; $dock_\Phi$ is a sequence of $\tau$ dock-nodes; $C_\Phi$ is the set of "clonable" context nodes. These are subject to:
- (R1) All edge sources in $E_\Phi$ are from $port_\Phi$,
- (R2) Every non-port node in $V_\Phi$ is a target of some edge in $E_\Phi$.

Extension $\Phi$ is of type $\tau \to \tau'$, where $\tau = |dock_\Phi|$ and $\tau' = |port_\Phi|$. The operation $\Phi$ acts nondeterministically by attaching new structure and enabling node-fusion as described below.

The algebra is completed by including the constant operator for the empty graph $\emptyset$ of type 0.
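The union operation and the empty-graph constant are simple enough to sketch directly. The snippet below is an illustrative encoding only: graphs are plain dicts, and "renaming" is realized by tagging nodes with 0 or 1, which is one of many ways to force disjointness. The names `rename`, `union` and `EMPTY` are assumptions of this sketch.

```python
# The empty graph: no nodes, no edges, no ports (type 0).
EMPTY = {"labels": {}, "edges": set(), "ports": ()}


def rename(graph, tag):
    """Return a copy of graph whose nodes are wrapped as (tag, node)."""
    return {
        "labels": {(tag, v): lab for v, lab in graph["labels"].items()},
        "edges": {((tag, s), e, (tag, t)) for s, e, t in graph["edges"]},
        "ports": tuple((tag, p) for p in graph["ports"]),
    }


def union(g1, g2):
    """Disjoint union of g1 and g2 with concatenated port sequences."""
    a, b = rename(g1, 0), rename(g2, 1)
    return {
        "labels": {**a["labels"], **b["labels"]},
        "edges": a["edges"] | b["edges"],
        "ports": a["ports"] + b["ports"],
    }


single = {"labels": {"v": "a"}, "edges": set(), "ports": ("v",)}
both = union(single, single)
print(len(both["labels"]), len(both["ports"]))  # 2 2
```

Because the two arguments are renamed apart before merging, taking the union of a graph with itself yields two separate copies, and the type of the result is the sum of the argument types.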

Algebraic Semantics

A graph-extension algebra $\mathcal{A}$ is a many-sorted algebra whose sorts are the types $\tau$, whose domains are the sets $\mathbb{G}_\tau$, and whose operations are those specified above. Because extension acts nondeterministically, the operations produce sets of graphs. Reachability is enforced: every node of every generated graph is reachable from some port-node.

2. Regular Tree Grammar Mechanism

A GEG employs a regular, many-sorted tree grammar $g$ with finitely many nonterminals $A \in N$, each assigned a type $\tau(A)$. Productions are of the form

$A \to f[A_1, \dots, A_k]$

where the operation $f$ has type $\tau(A_1) \cdots \tau(A_k) \to \tau(A)$. The generated tree language $L(g)$ encodes valid sequences of operations.
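To make the role of the tree grammar concrete, the sketch below enumerates derivation trees of bounded height for a small hypothetical grammar. The nonterminals `S`, `A`, `Z` and the operation names `ext_a`, `ext_b`, `empty` are invented for illustration, and the type discipline is not modelled here.

```python
from itertools import product

# Hypothetical productions: nonterminal -> list of (operation, child nonterminals).
PRODUCTIONS = {
    "S": [("ext_b", ["A"])],
    "A": [("ext_a", ["A"]), ("ext_a", ["Z"])],
    "Z": [("empty", [])],
}


def trees(nonterminal, height):
    """Yield every derivation tree of at most the given height."""
    if height == 0:
        return
    for operation, children in PRODUCTIONS[nonterminal]:
        # All subtrees available for each child position.
        options = [list(trees(child, height - 1)) for child in children]
        for combination in product(*options):
            yield (operation, *combination)


for tree in trees("S", 4):
    print(tree)
```

Each tree is a nested tuple whose first component is the operation symbol, so it can be handed to an evaluator that interprets the symbols in a graph-extension algebra.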

A full GEG is then the pair of a tree grammar $g$ and a graph-extension algebra $\mathcal{A}$,

$\Gamma = (g, \mathcal{A})$

with associated graph language

$L(\Gamma) = \bigcup_{t \in L(g)} val_\mathcal{A}(t)$

obtained by recursively mapping derivation trees to their semantic value in the algebra.

Evaluation of $val_\mathcal{A}(t)$ proceeds by induction:

Base: If $t = \emptyset$, $val_\mathcal{A}(t) = \{\emptyset\}$.
Union: If $t = \uplus_{\tau\tau'}[t_1, t_2]$, then $val_\mathcal{A}(t) = \{G_1 \uplus_{\tau\tau'} G_2 \mid G_1 \in val_\mathcal{A}(t_1),\ G_2 \in val_\mathcal{A}(t_2)\}$.
Extension: If $t = \Phi[t']$ with $\Phi$ of type $\tau \to \tau'$, then $val_\mathcal{A}(t) = \bigcup_{G \in val_\mathcal{A}(t')} \Phi(G)$.
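The three cases of the induction can be sketched as a recursive evaluator that returns a set of graphs. This is a toy model under stated assumptions: graphs are triples `(labels, edges, ports)` with integer nodes, and `make_extension` builds a deliberately simplified extension that adds one new port node, optionally attaches it to the first port of its argument, and fuses exactly one clone of a single context node. The general operation described in this article is richer.

```python
from itertools import product

# A graph is (labels, edges, ports): labels[i] is the label of node i,
# edges is a frozenset of (source, edge_label, target), ports is a tuple.
EMPTY = ((), frozenset(), ())


def union(g1, g2):
    """Disjoint union: shift the node numbers of g2 past those of g1."""
    (l1, e1, p1), (l2, e2, p2) = g1, g2
    k = len(l1)
    shifted = frozenset((s + k, a, t + k) for s, a, t in e2)
    return (l1 + l2, e1 | shifted, p1 + tuple(p + k for p in p2))


def make_extension(new_label, dock_edge=None, ctx_edge=None, ctx_label=None):
    """Build a simplified, hypothetical extension operation (see lead-in)."""
    def apply(graph):
        labels, edges, ports = graph
        new = len(labels)                      # the freshly added port node
        base = set(edges)
        if dock_edge is not None:              # edge from new node to the dock
            base.add((new, dock_edge, ports[0]))
        if ctx_edge is None:
            choices = [set()]
        else:                                  # one result per admissible fusion
            choices = [{(new, ctx_edge, v)} for v, lab in enumerate(labels)
                       if lab == ctx_label and v not in ports]
        return {(labels + (new_label,), frozenset(base | c), (new,))
                for c in choices}
    return apply


def evaluate(tree):
    """Return the set of graphs denoted by a derivation tree."""
    operation, *children = tree
    if operation == "empty":                   # base case
        return {EMPTY}
    if operation == "union":                   # pairwise disjoint unions
        left, right = (evaluate(child) for child in children)
        return {union(a, b) for a, b in product(left, right)}
    (child,) = children                        # extension case
    return set().union(*(operation(g) for g in evaluate(child)))


first_a = make_extension("a")
second_a = make_extension("a", dock_edge="x")
b_node = make_extension("b", dock_edge="x")
c_node = make_extension("c", dock_edge="y", ctx_edge="z", ctx_label="a")

tree = (c_node, (b_node, (second_a, (first_a, ("empty",)))))
results = evaluate(tree)
print(len(results))  # 2
```

The last extension may fuse its context node with either of the two non-port nodes labelled "a", so the tree denotes two graphs rather than one.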

3. Non-Structural Reentrancy Modelling

GEGs natively support non-structural reentrancy, whose absence is a key limitation of prior devices such as hyperedge-replacement grammars, where only context-free attachments are possible. Non-structural reentrancy refers to allowing nodes in a partially constructed graph to appear multiple times as targets of newly introduced edges, a necessity for capturing phenomena such as shared arguments or referents in AMR.

When an extension operation $\Phi$ is applied to a graph $G$:

Each context node $c \in C_\Phi$ can be cloned an arbitrary number of times (nondeterministically).
Each clone is injectively merged ("fused") with a distinct node of $G$ that carries the same node label.
This process enables one or several new outgoing edges of the operation to point to the same node(s) in $G$, thus creating arbitrary non-structural reentrancies.

The set $C_\Phi$ of clonable context nodes thus controls which non-structural reentrancies can be generated, an essential feature missing from many prior graph grammar formalisms.
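The clone-and-fuse step can be illustrated by enumerating the admissible fusions for one context node. The sketch below is not the procedure of the reference: it bounds the number of clones by a parameter, allows zero clones, and treats clones of the same context node as interchangeable. All three choices, along with the function names, are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def clone_fusions(labels, ctx_label, max_clones):
    """Yield each set of distinct same-label nodes that clones may fuse with."""
    candidates = [v for v, lab in labels.items() if lab == ctx_label]
    for k in range(min(max_clones, len(candidates)) + 1):
        yield from combinations(candidates, k)


def extend(labels, edges, new_node, new_label, ctx_edge, targets):
    """Add new_node with one ctx_edge-labelled edge to each fused target."""
    new_labels = {**labels, new_node: new_label}
    new_edges = edges | {(new_node, ctx_edge, t) for t in targets}
    return new_labels, new_edges


def reentrant_nodes(edges):
    """Return the nodes that have more than one incoming edge."""
    indegree = Counter(target for _, _, target in edges)
    return {v for v, d in indegree.items() if d > 1}


labels = {"p": "b", "x": "a", "y": "a"}
edges = {("p", "arg0", "x"), ("p", "arg0", "y")}

fusions = list(clone_fusions(labels, "a", max_clones=2))
print(fusions)  # [(), ('x',), ('y',), ('x', 'y')]

new_labels, new_edges = extend(labels, edges, "q", "b", "arg1", ("x", "y"))
print(sorted(reentrant_nodes(new_edges)))  # ['x', 'y']
```

Before the extension no node has two incoming edges; afterwards both "a"-nodes do, although neither is a port of the argument graph. That is the sense in which the reentrancy is non-structural.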

4. Parsing Algorithms and Complexity

The challenge of parsing a given graph $G$ with respect to a GEG $\Gamma$ is to decide whether $G$ belongs to its graph language $L(\Gamma)$. This is determined via the existence of a derivation tree $t$ generated by the tree grammar and an assignment of semantic values yielding exactly $G$. GEGs provide a polynomial-time parsing algorithm, summarized as follows:

The principal recursive routine takes a nonterminal $A$ and a port sequence $p$, and tests whether the subgraph of $G$ induced by $p$ is derivable from $A$.
Results are memoized in a table, so that each pair $(A, p)$ is evaluated at most once.
Union productions and extension rules are handled with explicit splitting or matching logic.
Matching for extension rules involves fusing dock-nodes to nodes of $G$ and considering all ways context-clones may merge, while preserving labels and the "clone only clonable" constraint.
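One plausible reading of the splitting step for union productions follows from the semantics alone: since union is disjoint and every node must be reachable from a port, the two halves of the port sequence must reach disjoint node sets. The sketch below checks exactly that, with memoized reachability. It is an assumption-laden illustration rather than the algorithm of the reference, and `make_splitter` is a hypothetical helper name.

```python
from functools import lru_cache


def make_splitter(edges):
    """Build a split function for a fixed edge set of (source, label, target)."""
    successors = {}
    for source, _, target in edges:
        successors.setdefault(source, set()).add(target)

    @lru_cache(maxsize=None)
    def reach(ports):
        """Nodes reachable from a tuple of ports; memoized per tuple."""
        seen, stack = set(ports), list(ports)
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return frozenset(seen)

    def split(ports, left_type):
        """Return the two reachable node sets, or None if they overlap."""
        left, right = reach(ports[:left_type]), reach(ports[left_type:])
        return (left, right) if left.isdisjoint(right) else None

    return split


split = make_splitter({("u", "e", "x"), ("v", "e", "y")})
print(split(("u", "v"), 1) is not None)  # True

shared = make_splitter({("u", "e", "x"), ("v", "e", "y"), ("v", "f", "x")})
print(shared(("u", "v"), 1))  # None
```

In the second call the node `x` is reachable from both halves of the port sequence, so the graph cannot be the value of a union at that split.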

Data Structures and Efficiency

Key data structures include:

Profiles for nodes in the input graph $G$,
Likewise, profiles for nodes in the operation's underlying graph, to efficiently check compatibility during matching and merging.

Complexity Summary

The algorithm executes in polynomial time, with the bound primarily determined by the combinatorics of ports and clones. Under additional unambiguity constraints on profiles, the complexity can be reduced further, in some cases to linear time, as per corollaries in the primary reference (Björklund et al., 2021).

Parsing Step	Key Limitation/Optimization
Memo table computation	Results reused across recursive calls
Union production handling	Constant number per nonterminal
Extension handling (naive)	Filters via profile comparison
Overall (optimized)	Profile constraints

5. Representative Example: Reentrant Graph Generation

Consider a minimal GEG showcasing non-structural reentrancy:

Node labels: a set that includes "a" and "b",
Edge labels: a finite set $\Sigma_e$,
Extension op $\Phi$: given by its underlying graph together with its ports, dock-nodes, and clonable context nodes,
The regular tree grammar uses three nonterminals, two of type 1 and one of type 0, with three productions.

In a derivation sequence, repeated applications of the extension operation followed by union yield a graph of type 1 with a "b"-node and two parallel edges targeting a single "a"-node. Here, the arbitrary cloning and merging of context nodes enables this reentrancy structure, directly modeling the non-structural scenario within the formalism (Björklund et al., 2021).

6. Relation to Other Formalisms

GEGs make several advances within the family of generative graph formalisms:

In comparison to hyperedge-replacement grammars, GEGs transcend the context-free model by enabling arbitrary (non-structural) node-sharing.
The extension mechanism is inspired in part by concepts such as adaptive star grammars (notably the "cloning" device) (Björklund et al., 2021).
The framework is directly motivated by requirements from natural language processing, particularly representations such as AMR, which routinely employ non-structural reentrancies.
The computational tractability of parsing distinguishes GEGs from more expressive but intractable grammar variants.

Connections to prior work include foundational treatments in graph grammar theory (Courcelle & Engelfriet, 2012), as well as comparisons with weighted DAG automata for semantic graphs (Chiang et al., 2018).

7. References

Björklund, H.; Drewes, F.; Jonsson, P.: Polynomial Graph Parsing with Non-Structural Reentrancies (2021).
Courcelle, B.; Engelfriet, J.: Graph Structure and MSO Logic (2012).
Drewes, F.; et al.: Adaptive Star Grammars (2010).
Chiang, D.; et al.: Weighted DAG Automata for Semantic Graphs (2018).
