Graph Extension Grammars (GEGs)
- Graph Extension Grammars are a formalism using graph algebras and regular tree grammars to specify and parse labeled, directed graphs with non-structural reentrancies.
- They combine binary union and unary extension operations to enable arbitrary node-sharing, crucial for modeling phenomena in semantic representations such as AMR.
- GEGs admit a tractable, polynomial-time parsing algorithm that uses memoization and profile matching to efficiently match operations against the input graph.
Graph Extension Grammars (GEGs) are a formalism introduced to address the expressive and computational requirements of generating and parsing languages of directed, node- and edge-labelled graphs, particularly supporting non-structural reentrancies as found in semantic representations such as Abstract Meaning Representation (AMR). A GEG consists of an algebra over graphs—incorporating specific graph operations—and a regular tree grammar that generates expressions over these operations. The resulting framework allows specification of sets of graphs with both rich structural constraints and tractable polynomial-time parsing (Björklund et al., 2021).
1. Formal Structure of Graph Extension Grammars
A GEG is fundamentally composed of two interlocked components: (i) a graph-extension algebra specifying operations over families of typed, labelled graphs, and (ii) a regular tree grammar generating terms over those operations. The graph domains are defined as follows:
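Let $\Sigma$ be a finite set of node labels and $\Lambda$ a finite set of edge labels. For each nonnegative integer $k$, $\mathcal{G}_k$ is the set of all finite, directed, node- and edge-labelled graphs $G = (V, E, \mathit{lab}, \mathit{port})$ of type $k$, where:
- $V$ is a finite set of nodes,
- $E \subseteq V \times \Lambda \times V$ is the set of labelled edges,
- $\mathit{lab} \colon V \to \Sigma$ assigns a label to every node,
- $\mathit{port} \in V^k$ is a sequence of $k$ (possibly repeating) nodes called the ports.
The union $\mathcal{G} = \bigcup_{k \in \mathbb{N}} \mathcal{G}_k$ collects all such graphs, parameterized by their type $k$.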
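As a minimal sketch of the graph domain, the Python class below stores a graph of type $k$ with integer nodes, `(source, edge_label, target)` triples for $E$, and a port tuple of length $k$. The names `Graph`, `make_graph` and the property `k` are illustrative assumptions, not notation from the reference; freezing the dataclass keeps graphs hashable, which is convenient once operations return sets of graphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    """A graph of type k: (V, E, lab, port) with E a subset of V x Lambda x V."""

    labels: tuple     # sorted (node, node_label) pairs, i.e. lab: V -> Sigma
    edges: frozenset  # (source, edge_label, target) triples
    ports: tuple      # port sequence of length k; repetitions are allowed

    @property
    def nodes(self):
        return frozenset(v for v, _ in self.labels)

    @property
    def k(self):
        """The type of the graph, i.e. the length of its port sequence."""
        return len(self.ports)


def make_graph(labels, edges, ports):
    """Build a Graph from a {node: label} dict, checking that edges and ports use nodes of V."""
    if any(u not in labels or v not in labels for u, _, v in edges):
        raise ValueError("edge endpoint is not a node of V")
    if any(p not in labels for p in ports):
        raise ValueError("port is not a node of V")
    return Graph(tuple(sorted(labels.items())), frozenset(edges), tuple(ports))
```

For instance, `make_graph({0: "b", 1: "a"}, {(0, "arg0", 1)}, (0,))` yields a graph of type 1 whose single port is the `b`-node.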
Operations in the Signature
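The signature $\Omega$ comprises two kinds of operations:
- Binary union operations $\uplus_{k,k'} \colon \mathcal{G}_k \times \mathcal{G}_{k'} \to \mathcal{G}_{k+k'}$, which form the disjoint union of their arguments (up to renaming of nodes) and concatenate their port sequences.
- Unary extension operations, each specified by a triple
  $$e = \langle G, D, C \rangle,$$
  where $G = (V_G, E_G, \mathit{lab}_G, \mathit{port}_G)$ is a graph of type $k'$, $D \in V_G^{k}$ is a sequence of $k$ dock nodes, and $C \subseteq V_G$ is the set of "clonable" context nodes. These are subject to:
  - (R1) every edge of $G$ has its source in $\mathit{port}_G$;
  - (R2) every non-port node of $G$ is the target of some edge in $E_G$.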
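The extension $e = \langle G, D, C \rangle$ is of type $k \to k'$, where $k = |D|$ and $k' = |\mathit{port}_G|$: it maps a graph of type $k$ to a set of graphs of type $k'$. It acts nondeterministically by attaching new structure and enabling node fusion, as described below.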
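Conditions (R1) and (R2) and the resulting type can be checked mechanically. The sketch below assumes $G$ is passed as a node set, a set of `(u, label, v)` edge triples and a port tuple; `extension_type` is a hypothetical helper that raises on a violation and otherwise returns the pair $(k, k')$.

```python
def extension_type(nodes, edges, ports, docks, clonable):
    """Check (R1) and (R2) for an extension <G, D, C>; return its type (k, k').

    Hypothetical encoding: `nodes` is V_G (a set), `edges` holds (u, label, v)
    triples, `ports` is port_G, `docks` is D and `clonable` is C.
    """
    port_set = set(ports)
    if not (set(docks) <= nodes and set(clonable) <= nodes):
        raise ValueError("D and C must consist of nodes of G")
    # (R1): every edge of G leaves a port of G.
    if any(u not in port_set for u, _, _ in edges):
        raise ValueError("R1 violated: an edge source is not a port")
    # (R2): every non-port node of G is the target of some edge.
    targets = {v for _, _, v in edges}
    if any(v not in port_set and v not in targets for v in nodes):
        raise ValueError("R2 violated: a non-port node has no incoming edge")
    return len(docks), len(ports)  # the operation has type k -> k'
```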
The algebra is completed by the constant $\emptyset$, denoting the empty graph of type 0.
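Union and the empty-graph constant can be sketched directly. Here a graph is a plain `(labels, edges, ports)` triple with integer nodes (an assumption of this sketch), and "disjoint up to renaming" is realised by mapping the nodes of the right operand to fresh integers; `union` and `EMPTY` are illustrative names.

```python
EMPTY = ({}, frozenset(), ())  # the constant: the empty graph of type 0


def union(g1, g2):
    """Disjoint union of (labels, edges, ports) graphs, concatenating port sequences.

    The nodes of g2 are renamed to integers not used by g1, so a graph of
    type k and a graph of type k' yield a graph of type k + k'.
    """
    lab1, edges1, ports1 = g1
    lab2, edges2, ports2 = g2
    offset = max(lab1, default=-1) + 1
    rename = {v: offset + i for i, v in enumerate(sorted(lab2))}
    labels = {**lab1, **{rename[v]: lab for v, lab in lab2.items()}}
    edges = frozenset(edges1) | {(rename[u], a, rename[v]) for u, a, v in edges2}
    ports = tuple(ports1) + tuple(rename[v] for v in ports2)
    return labels, edges, ports
```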
Algebraic Semantics
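A graph-extension algebra is a many-sorted algebra
$$\mathcal{A} = \big((\mathcal{G}_k)_{k \in \mathbb{N}},\ \Omega\big)$$
with domain $\mathcal{G}_k$ for each sort $k \in \mathbb{N}$ and the above-specified operations, which produce sets of graphs because extension is nondeterministic. Reachability is enforced: every node of every generated graph is reachable from some port node.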
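The reachability invariant can be tested with a breadth-first search from the ports. The function below is a hypothetical helper over a node set and `(u, label, v)` edge triples; it follows edges in their direction only.

```python
from collections import deque


def all_reachable_from_ports(nodes, edges, ports):
    """Return True iff every node is reachable from some port along directed edges."""
    succ = {v: [] for v in nodes}
    for u, _, v in edges:
        succ[u].append(v)
    seen = set(ports)
    queue = deque(ports)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == set(nodes)
```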
2. Regular Tree Grammar Mechanism
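A GEG employs a regular, many-sorted tree grammar $\mathcal{T} = (N, \Omega, P, S)$ with a finite set $N$ of nonterminals, each assigned a type $\mathit{type}(A) \in \mathbb{N}$, and an initial nonterminal $S \in N$. Productions are of the form
$$A \to f(A_1, \ldots, A_n)$$
where $f \in \Omega$ has type $\mathit{type}(A_1) \cdots \mathit{type}(A_n) \to \mathit{type}(A)$. The generated tree language $L(\mathcal{T})$ consists of well-typed terms over $\Omega$ that encode valid sequences of operations.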
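The typing condition on productions and the trees they generate can be sketched as follows. Productions are encoded as hypothetical `(A, op, (A_1, ..., A_n))` tuples and operation types as `(argument_types, result_type)` pairs; the operation names `empty`, `e0` and `e` are placeholders. Since $L(\mathcal{T})$ is generally infinite, `trees` enumerates derivation trees only up to a height bound.

```python
from itertools import product


def well_typed(nt_type, op_type, lhs, op, args):
    """A -> f(A_1, ..., A_n) is well typed iff f : type(A_1)...type(A_n) -> type(A)."""
    arg_types, result_type = op_type[op]
    return result_type == nt_type[lhs] and tuple(nt_type[b] for b in args) == arg_types


def trees(productions, nt, height):
    """All derivation trees from nonterminal nt of height <= height, as nested tuples."""
    if height == 0:
        return []
    out = []
    for lhs, op, args in productions:
        if lhs == nt:
            # product() over zero argument lists yields one empty tuple: a leaf.
            for subtrees in product(*(trees(productions, b, height - 1) for b in args)):
                out.append((op, *subtrees))
    return out


# Hypothetical grammar: S of type 1, Z of type 0.
nt_type = {"S": 1, "Z": 0}
op_type = {"empty": ((), 0), "e0": ((0,), 1), "e": ((1,), 1)}
prods = [("Z", "empty", ()), ("S", "e0", ("Z",)), ("S", "e", ("S",))]
assert all(well_typed(nt_type, op_type, *p) for p in prods)
print(trees(prods, "S", 3))  # prints [('e0', ('empty',)), ('e', ('e0', ('empty',)))]
```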
A full GEG is then
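$$\Gamma = (\mathcal{A}, \mathcal{T})$$
with associated graph language
$$L(\Gamma) = \bigcup_{t \in L(\mathcal{T})} \mathit{val}(t),$$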
obtained by recursively mapping derivation trees to their semantic value in the algebra.
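Evaluation of $\mathit{val}$ proceeds by induction:
- Base: if $t = \emptyset$, then $\mathit{val}(t) = \{\emptyset\}$.
- Union: if $t = \uplus_{k,k'}(t_1, t_2)$, then $\mathit{val}(t) = \{G_1 \uplus G_2 \mid G_1 \in \mathit{val}(t_1),\ G_2 \in \mathit{val}(t_2)\}$.
- Extension: if $t = e(t')$ with $e$ of type $k \to k'$, then $\mathit{val}(t) = \bigcup_{H \in \mathit{val}(t')} e(H)$.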
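The inductive definition lifts every operation pointwise to sets, and the recursive evaluator below mirrors it. It is a sketch: terms are nested tuples `(op_name, subterm_1, ...)`, `val` and `toy_ops` are hypothetical names, and the toy algebra in the usage abstracts a graph to the tuple of its port labels purely to show how nondeterministic results propagate.

```python
from itertools import product


def val(term, ops):
    """Set of values denoted by term = (op_name, subterm_1, ..., subterm_n).

    ops[op_name] takes one value per subterm and returns a SET of results:
    deterministic operations return singletons, nondeterministic ones (such as
    extension) may return several. val(f(t_1, ..., t_n)) is the union of
    f(x_1, ..., x_n) over all choices x_i in val(t_i).
    """
    name, *subterms = term
    arg_sets = [val(s, ops) for s in subterms]
    result = set()
    for args in product(*arg_sets):
        result |= ops[name](*args)
    return result


# Toy algebra: a "graph" is abstracted to the tuple of its port labels.
toy_ops = {
    "empty": lambda: {()},
    "leaf": lambda: {("a",), ("b",)},  # nondeterministic: two possible results
    "union": lambda x, y: {x + y},     # port sequences are concatenated
}
print(sorted(val(("union", ("leaf",), ("empty",)), toy_ops)))  # prints [('a',), ('b',)]
```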
3. Non-Structural Reentrancy Modelling
GEGs natively support non-structural reentrancy, addressing a key limitation of prior devices such as hyperedge-replacement grammars, where only context-free attachments are possible. Non-structural reentrancy means that nodes of a partially constructed graph may appear multiple times as targets of newly introduced edges, which is necessary for capturing phenomena such as shared arguments or referents in AMR.
This is realized in the GEG formalism via the set $C$ of clonable context nodes of each extension operation $e = \langle G, D, C \rangle$. When $e$ is applied to a graph $H$:
- Each $c \in C$ can be cloned an arbitrary number of times (nondeterministically).
- Each clone is fused with a distinct node of $H$ carrying the same node label, so the clones of a context node are merged into $H$ injectively.
- As a result, one or several new edges introduced by the operation can point to the same node(s) of $H$, creating arbitrary non-structural reentrancies.
The set $C$ thus controls where non-structural reentrancies can be generated, an essential feature missing from many prior graph grammar formalisms.
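The cloning mechanism can be made concrete with a brute-force sketch that enumerates every admissible choice of fusion partners. It rests on stated assumptions rather than the reference's exact definition: dock $i$ is fused with port $i$ of $H$ (labels must agree), each context node $c \in C$ is replaced by zero or more clones fused with distinct same-labelled nodes of $H$ (ports included), and $C$ contains no ports or docks. `extend` and `subsets` are hypothetical names, and the enumeration is exponential, so it is only meant for tiny graphs.

```python
from itertools import chain, combinations, product


def subsets(xs):
    """All subsets of xs, as tuples, from the empty one upwards."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))


def extend(H, G, docks, clonable):
    """All results of applying the extension <G, docks, clonable> to H.

    Graphs are (labels, edges, ports) triples; labels is a {node: label} dict
    or an iterable of (node, label) pairs, and the nodes of H are integers.
    Assumed semantics: dock i is fused with port i of H; each c in C becomes
    zero or more clones, each fused with a distinct H-node labelled lab(c) and
    receiving a copy of c's incoming edges; other nodes of G become fresh nodes.
    """
    h_lab, h_edges, h_ports = dict(H[0]), H[1], H[2]
    g_lab, g_edges, g_ports = dict(G[0]), G[1], G[2]
    if any(u in clonable for u, _, _ in g_edges):
        raise ValueError("by (R1), clonable nodes have no outgoing edges")
    if len(docks) != len(h_ports) or any(g_lab[d] != h_lab[p] for d, p in zip(docks, h_ports)):
        return set()
    image = dict(zip(docks, h_ports))  # docks are fused with the ports of H
    labels = dict(h_lab)
    fresh = max(h_lab, default=-1) + 1
    for v in g_lab:
        if v not in image and v not in clonable:
            image[v] = fresh
            labels[fresh] = g_lab[v]
            fresh += 1
    fixed = {(image[u], a, image[v]) for u, a, v in g_edges if v not in clonable}
    cs = sorted(clonable)
    # For each clonable node: every set of same-labelled H-nodes its clones may fuse with.
    options = [list(subsets(h for h in h_lab if h_lab[h] == g_lab[c])) for c in cs]
    results = set()
    for picks in product(*options):
        edges = set(h_edges) | fixed
        for c, targets in zip(cs, picks):
            edges |= {(image[u], a, t) for u, a, v in g_edges if v == c for t in targets}
        ports = tuple(image[p] for p in g_ports)
        results.add((frozenset(labels.items()), frozenset(edges), ports))
    return results


# H: a single a-labelled port node. G: a b-labelled port with edges to a dock and a clonable node.
H = ({0: "a"}, frozenset(), (0,))
G = ({"p": "b", "d": "a", "c": "a"}, {("p", "arg0", "d"), ("p", "arg1", "c")}, ("p",))
for labels, edges, ports in extend(H, G, ("d",), {"c"}):
    print(sorted(edges), ports)
```

This yields two graphs: one where `c` is cloned zero times, and one where its single clone is fused with the existing `a`-node, so both new edges of the `b`-node point to that same node, which is a non-structural reentrancy.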
4. Parsing Algorithms and Complexity
The parsing problem for a given graph $G$ and a GEG $\Gamma$ is to decide whether $G \in L(\Gamma)$, that is, whether there is a derivation tree $t \in L(\mathcal{T})$ with $G \in \mathit{val}(t)$. GEGs admit a polynomial-time parsing algorithm, summarized as follows:
- The principal recursive routine $\mathit{parse}(A, \mathbf{p})$ tests whether the subgraph of $G$ induced by the nodes reachable from the port sequence $\mathbf{p}$ is derivable from nonterminal $A$.
- Results are memoized in a table $T$, so each pair $(A, \mathbf{p})$ is evaluated only once.
- Union productions are handled by splitting $\mathbf{p}$ into the port sequences of the two subterms; extension productions are handled by matching the operation's graph against $G$.
- Matching an extension rule involves mapping the operation's ports to $\mathbf{p}$, locating its dock nodes in $G$ (they become the port sequence of the recursive call), and considering all ways in which clones of context nodes may merge with existing nodes, while preserving labels and the constraint that only clonable nodes are cloned.
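The memoization and union-splitting steps can be illustrated with a deliberately restricted recognizer. This is not the reference's algorithm: it assumes only three hypothetical production forms (the empty graph, a single labelled port node with no edges, and union), omits general extension matching with docks, clones and profiles, and takes the subgraph for a port tuple to be everything reachable from it. `functools.lru_cache` plays the role of the memo table $T$. The sketch also assumes no chain of productions leads back to the same pair $(A, \mathbf{p})$, since the cache only stores finished results.

```python
from functools import lru_cache


def make_parser(labels, edges, nt_type, productions):
    """Build a memoized recognizer for a restricted set of production forms.

    labels: {node: label} of the input graph; edges: (u, label, v) triples.
    Hypothetical production encodings handled by this sketch:
      (A, "empty", ())       A -> empty graph (type 0)
      (A, "node", (label,))  A -> a single port node with that label and no edges
      (A, "union", (B, C))   A -> union(B, C)
    """
    succ = {}
    for u, _, v in edges:
        succ.setdefault(u, set()).add(v)

    @lru_cache(maxsize=None)
    def reach(ports):
        """Nodes reachable from the port tuple: the subgraph parse() reasons about."""
        seen, stack = set(ports), list(ports)
        while stack:
            for v in succ.get(stack.pop(), ()):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return frozenset(seen)

    @lru_cache(maxsize=None)  # the memo table, keyed by (nonterminal, port tuple)
    def parse(A, ports):
        for lhs, op, args in productions:
            if lhs != A:
                continue
            if op == "empty" and ports == ():
                return True
            if op == "node" and len(ports) == 1:
                v = ports[0]
                if labels[v] == args[0] and not succ.get(v):
                    return True
            if op == "union":
                B, C = args
                if len(ports) != nt_type[B] + nt_type[C]:
                    continue
                left, right = ports[: nt_type[B]], ports[nt_type[B]:]
                # Reachable sets are closed under edges, so disjointness means the
                # subgraph splits into the disjoint union of the two halves.
                if reach(left).isdisjoint(reach(right)) and parse(B, left) and parse(C, right):
                    return True
        return False

    def accepts(start, ports):
        """The graph is accepted if parse(start, ports) holds and every node is reachable."""
        ports = tuple(ports)
        return parse(start, ports) and reach(ports) == frozenset(labels)

    return accepts
```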
Data Structures and Efficiency
Key data structures include:
- Profiles $\pi(v)$ for nodes $v$ of the input graph $G$,
- Likewise, profiles for nodes of the operation's underlying graph; comparing the two lets the parser check compatibility efficiently during matching and merging.
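The exact content of a profile is not spelled out here, so the sketch below assumes a profile is the node label together with the multisets of outgoing and incoming edge labels. `profiles` and `compatible` are hypothetical helpers showing how a cheap profile comparison can rule out a candidate fusion before any full matching is attempted.

```python
from collections import Counter


def profiles(labels, edges):
    """Assumed profile of each node: (label, Counter of out-edge labels, Counter of in-edge labels)."""
    out = {v: Counter() for v in labels}
    inc = {v: Counter() for v in labels}
    for u, a, v in edges:
        out[u][a] += 1
        inc[v][a] += 1
    return {v: (labels[v], out[v], inc[v]) for v in labels}


def compatible(pattern_profile, target_profile):
    """Necessary condition for placing a pattern node on a target node: equal labels,
    and each incident edge label of the pattern node occurs at least as often at the
    target (the target may carry extra edges contributed by other operations)."""
    lab_p, out_p, in_p = pattern_profile
    lab_t, out_t, in_t = target_profile
    # Counter subtraction drops non-positive counts, so an empty result means <=.
    return lab_p == lab_t and not (out_p - out_t) and not (in_p - in_t)
```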
Complexity Summary
Writing $n$ for the number of nodes of the input graph, the algorithm runs in time polynomial in $n$, with the running time primarily determined by the combinatorics of ports and clones. Under additional unambiguity constraints on profiles, the complexity can be reduced further, even to linear time, as per corollaries in the primary reference (Björklund et al., 2021).
| Parsing step | Key limitation / optimization |
|---|---|
| Memo table computation | |
| Union production handling | Constant number per nonterminal |
| Extension handling (naive) | Filters via profile comparison |
| Overall (optimized) | Profile constraints |
5. Representative Example: Reentrant Graph Generation
Consider a minimal GEG showcasing non-structural reentrancy:
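- Node labels: $\Sigma = \{a, b\}$,
- Edge labels: $\Lambda = \{\ell_1, \ell_2\}$,
- Extension operation $e = \langle G, D, C \rangle$ of type $1 \to 1$, where:
  - $G = (V_G, E_G, \mathit{lab}_G, \mathit{port}_G)$ with $\mathit{port}_G = (p)$,
  - $V_G = \{p, d, c\}$, $E_G = \{(p, \ell_1, d), (p, \ell_2, c)\}$, $\mathit{lab}_G(p) = b$ and $\mathit{lab}_G(d) = \mathit{lab}_G(c) = a$,
  - $D = (d)$,
  - $C = \{c\}$.
- The regular tree grammar uses nonterminals $S$ (type 1), $A$ (type 1) and $Z$ (type 0), with productions:
  - $S \to e(A)$
  - $A \to e_a(Z)$, where $e_a = \langle G_a, (), \emptyset \rangle$ has type $0 \to 1$ and $G_a$ consists of a single $a$-labelled node that is also its only port
  - $Z \to \emptyset$
For the derivation tree $e(e_a(\emptyset))$, the subterm $e_a(\emptyset)$ evaluates to a single $a$-node serving as the port. Applying $e$ fuses the dock $d$ with this node and adds the $b$-labelled port $p$; if $c$ is cloned once and the clone is fused with the same $a$-node, the result is a graph of type 1 with a $b$-node and two parallel edges, labelled $\ell_1$ and $\ell_2$, targeting a single $a$-node. Here the cloning and merging induced by $C$ enables this reentrancy structure, directly modelling the non-structural scenario within the formalism (Björklund et al., 2021).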
6. Position within Generative Graph Formalisms and Related Work
GEGs make several advances within the family of generative graph formalisms:
- In comparison to hyperedge-replacement grammars, GEGs transcend the context-free model by enabling arbitrary (non-structural) node-sharing.
- The extension mechanism is inspired in part by concepts such as adaptive star grammars (notably the "cloning" device) (Björklund et al., 2021).
- The framework is directly motivated by requirements from natural language processing, particularly representations such as AMR, which routinely employ non-structural reentrancies.
- Their polynomial-time parsing sets GEGs apart from more expressive grammar variants whose parsing is intractable.
Connections to prior work include foundational treatments in graph grammar theory (Courcelle & Engelfriet, 2012), as well as comparisons with weighted DAG automata for semantic graphs (Chiang et al., 2018).
7. References
- Björklund, H.; Drewes, F.; Jonsson, P.: Polynomial Graph Parsing with Non-Structural Reentrancies (2021).
- Courcelle, B.; Engelfriet, J.: Graph Structure and Monadic Second-Order Logic: A Language-Theoretic Approach (2012).
- Drewes, F.; et al.: Adaptive Star Grammars (2010).
- Chiang, D.; et al.: Weighted DAG Automata for Semantic Graphs (2018).