- The paper presents GGATN, a novel model that combines graph-based process structure encoding with Transformer-based sequence modeling to generate full event sequences.
- It encodes process structure with a GAT encoder and uses a Viterbi-style decoder to guarantee admissible transitions and explicit termination, while dedicated heads target temporal order and attribute consistency.
- Experiments on six real-world event logs show that GGATN outperforms autoregressive LLM baselines, producing no hallucinated activities and achieving higher sequence similarity.
Problem Setting and Motivation
The paper tackles structurally constrained full event sequence generation within Predictive Process Monitoring (PPM). Unlike traditional local prediction tasks such as next-activity, outcome, or attribute prediction, this formulation requires prefix-free ("unconditional") generation of the entire event sequence: activities, timestamps, event-level and sequence-level attributes, and an explicit termination. Generation has no access to observed prefix events and is conditioned only on high-level cues such as the start time and the target length. The central constraints are transition feasibility, temporal order, termination, and attribute consistency, which reflect operational realities in structured business, clinical, and manufacturing processes.
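The constraints above can be read as checks on a finished sequence. The sketch below is an illustrative validity checker, not the paper's code. It assumes that a sequence is a list of (activity, timestamp) pairs closed by an explicit END event and that admissible transitions are given as a set of activity pairs taken from the process graph; the names `check_sequence`, `START`, and `END` are hypothetical. Attribute consistency is left out because it depends on each log's attribute schema.

```python
from datetime import datetime

START, END = "<START>", "<END>"  # hypothetical boundary tokens


def check_sequence(events, allowed, target_len=None):
    """Return the list of violated constraints; an empty list means valid.

    events     -- list of (activity, datetime) pairs; termination is an END event
    allowed    -- set of admissible (src, dst) transitions from the process graph
    target_len -- optional requested number of real (non-END) events
    """
    violations = []
    acts = [a for a, _ in events]
    # Termination: exactly one END token, and it must be the last event.
    if not acts or acts[-1] != END or acts.count(END) != 1:
        violations.append("termination")
    # Transition feasibility: every consecutive pair, starting from START, is admissible.
    path = [START] + acts
    for src, dst in zip(path, path[1:]):
        if (src, dst) not in allowed:
            violations.append(f"transition {src}->{dst}")
    # Temporal order: timestamps must be non-decreasing.
    times = [t for _, t in events]
    if any(later < earlier for earlier, later in zip(times, times[1:])):
        violations.append("temporal order")
    # Length conditioning: count only real events, not the END marker.
    if target_len is not None and sum(a != END for a in acts) != target_len:
        violations.append("length")
    return violations


allowed = {(START, "A"), ("A", "B"), ("B", END)}
ok = [("A", datetime(2024, 1, 1, 9)), ("B", datetime(2024, 1, 1, 10)),
      (END, datetime(2024, 1, 1, 10))]
print(check_sequence(ok, allowed, target_len=2))  # []
```

A generated sequence that fails any of these checks is exactly what the paper counts as a structural violation; an LLM that emits an unseen activity label would fail the transition check.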
The authors identify two central limitations in applying Transformer-based paradigms to full sequence generation in PPM: (1) self-attention mechanisms model position-wise dependencies but do not encode explicit process structure or admissibility constraints; (2) autoregressive decoding accumulates errors, compromising global structural coherence over long horizons.
Model: GGATN Architecture
The proposed Graph Grounded Cross Attention Transformer Neural Network (GGATN) combines process graph representations with position-aware sequence modeling. Its components are as follows:
- Graph Construction & GAT Encoder: A global process graph is extracted from the event log, capturing all observed activities and transitions; each edge is annotated with its transition frequency and mean temporal gap. A multi-layer Graph Attention Network (GAT) encodes this graph into activity embeddings that serve as a structural memory. An ablation over training regimes (frozen, joint, and staged) shows that these embeddings remain stable as a process prior.
- Sequence Transformer Encoder: Input conditioning (activity, time, attributes, cyclic temporal features, sequence head with normalized length) is processed using a stack of Transformer encoder layers with sinusoidal positional encodings, producing a contextually enriched sequence representation.
- Graph Grounded Cross Attention: At each sequence position, the Transformer representation attends over the GAT-produced global activity embeddings via cross attention. The cross-attention output is gated and injected into the sequence representation, enriching sequential context with process-structural information.
- Non-Autoregressive Graph-Constrained Decoding: Generation is non-autoregressive: a single forward pass produces raw logits for activities, times, and attributes at all positions. A Viterbi-style dynamic programming decoder over the process graph then selects the highest-scoring activity sequence that uses only admissible transitions and ends with an explicit termination, so every decoded path is valid with respect to the graph.
- Attribute and Temporal Heads: Event and sequence attributes (categorical and numerical) are predicted from the shared latent representation, conditioned on the decoded activities. Time prediction operates in a log-standardized space and is trained with a Smooth L1 loss.
- Refinement: An optional activity feedback module feeds the soft decoded activity distribution back into the latent representation to reshape the provisional logits, followed by an additional feed-forward refinement step; this helps correct ambiguous activity assignments.
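The graph-construction step can be illustrated with a directly-follows graph whose edges carry a frequency and a mean temporal gap. This is a minimal sketch under stated assumptions, not the authors' extraction code: the log is assumed to be a dict from case id to a time-ordered list of (activity, datetime) pairs, artificial `START`/`END` nodes are added (an assumption), and `build_process_graph` is a hypothetical name. The subsequent GAT encoding is not shown.

```python
from collections import defaultdict
from datetime import datetime

START, END = "<START>", "<END>"  # hypothetical artificial boundary nodes


def build_process_graph(log):
    """Directly-follows graph with per-edge frequency and mean temporal gap.

    log -- dict mapping case id to a time-ordered list of (activity, datetime).
    Returns (nodes, edges) with edges[(src, dst)] = {"freq": int, "mean_gap_s": float}.
    """
    counts = defaultdict(int)
    gap_sums = defaultdict(float)
    nodes = {START, END}
    for trace in log.values():
        nodes.update(a for a, _ in trace)
        # Boundary edges have no real timestamps, so they contribute a 0-second gap.
        path = [(START, None)] + list(trace) + [(END, None)]
        for (src, t1), (dst, t2) in zip(path, path[1:]):
            counts[(src, dst)] += 1
            if t1 is not None and t2 is not None:
                gap_sums[(src, dst)] += (t2 - t1).total_seconds()
    edges = {e: {"freq": n, "mean_gap_s": gap_sums[e] / n} for e, n in counts.items()}
    return nodes, edges


log = {
    "c1": [("A", datetime(2024, 1, 1, 9)), ("B", datetime(2024, 1, 1, 11))],
    "c2": [("A", datetime(2024, 1, 2, 9)), ("B", datetime(2024, 1, 2, 10))],
}
nodes, edges = build_process_graph(log)
print(edges[("A", "B")])  # {'freq': 2, 'mean_gap_s': 5400.0}
```

The resulting edge set doubles as the admissible-transition set used at decoding time, which is what makes the graph both a learned prior and a hard constraint.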
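The graph-grounded cross attention step can be sketched as single-head scaled dot-product attention in which each sequence position queries the activity embeddings, followed by a gated residual update. This is a simplified illustration: learned query/key/value projections and multiple heads are omitted, and the scalar sigmoid gate `gate_logit` is a hypothetical gating form, since the exact gate used in GGATN is not specified here.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_cross_attention(seq_h, act_emb, gate_logit=0.0):
    """Single-head cross attention from sequence positions to activity embeddings.

    seq_h      -- L query vectors (Transformer outputs), each of length d
    act_emb    -- V activity embeddings (GAT outputs), used as both keys and values
    gate_logit -- hypothetical learned scalar; sigmoid(gate_logit) scales the update
    Returns (updated vectors, per-position attention weights over activities).
    """
    d = len(act_emb[0])
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    updated, weights = [], []
    for h in seq_h:
        # Scaled dot-product scores between this position and every activity.
        scores = [sum(x * y for x, y in zip(h, e)) / math.sqrt(d) for e in act_emb]
        w = softmax(scores)
        # Attention-weighted mixture of activity embeddings.
        ctx = [sum(w[v] * act_emb[v][k] for v in range(len(act_emb))) for k in range(d)]
        # Gated residual injection into the sequence representation.
        updated.append([x + gate * c for x, c in zip(h, ctx)])
        weights.append(w)
    return updated, weights


seq_h = [[1.0, 0.0], [0.0, 1.0]]
act_emb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
updated, weights = gated_cross_attention(seq_h, act_emb)
```

Because the attention weights are a soft distribution over all activities, this step can only emphasize structurally plausible activities; it cannot rule any out, which is consistent with the interpretability finding that validity must come from the decoder.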
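The graph-constrained decoding step can be illustrated with a standard Viterbi recursion over per-position activity log-probabilities, where a transition is only allowed if it is an edge of the process graph. This sketch assumes the output length is fixed to the conditioned target length and that termination means the last activity must have an edge into an artificial `END` node; `constrained_viterbi`, `START`, and `END` are hypothetical names, and the paper's decoder may also score times, attributes, or variable-length termination.

```python
import math

START, END = "<START>", "<END>"  # hypothetical boundary nodes of the process graph


def constrained_viterbi(log_probs, activities, allowed):
    """Highest-scoring activity path under process-graph constraints.

    log_probs  -- L rows of per-activity log-probabilities (column j scores
                  activities[j]) from one non-autoregressive forward pass
    activities -- activity labels, excluding START/END
    allowed    -- set of admissible (src, dst) transitions, including START/END edges
    Returns (path, score), or (None, -inf) if no admissible path of length L exists.
    """
    n, NEG = len(activities), -math.inf
    # score[j]: best score of a valid prefix ending in activities[j] at the current step.
    score = [log_probs[0][j] if (START, a) in allowed else NEG
             for j, a in enumerate(activities)]
    backptrs = []
    for t in range(1, len(log_probs)):
        new_score, ptr = [NEG] * n, [-1] * n
        for j, dst in enumerate(activities):
            for i, src in enumerate(activities):
                if score[i] == NEG or (src, dst) not in allowed:
                    continue  # unreachable prefix or inadmissible transition
                cand = score[i] + log_probs[t][j]
                if cand > new_score[j]:
                    new_score[j], ptr[j] = cand, i
        score = new_score
        backptrs.append(ptr)
    # Explicit termination: the last activity must have an edge into END.
    best_j, best = -1, NEG
    for j, a in enumerate(activities):
        if (a, END) in allowed and score[j] > best:
            best_j, best = j, score[j]
    if best_j < 0:
        return None, NEG
    # Follow back-pointers from the final step to recover the full path.
    path = [best_j]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return [activities[j] for j in reversed(path)], best


acts = ["A", "B", "C"]
allowed = {(START, "A"), ("A", "B"), ("A", "C"), ("B", "C"), ("C", END)}
probs = [[0.3, 0.6, 0.1], [0.2, 0.5, 0.3], [0.1, 0.6, 0.3]]
log_probs = [[math.log(p) for p in row] for row in probs]
path, score = constrained_viterbi(log_probs, acts, allowed)
print(path)  # ['A', 'B', 'C']
```

A greedy per-position argmax on the same scores would produce B, B, B, which violates the graph from the first step; the dynamic program instead trades a little per-position probability for global validity, at a cost of O(L·V²).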
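The temporal head's target space and loss can be sketched as follows. The paper states only that times are predicted in a log-standardized space with a Smooth L1 loss; applying `log1p` to inter-event gaps in seconds, clamping decoded gaps at zero, and rebuilding timestamps by cumulative summation are assumptions made here for illustration, and all function names are hypothetical. The Smooth L1 definition is the standard one (quadratic below `beta`, linear above).

```python
import math
from datetime import datetime, timedelta
from statistics import mean, pstdev


def fit_log_standardizer(gaps_s):
    """Mean and std of log1p-transformed inter-event gaps (in seconds)."""
    logs = [math.log1p(g) for g in gaps_s]
    mu, sigma = mean(logs), pstdev(logs)
    return mu, (sigma if sigma > 0 else 1.0)


def to_model_space(gap_s, mu, sigma):
    return (math.log1p(gap_s) - mu) / sigma


def from_model_space(z, mu, sigma):
    # Inverse transform, clamped at 0 so decoded gaps can never be negative.
    return max(0.0, math.expm1(z * sigma + mu))


def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss: quadratic below beta, linear above."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def timestamps_from_gaps(start, gaps_s):
    # Cumulative sum of non-negative gaps yields non-decreasing timestamps.
    out, t = [], start
    for g in gaps_s:
        t += timedelta(seconds=g)
        out.append(t)
    return out


gaps = [0.0, 60.0, 3600.0, 86400.0]
mu, sigma = fit_log_standardizer(gaps)
targets = [to_model_space(g, mu, sigma) for g in gaps]  # regression targets
```

The log transform compresses heavy-tailed waiting times (seconds to days), and Smooth L1 keeps the occasional extreme gap from dominating the gradient the way a squared error would.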
Evaluation and Results
The evaluation covers six real-world event logs of varying process complexity (Helpdesk, Sepsis, BPI13I, BPI13C, BPI20, BPI17) and uses a broad metric suite: event- and sequence-level accuracy, sequence similarity, Damerau-Levenshtein (DL) similarity, bigram Jensen-Shannon divergence (JSD), duration WT, and attribute and timestamp consistency diagnostics.
The main empirical findings are:
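Two of the control-flow metrics can be sketched as follows. This assumes the common PPM convention of normalized DL similarity (1 minus distance divided by the longer length), uses the restricted "optimal string alignment" variant of Damerau-Levenshtein, and computes JSD between bigram distributions with base-2 logarithms so that it lies in [0, 1]; the paper's exact normalizations may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def osa_distance(a, b):
    """Damerau-Levenshtein distance, restricted (optimal string alignment) variant."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "AB" vs "BA", costs a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def dl_similarity(a, b):
    """1 - normalized distance; 1.0 means identical activity sequences."""
    if not a and not b:
        return 1.0
    return 1.0 - osa_distance(a, b) / max(len(a), len(b))


def bigram_jsd(generated, reference):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between bigram distributions."""
    def dist(seqs):
        c = Counter(bg for s in seqs for bg in zip(s, s[1:]))
        n = sum(c.values()) or 1
        return {k: v / n for k, v in c.items()}

    p, q = dist(generated), dist(reference)
    jsd = 0.0
    for k in set(p) | set(q):
        pk, qk = p.get(k, 0.0), q.get(k, 0.0)
        m = 0.5 * (pk + qk)
        if pk > 0:
            jsd += 0.5 * pk * math.log2(pk / m)
        if qk > 0:
            jsd += 0.5 * qk * math.log2(qk / m)
    return jsd


print(osa_distance("ABC", "ACB"))  # 1
```

DL similarity compares individual generated and reference sequences, whereas bigram JSD compares the corpus-level distribution of directly-follows pairs, so it can flag a model that produces plausible single traces but the wrong overall transition mix.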
- GGATN achieves full sequence coverage and zero hallucinated activities across all datasets, and outperforms the locally deployed LLM baselines on core control-flow metrics (sequence similarity, DL, control-flow JSD).
- On complex datasets (e.g., BPI17, with 66 activities and long sequences), GGATN achieves robust transition adherence and better attribute alignment than the LLaMA 8B and Mistral 7B baselines (4k–32k context windows), which suffer from coverage gaps, instability, and substantial computational overhead.
- LLM performance is uneven and depends heavily on the context window and on length-bucketed in-context prompting; gains in sequence-level recall come at the cost of hallucinations and global inconsistency.
- Ablations confirm the stability and usefulness of the global GAT encoder as a structural prior: performance is robust across training regimes (frozen, staged, joint), and fine-tuning the encoder yields marginal improvements only on very large, complex graphs.
- Interpretability analysis shows that cross attention selectively emphasizes structurally admissible transitions but does not act as a hard filter; final path validity is enforced by the decoder. The refinement head increases the probability assigned to the true activity at provisionally misclassified steps.
Theoretical and Practical Implications
GGATN offers a blueprint for integrating explicit process topology (via learned graph representations) with self-attentional sequence modeling. Separating sequence representation, graph memory, and decoding, combined with single-pass non-autoregressive generation, avoids the exposure bias of autoregressive decoding and aligns event generation with operational constraints. In this design, graph neural modules supply stable, inspectable priors, while decoding enforces global feasibility in high-cardinality, data-rich event prediction domains. The multitask attribute formulation further models activities, temporal behavior, and resource states jointly as a single generative task.
From a practical perspective, GGATN's computational efficiency and reliable constraint enforcement are directly relevant to real-time decision support, digital twins, and process simulation in high-stakes settings, where structural violations or hallucinated events are unacceptable.
Future Directions
Future work could explore several directions:
- Richer process graph parameterizations: integrating second-order, context-dependent transition dynamics, or stochastic temporal kernels.
- Adaptive graph bucketing: dynamically selecting local or global graph priors based on target sequence characteristics, possibly via meta-learning or on-the-fly subgraph extraction.
- Tighter integration of process mining priors and domain semantics for temporal and attribute consistency.
- Extension to more expressive hybrid GNN-Transformer architectures with differentiable decoding.
Conclusion
GGATN delivers state-of-the-art, process-aware unconditional event sequence generation in the PPM domain, consistently outperforming strong LLM baselines in reliability, structural validity, and computational efficiency. By combining GAT-based process graph representations, Transformer sequence encoders, graph-grounded cross attention, and graph-constrained decoding, it enforces both global and local structural correctness. Ablation and interpretability studies underscore the architectural soundness and flexibility of the approach, laying a foundation for future work on generative modeling in structured operational domains.
Citation:
"Graph Grounded Cross Attention Transformer Neural Network for Structurally Constrained Full Event Sequence Generation in Predictive Process Monitoring" (2606.18726)