- The paper introduces SEAGAN, a domain-specific GNN that leverages edge-aware attention to enhance node classification on photosynthetic response curves.
- It compares kNN and auxiliary-signal-guided (ASG) graph construction strategies for separating ambiguous biochemical regimes, using domain-derived auxiliary signals.
- Quantitative results show that the GAT-kNN variant achieved a macro-F1 of 0.857 and an accuracy of 0.882, outperforming traditional and feature-based baselines.
SEAGAN: Domain-Specific, Edge-Aware Graph Attention for Node Classification in Dynamic Plant Physiology
Introduction
This work introduces SEAGAN (domain-Specific and Edge-Aware Graph Attention Network for Dynamic Plant Processes), a GNN architecture tailored for node-wise classification of photosynthetic biochemical limitation states along C3 A–Ci response curves. Identifying these limitation regimes, namely Rubisco-limited (Ac), RuBP-regeneration-limited (Aj), and TPU-limited (Ap), is the central challenge in inferring photosynthetic parameters from gas exchange data. Traditional heuristic or automated fitting methods often yield unreliable or ambiguous assignments, particularly near biochemical transition zones, which destabilizes downstream mechanistic parameter estimation. This study recasts curve analysis as a relational learning problem, leveraging structured graph representations, domain-specific auxiliary signals, and edge-aware attention mechanisms.
Domain-Specific Graph Construction Strategies
The paper formulates each A–Ci curve as a small graph where nodes correspond to discrete measurement points, and edges encode pairwise physiological or geometric relationships. Two graph construction frameworks are compared:
- Distance-based kNN graphs: Each node connects to its four nearest neighbors in Ci–Anet space.
- Auxiliary-Signal-Guided (ASG) graphs: Nodes are partitioned into functional groups based on peak detection in derived auxiliary diagnostic signals; nodes are fully connected within each group, with boundary links across transitions.
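The kNN strategy in the list above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names `min_max_scale` and `knn_edges` are hypothetical, and min-max scaling of both axes before measuring Euclidean distance is an assumption (the source only states that neighbors are found in Ci–Anet space).

```python
import math

def min_max_scale(values):
    """Rescale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in values]

def knn_edges(ci, a_net, k=4):
    """Return directed edges (i, j) linking each point to its k nearest neighbors.

    Assumption: distances are Euclidean in min-max-scaled (Ci, Anet) space.
    """
    x = min_max_scale(ci)
    y = min_max_scale(a_net)
    n = len(x)
    edges = []
    for i in range(n):
        # Sort all other points by distance to point i and keep the closest k.
        dists = sorted(
            (math.hypot(x[i] - x[j], y[i] - y[j]), j) for j in range(n) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges
```

The edges are directed; a symmetric graph can be obtained by adding the reverse of each edge and removing duplicates.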
Auxiliary signals (sAc and sAj), derived by normalizing measured Anet with dimensionless process-specific response terms, provide increased local diagnostic resolution and structurally inform both node features and edge attributes.
Figure 2: The auxiliary signals sAc and sAj stratify local curve behavior, enabling refined grouping into limitation regimes.
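The source does not give the formulas for the auxiliary signals. The sketch below is one plausible reading, assuming the dimensionless response terms are the CO2-dependent factors of the standard Farquhar-von Caemmerer-Berry (FvCB) rate equations. The constants, the function names, and the neglect of respiration are all assumptions made for illustration.

```python
# Illustrative 25 degC constants (assumed; not taken from the source).
GAMMA_STAR = 42.75  # CO2 compensation point, umol mol^-1
KM = 710.3          # effective Michaelis constant Kc * (1 + O / Ko), umol mol^-1

def rubisco_term(ci):
    """Dimensionless CO2 response of the Rubisco-limited rate (FvCB form)."""
    return (ci - GAMMA_STAR) / (ci + KM)

def rubp_term(ci):
    """Dimensionless CO2 response of the RuBP-regeneration-limited rate (FvCB form)."""
    return (ci - GAMMA_STAR) / (4.0 * ci + 8.0 * GAMMA_STAR)

def auxiliary_signals(ci, a_net):
    """Normalize measured Anet by each response term.

    Under these assumptions sAc is flat where Rubisco limits the rate and
    sAj is flat where RuBP regeneration limits it. Points with Ci equal to
    GAMMA_STAR must be excluded to avoid division by zero.
    """
    s_ac = [a / rubisco_term(c) for c, a in zip(ci, a_net)]
    s_aj = [a / rubp_term(c) for c, a in zip(ci, a_net)]
    return s_ac, s_aj
```

With zero respiration, a purely Rubisco-limited segment yields a constant sAc equal to the carboxylation capacity, which is the kind of plateau that peak or change detection could exploit.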
Figure 1: Comparison of kNN (proximity-based) and ASG (biochemically-informed) graph connectivities for A–Ci curves.
Model Architectures and Training Framework
Three principal GNN variants are evaluated—GCN, GAT (graph attention network with edge attributes), and Graph U-Net (with hierarchical pooling/unpooling and edge-aware variants). Each model ingests node features (including both directly observed and auxiliary signals) and adopts either kNN or ASG connectivity, with edge attributes based on pairwise differences in auxiliary signals. Node-wise multi-class classification is learned via cross-entropy (with balanced, weighted, and focal adaptations to address label imbalance).
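To make the loss variants concrete, the following is a minimal per-node sketch of weighted cross-entropy and focal loss. The inverse-frequency weighting scheme, the default focusing parameter `gamma=2.0`, and all function names are assumptions; the source does not specify how the weights or focal parameters were chosen.

```python
import math
from collections import Counter

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def inverse_frequency_weights(labels, num_classes):
    """Class weights n / (K * count_c); an absent class gets weight 0 (assumed scheme)."""
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]

def weighted_cross_entropy(logits, label, class_weights):
    """Cross-entropy for one node, scaled by the weight of its true class."""
    p = softmax(logits)[label]
    return -class_weights[label] * math.log(p)

def focal_loss(logits, label, class_weights, gamma=2.0):
    """Focal loss: down-weights nodes that are already classified confidently."""
    p = softmax(logits)[label]
    return -class_weights[label] * (1.0 - p) ** gamma * math.log(p)
```

Setting `gamma=0` reduces the focal loss to weighted cross-entropy, so the two can share one code path during experiments.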
The workflow involves the automated generation of 10,000 synthetic A–Ci curves spanning the physiological range (parameterized via Latin hypercube sampling, LHS), realistic noise injection, and ground-truth regime annotation, which enables direct quantification of classification fidelity. Feature-based machine learning and neural baselines (random forest, SVM, XGBoost, and a feed-forward neural network) serve as controls.
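As an illustration of the sampling step, here is a minimal pure-Python Latin hypercube sampler. The function name `latin_hypercube` and the parameter ranges in the usage note are hypothetical; the source does not list the sampled parameters or their bounds.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each dimension has one point per stratum.

    bounds is a list of (low, high) pairs, one per parameter.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of n_samples equal-width strata of [0, 1).
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]
```

For example, `latin_hypercube(10000, [(20.0, 200.0), (40.0, 400.0)])` would draw 10,000 parameter pairs over two hypothetical ranges, each of which is covered evenly along its own axis.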
Figure 3: End-to-end workflow: synthetic data generation, graph construction, model development, and evaluation.

Figure 4: Model families—GCN baseline, GAT with edge-aware attention, and Graph U-Net—used for limitation-state node classification.
Quantitative Evaluation and Comparative Results
Feature-based models failed to accurately resolve node-wise regime assignments, especially in transition regions (macro-F1 [value not recoverable]). Introducing GCNs with local message passing provided a statistically significant boost (macro-F1 [value not recoverable]), demonstrating the benefit of relational context. However, GCNs still struggled with ambiguous nodes near regime boundaries, which limited their discrimination power.
SEAGAN, implemented as GAT-kNN with edge attributes and trained under weighted cross-entropy, produced the strongest results, with a macro-F1 of 0.857 and an accuracy of 0.882 (the reported recall and precision values are not recoverable). All GAT variants with learned edge-aware attention substantially outperformed both the GCN baseline and an automated fitting-based reference (PhoTorch, F1 [value not recoverable]), with statistically significant gains confirmed across 30 repeated trials.
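For reference, the macro-F1 and accuracy metrics quoted above can be computed as in the sketch below. The function names are hypothetical, and treating the F1 of a class with no true or predicted members as 0 is an assumed convention.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes

def accuracy(y_true, y_pred):
    """Fraction of nodes whose predicted regime matches the label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

Because macro-F1 averages classes equally, a rarely occurring regime counts as much as a common one, which is why it is a natural headline metric under label imbalance.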
Figure 5: F1-score distributions across feature-based and GCN baselines demonstrate clear performance separation.
The edge attribution and ablation analyses (using GNNExplainer) demonstrated that adaptive attention focuses on relevant transition-spanning neighbors, enabling robust disambiguation that is not available to convolution-based or group-partitioned (ASG) models.
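The attention behavior described above can be illustrated with a minimal sketch of edge-conditioned attention for a single node. It follows the common GAT formulation in which the score for a neighbor depends on both endpoint features and the edge attribute. The exact parameterization used in SEAGAN is not given in the source, so the additive score and all names here are assumptions.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_weights(i, neighbors, h, edge_attr, a_src, a_dst, a_edge):
    """Softmax-normalized attention of node i over its neighbors.

    h maps node -> feature list (assumed already linearly projected);
    edge_attr maps (i, j) -> edge feature list, e.g. auxiliary-signal differences.
    """
    scores = [
        leaky_relu(
            dot(a_src, h[i]) + dot(a_dst, h[j]) + dot(a_edge, edge_attr[(i, j)])
        )
        for j in neighbors
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(neighbors, exps)}

def aggregate(i, neighbors, h, alphas):
    """Attention-weighted sum of neighbor features for node i."""
    dim = len(h[i])
    return [sum(alphas[j] * h[j][d] for j in neighbors) for d in range(dim)]
```

With a negative weight on an edge attribute that measures the auxiliary-signal difference, neighbors on the same side of a transition receive more attention than neighbors across it; a learned weight could equally favor transition-spanning edges.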
Theoretical and Practical Implications
The central result is an empirical demonstration that representing physiological response curves as explicit graphs, with process-specific node features, edge attributes, and learnable local attention, markedly improves node-wise state segmentation over both fixed feature aggregation and purely group-partitioned ASG graphs. The results indicate that, for small relational systems with ambiguous transition zones and limited data, flexible local aggregation (as implemented in kNN edge-aware GATs) outperforms pre-imposed group clustering (ASG) and hierarchical pooling (Graph U-Net).
Practically, this enables more reliable, scalable, and automated biochemical regime assignment for photosynthetic parameter estimation, directly supporting high-throughput plant functional phenomics and reducing the dependence on subjective manual curation or brittle fitting heuristics.
Implications for Relational Machine Learning
From the AI perspective, SEAGAN demonstrates that domain-infused auxiliary signals, local geometric connectivities, and edge-conditioned attention yield architectures robust to noisy and ambiguous scientific measurements. This framework serves as a model for relational learning problems involving small, structured, and highly contextual scientific datasets, bridging the gap between physical process models and deep GNNs. The results support further research into adaptive graph construction, process-informed edge attributes, and uncertainty quantification for parameter estimation.
Conclusion
SEAGAN achieves the strongest results reported in this study for node-wise photosynthetic limitation-state classification in A–Ci curves by integrating domain-specific auxiliary features, edge-aware attention, and local kNN connectivity. kNN-based GATs outperformed both feature-based and convolutional graph baselines, as well as classical and contemporary fitting algorithms, marking a significant advance in automated, scalable plant phenotyping. The architecture and methodology provide a template for future work in graph-based biochemical parameter inference, uncertainty propagation, and broader scientific relational learning. Future extensions should integrate direct parameter estimation and expand to empirical datasets with more extreme measurement regimes.