Optimal Transport Enhanced Syntactic-Semantic Graph Network (OTESGN)

Updated 17 September 2025
  • The paper introduces a novel optimal transport–based attention mechanism that replaces linear similarity with a nonlinear, distribution-aware alignment.
  • OTESGN leverages syntactic graph-aware and semantic optimal transport mechanisms to capture latent token dependencies and fine-grained alignments.
  • Extensive experiments show improved macro-F1 scores, demonstrating robustness in noisy texts and effective localization of opinion-bearing tokens.

Optimal Transport Enhanced Syntactic-Semantic Graph Network (OTESGN) is an advanced neural architecture designed to address the limitations of conventional aspect-based sentiment analysis (ABSA) methods by integrating optimal transport–based attention with syntactic and semantic graph modeling. The model improves upon standard dependency-aware and attention-based approaches by replacing linear similarity aggregation with a nonlinear, distribution-aware alignment and coupling it with adaptive fusion of syntactic and semantic signals. OTESGN is motivated by the need to localize sentiment-carrying opinion terms precisely and robustly, even in noisy or informal text, by explicitly modeling both latent syntactic dependencies and fine-grained semantic alignments.

1. Motivation and Conceptual Overview

OTESGN targets two core deficiencies in prior ABSA methodologies: dependence on linear dot-product attention (unable to capture nonlinear or complex semantic associations) and overreliance on static syntactic topologies that fail to adapt to context, allowing sentiment-irrelevant tokens to introduce noise. The model innovates by formulating aspect–opinion alignment as a structured matching problem using optimal transport principles, thus inherently enforcing a global, noise-resistant alignment while extracting both local and global syntactic and semantic information from text.

The architecture brings together:

  • Syntactic Graph-Aware Attention (SGAA): models dependency-informed topologies.
  • Semantic Optimal Transport Attention (SOTA): employs optimal transport to capture nonlinear, fine-grained token–aspect relationships.
  • Adaptive Attention Fusion (AAF): dynamically fuses the heterogeneous outputs from syntactic and semantic channels.
  • Contrastive Regularization: strengthens representation consistency and noise resistance.

2. Syntactic-Semantic Collaborative Attention Mechanism

The Syntactic-Semantic Collaborative Attention (SSCA) layer is the principal innovation of OTESGN, functioning as two parallel but complementary submodules:

Syntactic Graph-Aware Attention (SGAA)

SGAA leverages dependency parse trees, computing for each token pair (w_i, w_j) the shortest-path syntactic distance D(i, j). For each of the p attention heads, the network applies a mask matrix M^k with threshold τ_k such that:

  • M^k_{ij} = 0 if D(i, j) ≤ τ_k, and M^k_{ij} = −∞ otherwise.

Each head thus focuses on different levels of syntactic granularity. Attention is computed as:

A_{SG}^k = \operatorname{softmax}\!\left(\frac{Q K^\top}{\sqrt{d}} + M^k\right)

where Q and K are linear projections of the BERT output H^s.
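
To make the masking concrete, here is a minimal PyTorch sketch of the SGAA computation. It assumes a 0/1 dependency adjacency matrix, per-head distance thresholds `taus`, and externally supplied projection weights; the Floyd-Warshall distance computation and all names and shapes are illustrative choices, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def syntactic_distances(adj: torch.Tensor) -> torch.Tensor:
    # All-pairs shortest-path distances D(i, j) over the dependency
    # graph via Floyd-Warshall; adj is an (n, n) 0/1 adjacency matrix.
    n = adj.size(0)
    dist = torch.full((n, n), float("inf"))
    dist[adj.bool()] = 1.0
    dist.fill_diagonal_(0.0)
    for k in range(n):
        dist = torch.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist

def sgaa(H, adj, taus, W_q, W_k):
    # One attention map per head: softmax(QK^T / sqrt(d) + M^k), where
    # M^k is 0 for pairs within distance tau_k and -inf elsewhere.
    Q, K = H @ W_q, H @ W_k
    scores = Q @ K.transpose(-2, -1) / Q.size(-1) ** 0.5
    D = syntactic_distances(adj)
    heads = []
    for tau in taus:  # one syntactic granularity threshold per head
        M = torch.zeros_like(scores).masked_fill(D > tau, float("-inf"))
        heads.append(F.softmax(scores + M, dim=-1))
    return heads
```

Because the diagonal distance is zero, each token always attends at least to itself, so no softmax row degenerates even under a tight threshold.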

Semantic Optimal Transport Attention (SOTA)

SOTA reframes semantic alignment as an optimal transport problem. For an aspect span of length m, its semantic center is computed as:

h'_a = \frac{1}{m} \sum_{j=1}^{m} h_{\mathrm{pos}(a_j)}

where h_{\mathrm{pos}(a_j)} are the token embeddings of the aspect span. The cosine cost between each context token and the aspect center is:

\operatorname{Cost} = 1 - \frac{H^s (h'_a)^\top}{\|H^s\| \cdot \|h'_a\|}

For each head, an entropy-regularized (Sinkhorn) kernel is defined:

K^k = \exp\!\left(-\operatorname{Cost} / \varepsilon^k\right)

The Sinkhorn algorithm iterates dual variables (u, v):

u \leftarrow \mu / (K^k v) \qquad v \leftarrow \nu / \left((K^k)^\top u\right)

The resulting transport plan is:

\pi^k = \operatorname{diag}(u)\, K^k \operatorname{diag}(v)

The entropy-regularization parameter ε^k is tunable per head and governs the robustness of the alignment. The plan π^k gives the optimal (row-normalized) alignment between aspect and context, yielding attention weights:

A_{OT}^k = \pi^k
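
The following is a compact sketch of one SOTA head, assuming uniform marginals μ and ν, a single aspect center as the target distribution, and a fixed iteration budget; `eps` and `n_iters` are illustrative hyperparameters rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def sota_attention(Hs, h_a, eps=0.1, n_iters=50):
    # Hs: (n, d) context token embeddings; h_a: (d,) aspect center.
    # Cosine cost between every context token and the aspect center.
    cost = 1.0 - F.cosine_similarity(Hs, h_a.unsqueeze(0), dim=-1)
    K = torch.exp(-cost / eps).unsqueeze(1)     # (n, 1) Gibbs kernel
    n = Hs.size(0)
    mu = torch.full((n,), 1.0 / n)              # uniform source marginal
    nu = torch.ones(1)                          # single-point target
    u, v = torch.ones(n), torch.ones(1)
    for _ in range(n_iters):                    # Sinkhorn dual updates
        u = mu / (K @ v)
        v = nu / (K.t() @ u)
    pi = torch.diag(u) @ K @ torch.diag(v)      # transport plan, (n, 1)
    return (pi / pi.sum()).squeeze(1)           # normalized OT attention
```

In the full model this solver would run once per head with its own ε^k, and the resulting plan is normalized before fusion with the syntactic channel.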

3. Adaptive Attention Fusion Strategy

Adaptive fusion in OTESGN is realized through a convex combination:

A^k = \beta \cdot A_{SG}^k + (1 - \beta) \cdot A_{OT\_mat}^k

where β is a learnable scalar (initialized to 0.5), and A_{OT\_mat}^k tiles the per-row OT attention weights across all columns so that its shape matches A_{SG}^k. The final attention map is the average across all heads:

A = \frac{1}{p} \sum_{k=1}^{p} A^k

This mechanism enables dynamic, context-sensitive weighting of structural and distributional cues.
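
A minimal fusion sketch, assuming the per-head outputs of the two functions above; the column tiling follows the A_{OT\_mat}^k construction described here, and the function and variable names are illustrative.

```python
import torch

def fuse_attention(A_sg_heads, a_ot_heads, beta):
    # A_sg_heads[k]: (n, n) syntactic map from SGAA for head k;
    # a_ot_heads[k]: (n,) OT attention weights from SOTA for head k.
    fused = []
    for A_sg, a_ot in zip(A_sg_heads, a_ot_heads):
        # A_OT_mat: tile each row's OT weight across all columns.
        A_ot_mat = a_ot.unsqueeze(1).expand_as(A_sg)
        fused.append(beta * A_sg + (1.0 - beta) * A_ot_mat)
    return torch.stack(fused).mean(dim=0)  # average over the p heads

beta = torch.nn.Parameter(torch.tensor(0.5))  # learnable, initialized to 0.5
```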

4. Contrastive Regularization and Training Objective

OTESGN incorporates a supervised contrastive loss to stabilize learning and enforce sentiment-consistent representations. For K samples in a mini-batch, the contrastive regularization is:

\mathcal{L}_c(\theta) = -\frac{1}{K} \sum_{i \in \mathcal{I}} \log \frac{\exp(\operatorname{sim}(h^{\mathrm{pool}}_{a_i}, h^{\mathrm{pool}}_{a_i^+}) / \tau)}{\sum_{j \in \mathcal{I}} \exp(\operatorname{sim}(h^{\mathrm{pool}}_{a_i}, h^{\mathrm{pool}}_{a_j}) / \tau)}

where sim(·,·) denotes cosine similarity, τ is a temperature constant, and h^{\mathrm{pool}}_{a_i^+} is the pooled representation of a same-polarity positive sample. This term encourages within-class proximity and between-class separation in the learned latent space.
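
Below is a sketch of this supervised contrastive term, assuming pooled aspect representations `h_pool` of shape (K, d) and integer polarity labels; the masking details and default temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(h_pool, labels, tau=0.1):
    # Supervised contrastive loss over a mini-batch: same-polarity
    # samples are positives, all other samples form the denominator.
    K = h_pool.size(0)
    h = F.normalize(h_pool, dim=-1)             # unit-norm rows
    sim = (h @ h.t()) / tau                     # (K, K) cosine / tau
    eye = torch.eye(K, dtype=torch.bool)
    sim = sim.masked_fill(eye, float("-inf"))   # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)   # avoid -inf * 0 = nan
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    per_anchor = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return per_anchor[pos.sum(1) > 0].mean()    # anchors with positives
```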

5. Empirical Performance and Analytic Studies

OTESGN achieves state-of-the-art macro-F1 scores on standard ABSA datasets:

| Dataset  | OTESGN Macro-F1 | Best Baseline Macro-F1 | Δ Macro-F1 |
|----------|-----------------|------------------------|------------|
| Laptop14 | 80.52           | 79.22                  | +1.30      |
| Twitter  | 78.17           | 77.16                  | +1.01      |

Beyond quantitative gains, attention heatmap visualizations reveal OTESGN's ability to focus on true opinion-bearing tokens and to suppress noise in informal or syntactically ambiguous contexts.

Extensive ablation studies demonstrate that:

  • Removing the syntactic mask (SM) causes pronounced F1 degradation (e.g., –7.13 points on Twitter).
  • Omitting either the Syntactic Graph-Aware (GA) channel or the semantic OT channel leads to substantial performance drops; removing both causes an even greater decline (e.g., –12.43 points on Twitter).
  • Excluding the contrastive loss also negatively impacts performance, confirming its importance for robustness under noise.

6. Implications for Aspect–Opinion Localization and Robustness

OTESGN's architecture, especially the semantic OT module, directly addresses the limitations of linear aggregation: the transport optimization selectively weights and sparsifies token associations according to global compatibility, making the alignment between aspect and context resistant to spurious similarities. The adaptive fusion of syntactic and semantic channels allows the model to favor structural or contextual cues as needed, making it especially effective in informal or ambiguous settings.

Contrastive regularization additionally imposes strong consistency, leading to superior localization of opinion terms and stable sentiment detection in challenging, noisy texts.

7. Outlook and Future Directions

While OTESGN demonstrates strong empirical effectiveness and interpretability, areas for further development include reducing the computational cost of the OT solver (currently dominated by repeated Sinkhorn-like procedures), incorporating dynamic extraction of syntactic priors, and integrating event or knowledge-based signals to further improve handling of implicit sentiment.

OTESGN offers a template for future models seeking to robustly integrate global structural and fine-grained semantic alignment; its methodology, such as the optimal transport–driven attention fusion, is readily extensible to related applications in structured NLP and cross-modal problems.
