Equity2Vec: Graph Embeddings for Financial News
- Equity2Vec is a deep graph-based framework that fuses financial news and temporal price data for actionable equity embeddings.
- The framework constructs heterogeneous graphs linking companies, industries, and news articles to capture rich relational and cross-sectional dynamics.
- Graph neural architectures (GraphSAGE, GCN auto-encoders, and attention models) support tasks such as prediction, clustering, risk modeling, and portfolio optimization.
Equity2Vec (News Graph) encompasses a family of deep graph-based frameworks that generate embeddings for equities by leveraging both the co-occurrence structure of financial news and the temporal dynamics of price series and news content. These models encode relational and heterogeneous information about companies exposed through news articles, price histories, and cross-sectional relationships, providing vector representations (embeddings) amenable to downstream financial tasks such as prediction, clustering, and portfolio construction. Major instantiations include unsupervised graph auto-encoders, hybrid graph neural networks integrating language and temporal encoders, and graph-attention models for cross-sectional asset pricing.
1. Heterogeneous and Homogeneous News-Equity Graph Construction
All Equity2Vec variants construct a graph structure where vertices represent financial entities and related objects, and edges capture relational semantics extracted from the news corpus.
Key equity–news graph construction protocols:
- Heterogeneous graphs (Sadek et al., 9 Dec 2025): Nodes are companies, industries, and articles. Edges encode relations such as company–industry membership, article–main-company (central mention), article–mentioned-company (incidental mention), and, optionally, article–article similarity based on news-embedding cosine similarity above a specified threshold.
- Co-occurrence (homogeneous) graphs (Turner, 2021, Wu et al., 2019): Vertices are companies; edges are weighted by the number of shared news mentions over a window, optionally normalized and thresholded. Alternative edge weights include exponential decay of occurrences and filtering by temporal window or edge threshold.
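To make the co-occurrence protocol concrete, here is a minimal Python sketch that builds a thresholded co-mention graph from per-article ticker sets. The input format, function name, and threshold value are illustrative assumptions, not a protocol fixed by the cited papers.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_graph(articles, min_weight=2):
    """Weighted company co-occurrence graph from news mentions.

    `articles`: iterable of sets of tickers, one set per article
    (hypothetical input format). Returns {(ticker_a, ticker_b): count},
    keeping only edges at or above `min_weight` (the thresholding step).
    """
    weights = Counter()
    for mentioned in articles:
        # Each unordered pair of co-mentioned tickers adds one unit
        # of edge weight for this article.
        for a, b in combinations(sorted(mentioned), 2):
            weights[(a, b)] += 1
    return {edge: w for edge, w in weights.items() if w >= min_weight}

# Three articles; only AAPL-MSFT co-occurs twice and survives the threshold.
docs = [{"AAPL", "MSFT"}, {"AAPL", "MSFT", "GOOG"}, {"GOOG", "TSLA"}]
print(build_cooccurrence_graph(docs))  # {('AAPL', 'MSFT'): 2}
```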
Graph Construction Table
| Paper | Node Types | Edge Types |
|---|---|---|
| (Sadek et al., 9 Dec 2025) | Company, Industry, Article | Company–Industry, Article→Main-Company, Article→Mentioned-Company, Article–Article |
| (Turner, 2021) | Company | Co-occurrence (shared news mention), self-loops |
| (Wu et al., 2019) | Company | Temporal co-occurrence (news in a w-day window) |
In all variants, the adjacency matrix is normalized or aggregated for efficient message propagation.
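As an illustration of the normalization step, the standard symmetric GCN normalization (self-loops plus degree scaling) can be written as follows; this is textbook machinery rather than code from the cited papers.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize an adjacency matrix with self-loops:
    D^{-1/2} (A + I) D^{-1/2}, the form used by GCN-style propagation."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = a_hat.sum(axis=1)                      # degrees of A + I
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))     # D^{-1/2}
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

# Toy 3-node graph: node 0 linked to nodes 1 and 2.
adj = np.array([[0., 1., 1.],
                [1., 0., 0.],
                [1., 0., 0.]])
print(normalize_adjacency(adj).round(3))
```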
2. Node Feature Embedding and Initial Representations
Node feature construction aligns with node semantics and task objectives:
- Company nodes:
  - (Sadek et al., 9 Dec 2025): Concatenation of a 2-layer LSTM encoding of the 15-day historical close-price sequence and a trainable company-identifier embedding, yielding a 128-dimensional feature (sketched after this list).
  - (Turner, 2021): Stock time-series returns (winsorized, standardized previous-close returns) form the feature matrix.
  - (Wu et al., 2019): Concatenation of a “static” company embedding (from factorization of the global news co-occurrence matrix), day-specific mean word2vec (“news vector”) features, an optional sentiment score, and technical indicators (RSI, EMA, etc.).
- Article nodes:
  - (Sadek et al., 9 Dec 2025): The 768-dimensional [CLS]-token output of the Sigma Transformer applied to the headline or article text.
- Industry nodes:
  - (Sadek et al., 9 Dec 2025): Trainable 64-dimensional vectors per industry.
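A minimal PyTorch sketch of the (Sadek et al., 9 Dec 2025) company feature referenced above. The 15-day window and the 64 + 64 = 128 dimensionality follow the text; the univariate price input and the use of the LSTM's final hidden state are our assumptions.

```python
import torch
import torch.nn as nn

class CompanyFeatureEncoder(nn.Module):
    """Company node feature: a 2-layer LSTM over the 15-day close-price
    history, concatenated with a trainable company ID embedding.
    Dimensions follow the text (64 + 64 = 128); other details are assumed."""

    def __init__(self, num_companies, price_dim=1, hidden=64, id_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(price_dim, hidden, num_layers=2, batch_first=True)
        self.id_embedding = nn.Embedding(num_companies, id_dim)

    def forward(self, prices, company_ids):
        # prices: (batch, 15, 1) close-price sequences
        # company_ids: (batch,) integer company identifiers
        _, (h_n, _) = self.lstm(prices)
        price_vec = h_n[-1]  # final hidden state of the last LSTM layer
        return torch.cat([price_vec, self.id_embedding(company_ids)], dim=-1)

encoder = CompanyFeatureEncoder(num_companies=325)
feats = encoder(torch.randn(8, 15, 1), torch.randint(0, 325, (8,)))
print(feats.shape)  # torch.Size([8, 128])
```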
3. Graph Neural Architectures and Data Fusion
Graph neural frameworks process the graphs and features above in distinct ways:
- GraphSAGE Hybrid Networks (Sadek et al., 9 Dec 2025):
  - Stacks GraphSAGE layers using mean aggregation (see the first sketch below).
  - Each layer computes the mean of neighboring nodes’ embeddings, merges it with the current node’s embedding, and applies a ReLU or Tanh nonlinearity.
  - Final company embeddings are used for multi-task prediction.
- GCN Auto-Encoders (Turner, 2021):
  - A two-layer GCN encoder produces 8-dimensional embeddings from vertex features and a symmetrically normalized adjacency (see the second sketch below).
  - The decoder reconstructs the adjacency via an inner product followed by a sigmoid activation.
  - Self-loops and batch normalization are incorporated for stability.
- Graph Attention and Static Propagation (Wu et al., 2019):
  - Day-wise construction: each node aggregates information from its most heavily weighted temporal neighbors using attention coefficients from a GAT-style mechanism.
  - The resulting cross-sectional embedding is concatenated with daily news vectors and technical features, then processed by an LSTM with temporal attention for time-series modeling.
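First sketch: one GraphSAGE layer with mean aggregation, written densely for readability. This is an illustrative re-implementation under assumptions (dense adjacency, concatenate-then-project update); production code would typically use PyTorch Geometric's `SAGEConv`.

```python
import torch
import torch.nn as nn

class MeanSAGELayer(nn.Module):
    """One GraphSAGE layer with mean aggregation: each node averages its
    neighbors' embeddings, concatenates the result with its own embedding,
    and applies a linear map followed by ReLU."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        # adj: (N, N) binary adjacency; divide by degree for a neighbor mean.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh_mean = (adj @ x) / deg
        return torch.relu(self.linear(torch.cat([x, neigh_mean], dim=-1)))

layer = MeanSAGELayer(in_dim=128, out_dim=128)
x = torch.randn(325, 128)                    # one embedding per company
adj = (torch.rand(325, 325) > 0.95).float()  # random stand-in graph
print(layer(x, adj).shape)                   # torch.Size([325, 128])
```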
These architectures support unsupervised (reconstruction or clustering) and supervised (forecasting, classification) paradigms.
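Second sketch: a two-layer GCN auto-encoder with an inner-product decoder in the spirit of (Turner, 2021). The (64, 8) hidden dimensions follow the text; omitting biases and batch normalization here is a simplification, and the input dimension is arbitrary.

```python
import torch
import torch.nn as nn

class GCNAutoEncoder(nn.Module):
    """Two-layer GCN encoder (hidden dims 64 -> 8) over a pre-normalized
    adjacency, with an inner-product decoder plus sigmoid that
    reconstructs edge probabilities."""

    def __init__(self, in_dim, hidden=64, emb=8):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, emb, bias=False)

    def encode(self, x, adj_norm):
        h = torch.relu(adj_norm @ self.w1(x))  # first GCN layer
        return adj_norm @ self.w2(h)           # 8-dim embeddings

    def forward(self, x, adj_norm):
        z = self.encode(x, adj_norm)
        return torch.sigmoid(z @ z.t())        # reconstructed adjacency

model = GCNAutoEncoder(in_dim=250)
x, adj_norm = torch.randn(100, 250), torch.eye(100)  # illustrative inputs
recon = model(x, adj_norm)   # train with BCE against the true adjacency
print(recon.shape)           # torch.Size([100, 100])
```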
4. Learning Objectives, Training Protocols, and Hyperparameter Regimes
- Losses and Optimization:
  - Supervised bi-task (Sadek et al., 9 Dec 2025): Binary cross-entropy over directional movement and “significant-up” price change, summed into the total loss (sketched after this list).
  - Unsupervised auto-encoding (Turner, 2021): Binary cross-entropy between the real and reconstructed adjacency matrices (edge prediction), with regularization.
  - Hybrid (Wu et al., 2019): A matrix-factorization loss for the static embeddings of the news co-occurrence matrix, plus MSE between predicted and realized returns in the LSTM stage.
- Hyperparameters:
  - GraphSAGE: Embedding dims of historical (64), company (64), industry (64), and news (768) features; GNN hidden dim 128; batch size 256; AdamW optimizer (Sadek et al., 9 Dec 2025).
  - GCN auto-encoder: Hidden dims (64, 8), learning rate 0.01, trained for up to 300 epochs with early stopping, Adam optimizer (Turner, 2021).
  - Attention models: Embedding dim 32–256, temporal window 2–60 days, LSTM hidden size 2–20, Adam optimizer with learning rate in {0.001, 0.01}, batch size in {128, 256} (Wu et al., 2019).
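A minimal sketch of the bi-task objective described above, assuming one logit per head and equal weighting of the two terms (the text says the losses are summed); all names are illustrative.

```python
import torch
import torch.nn.functional as F

def bitask_loss(logits_dir, logits_sigup, y_dir, y_sigup):
    """Bi-task objective: BCE on the directional-movement head plus BCE
    on the 'significant-up' head, summed (equal weighting assumed)."""
    loss_dir = F.binary_cross_entropy_with_logits(logits_dir, y_dir)
    loss_sigup = F.binary_cross_entropy_with_logits(logits_sigup, y_sigup)
    return loss_dir + loss_sigup

# Usage with random tensors standing in for model outputs and labels.
logits = torch.randn(256), torch.randn(256)
labels = torch.randint(0, 2, (256,)).float(), torch.randint(0, 2, (256,)).float()
print(bitask_loss(logits[0], logits[1], labels[0], labels[1]))
```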
Models are typically implemented in PyTorch Geometric, DGL, or similar frameworks.
5. Experimental Findings and Benchmarking
- Prediction performance (Sadek et al., 9 Dec 2025):
  - The GraphSAGE GNN outperforms an LSTM baseline on the binary-direction task (53% vs. 52%) and the significant-up task (precision 0.55 vs. 0.51) on the US equities dataset (325 firms, 82k articles).
  - Graph density positively correlates with company-level prediction accuracy; more “article→company” edges yield higher accuracy.
  - The Sigma Transformer outperforms FinBERT for news encoding by ~0.5% on both targets; headlines alone provide stronger cues than full articles.
- Clustering and unsupervised benchmarks (Turner, 2021):
  - Jointly modeling news co-occurrence and return features improves cluster purity (64%) and NMI (47%) over single-stream or featureless variants (both metrics sketched after this list).
  - Edge-prediction precision reaches 78% on held-out test edges.
  - Clustering with k-means on Equity2Vec embeddings aligns with Bloomberg sector labels better than alternatives.
- Temporal cross-sectional modeling (Wu et al., 2019):
  - Inclusion of static embeddings, daily news features, and attention-based cross-stock propagation yields higher out-of-sample forecasting accuracy than non-graph or non-integrated models.
  - Time-wise retraining, hyperparameter cross-validation, and multi-module pipelines are documented for practical trading applications.
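For reference, the purity and NMI metrics reported above can be computed as in the following sketch; the toy labels are stand-ins, not data from (Turner, 2021).

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def cluster_purity(labels_true, labels_pred):
    """Purity: assign each cluster its majority true label and report
    the fraction of points matching that assignment."""
    cm = contingency_matrix(labels_true, labels_pred)
    return cm.max(axis=0).sum() / cm.sum()

# Hypothetical sector labels vs. embedding-derived cluster labels.
true = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 0, 1, 2, 2, 2])
print(cluster_purity(true, pred), normalized_mutual_info_score(true, pred))
```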
6. Downstream Tasks and Financial Applications
Equity2Vec embeddings serve multiple quantitative finance applications:
- Risk Modeling: Cosine or Euclidean distances in embedding space identify diversified or similar equities (Turner, 2021); see the sketch after this list.
- Clustering and Sector Discovery: Unsupervised clustering of embeddings reconstructs sectoral structure, with purity and NMI evaluated against industry labels and conventional spectral-clustering benchmarks (Turner, 2021).
- Forecasting and Classification: Company-level embeddings feed into time-series models for price-direction, volatility, and other target variables (Sadek et al., 9 Dec 2025, Wu et al., 2019).
- Portfolio Construction: Cluster-based or embedding-driven grouping directly informs capital allocation strategies.
- Anomaly Detection: Distance from historic cluster centers flags atypical equity behaviors or market regimes (Turner, 2021).
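A short sketch covering the first two applications: cosine similarity for risk screening and k-means for sector discovery over a hypothetical embedding matrix (random stand-ins, using the 8-dimensional auto-encoder setting; the company count and cluster count are arbitrary).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical embedding matrix: one row per company, 8-dim as in the
# auto-encoder variant; random values stand in for trained embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(325, 8))

# Risk screening: high cosine similarity flags near-duplicate exposures;
# low similarity suggests diversification candidates.
sim = cosine_similarity(embeddings)
most_similar_to_0 = np.argsort(sim[0])[::-1][1]  # skip self at index 0

# Sector discovery: k-means over embeddings, to be compared against
# e.g. Bloomberg sector labels via purity/NMI as in Section 5.
sectors = KMeans(n_clusters=11, n_init=10, random_state=0).fit_predict(embeddings)
print(most_similar_to_0, np.bincount(sectors))
```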
7. Comparative Analysis and Insights
Empirical analyses report that multi-modal fusion of relational (news), temporal (price), and text-based (headline embeddings) information delivers statistically significant gains over baselines. GraphSAGE mean aggregation is favored in heterophilic, multi-type news–equity graphs over GAT architectures (Sadek et al., 9 Dec 2025). The efficacy of concise news representations (headlines) is substantiated, supporting the use of distilled textual features in short-term financial forecasting (Sadek et al., 9 Dec 2025).
A plausible implication is that model sophistication (heterogeneous graphs, advanced NLP encoders) yields diminishing returns beyond certain data density and feature thresholds, as the architecture–data interplay is a performance bottleneck. Further, news density and the structure of cross-sectional information flow (e.g., co-mention versus direct mapping) critically mediate embedding quality and downstream task performance.