Equity2Vec: Graph Embeddings for Financial News
- Equity2Vec is a deep graph-based framework that fuses financial news and temporal price data for actionable equity embeddings.
- The framework constructs heterogeneous graphs linking companies, industries, and news articles to capture rich relational and cross-sectional dynamics.
- Graph neural architectures (GraphSAGE, GCN auto-encoders, and attention models) support tasks such as prediction, clustering, risk modeling, and portfolio optimization.
Equity2Vec (News Graph) encompasses a family of deep graph-based frameworks that generate embeddings for equities by leveraging both the co-occurrence structure of financial news and the temporal dynamics of price series and news content. These models encode relational and heterogeneous information about companies exposed through news articles, price histories, and cross-sectional relationships, providing vector representations (embeddings) amenable to downstream financial tasks such as prediction, clustering, and portfolio construction. Major instantiations include unsupervised graph auto-encoders, hybrid graph neural networks integrating language and temporal encoders, and graph-attention models for cross-sectional asset pricing.
1. Heterogeneous and Homogeneous News-Equity Graph Construction
All Equity2Vec variants construct a graph structure where vertices represent financial entities and related objects, and edges capture relational semantics extracted from the news corpus.
Key equity–news graph construction protocols:
- Heterogeneous graphs (Sadek et al., 9 Dec 2025): Nodes are companies, industries, and articles. Edges encode relations such as company–industry membership, article–main-company (central mention), article–mentioned-company (incidental mention), and, optionally, article–article similarity based on news-embedding cosine similarity above a specified threshold.
- Co-occurrence (homogeneous) graphs (Turner, 2021, Wu et al., 2019): Vertices are companies; edges are weighted by the number of shared news mentions over a window, optionally normalized and thresholded. Alternative edge weights include exponential decay of occurrences and filtering by temporal window or edge threshold.
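To make the co-occurrence protocol concrete, here is a minimal Python sketch that builds a thresholded co-mention graph from per-article ticker sets. The input format, function name, and threshold value are illustrative assumptions, not a protocol fixed by the cited papers.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_graph(articles, min_weight=2):
    """Weighted company co-occurrence graph from news mentions.

    `articles`: iterable of sets of tickers, one set per article
    (hypothetical input format). Returns {(ticker_a, ticker_b): count},
    keeping only edges at or above `min_weight` (the thresholding step).
    """
    weights = Counter()
    for mentioned in articles:
        # Each unordered pair of co-mentioned tickers adds one unit
        # of edge weight for this article.
        for a, b in combinations(sorted(mentioned), 2):
            weights[(a, b)] += 1
    return {edge: w for edge, w in weights.items() if w >= min_weight}

# Three articles; only AAPL-MSFT co-occurs twice and survives the threshold.
docs = [{"AAPL", "MSFT"}, {"AAPL", "MSFT", "GOOG"}, {"GOOG", "TSLA"}]
print(build_cooccurrence_graph(docs))  # {('AAPL', 'MSFT'): 2}
```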
Graph Construction Table
| Paper | Node Types | Edge Types |
|---|---|---|
| (Sadek et al., 9 Dec 2025) | Company, Industry, Article | Company–Industry, Article→Main-Company, Article→Mentioned-Company, Article–Article |
| (Turner, 2021) | Company | Co-occurrence (shared news mention), self-loops |
| (Wu et al., 2019) | Company | Temporal co-occurrence (news in a w-day window) |
In all variants, the adjacency matrix is normalized or aggregated for efficient message propagation.
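As an illustration of the normalization step, the standard symmetric GCN normalization (self-loops plus degree scaling) can be written as follows; this is textbook machinery rather than code from the cited papers.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize an adjacency matrix with self-loops:
    D^{-1/2} (A + I) D^{-1/2}, the form used by GCN-style propagation."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = a_hat.sum(axis=1)                      # degrees of A + I
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))     # D^{-1/2}
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

# Toy 3-node graph: node 0 linked to nodes 1 and 2.
adj = np.array([[0., 1., 1.],
                [1., 0., 0.],
                [1., 0., 0.]])
print(normalize_adjacency(adj).round(3))
```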
2. Node Feature Embedding and Initial Representations
Node feature construction aligns with node semantics and task objectives:
- Company nodes:
  - (Sadek et al., 9 Dec 2025): Concatenation of a 2-layer LSTM encoding of the 15-day historical close-price sequence and a trainable company-identifier embedding, yielding a 128-dimensional feature (sketched after this list).
  - (Turner, 2021): Stock time-series returns (winsorized, standardized previous-close returns) form the feature matrix.
  - (Wu et al., 2019): Concatenation of a “static” company embedding (from factorization of the global news co-occurrence matrix), day-specific mean word2vec (“news vector”) features, an optional sentiment score, and technical indicators (RSI, EMA, etc.).
- Article nodes:
  - (Sadek et al., 9 Dec 2025): The 768-dimensional [CLS]-token output of the Sigma Transformer applied to the headline or article text.
- Industry nodes:
  - (Sadek et al., 9 Dec 2025): Trainable 64-dimensional vectors per industry.
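A minimal PyTorch sketch of the (Sadek et al., 9 Dec 2025) company feature referenced above. The 15-day window and the 64 + 64 = 128 dimensionality follow the text; the univariate price input and the use of the LSTM's final hidden state are our assumptions.

```python
import torch
import torch.nn as nn

class CompanyFeatureEncoder(nn.Module):
    """Company node feature: a 2-layer LSTM over the 15-day close-price
    history, concatenated with a trainable company ID embedding.
    Dimensions follow the text (64 + 64 = 128); other details are assumed."""

    def __init__(self, num_companies, price_dim=1, hidden=64, id_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(price_dim, hidden, num_layers=2, batch_first=True)
        self.id_embedding = nn.Embedding(num_companies, id_dim)

    def forward(self, prices, company_ids):
        # prices: (batch, 15, 1) close-price sequences
        # company_ids: (batch,) integer company identifiers
        _, (h_n, _) = self.lstm(prices)
        price_vec = h_n[-1]  # final hidden state of the last LSTM layer
        return torch.cat([price_vec, self.id_embedding(company_ids)], dim=-1)

encoder = CompanyFeatureEncoder(num_companies=325)
feats = encoder(torch.randn(8, 15, 1), torch.randint(0, 325, (8,)))
print(feats.shape)  # torch.Size([8, 128])
```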
3. Graph Neural Architectures and Data Fusion
Graph neural frameworks process the graphs and features above in distinct ways:
- GraphSAGE Hybrid Networks (Sadek et al., 9 Dec 2025):
  - Stacks GraphSAGE layers using mean aggregation (see the first sketch below).
  - Each layer computes the mean of neighboring nodes’ embeddings, merges it with the current node’s embedding, and applies a ReLU or Tanh nonlinearity.
  - Final company embeddings are used for multi-task prediction.
- GCN Auto-Encoders (Turner, 2021):
  - A two-layer GCN encoder produces 8-dimensional embeddings from vertex features and a symmetrically normalized adjacency (see the second sketch below).
  - The decoder reconstructs the adjacency via an inner product followed by a sigmoid activation.
  - Self-loops and batch normalization are incorporated for stability.
- Graph Attention and Static Propagation (Wu et al., 2019):
  - Day-wise construction: each node aggregates information from its most heavily weighted temporal neighbors using attention coefficients from a GAT-style mechanism.
  - The resulting cross-sectional embedding is concatenated with daily news vectors and technical features, then processed by an LSTM with temporal attention for time-series modeling.
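First sketch: one GraphSAGE layer with mean aggregation, written densely for readability. This is an illustrative re-implementation under assumptions (dense adjacency, concatenate-then-project update); production code would typically use PyTorch Geometric's `SAGEConv`.

```python
import torch
import torch.nn as nn

class MeanSAGELayer(nn.Module):
    """One GraphSAGE layer with mean aggregation: each node averages its
    neighbors' embeddings, concatenates the result with its own embedding,
    and applies a linear map followed by ReLU."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        # adj: (N, N) binary adjacency; divide by degree for a neighbor mean.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh_mean = (adj @ x) / deg
        return torch.relu(self.linear(torch.cat([x, neigh_mean], dim=-1)))

layer = MeanSAGELayer(in_dim=128, out_dim=128)
x = torch.randn(325, 128)                    # one embedding per company
adj = (torch.rand(325, 325) > 0.95).float()  # random stand-in graph
print(layer(x, adj).shape)                   # torch.Size([325, 128])
```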
These architectures support unsupervised (reconstruction or clustering) and supervised (forecasting, classification) paradigms.
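Second sketch: a two-layer GCN auto-encoder with an inner-product decoder in the spirit of (Turner, 2021). The (64, 8) hidden dimensions follow the text; omitting biases and batch normalization here is a simplification, and the input dimension is arbitrary.

```python
import torch
import torch.nn as nn

class GCNAutoEncoder(nn.Module):
    """Two-layer GCN encoder (hidden dims 64 -> 8) over a pre-normalized
    adjacency, with an inner-product decoder plus sigmoid that
    reconstructs edge probabilities."""

    def __init__(self, in_dim, hidden=64, emb=8):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, emb, bias=False)

    def encode(self, x, adj_norm):
        h = torch.relu(adj_norm @ self.w1(x))  # first GCN layer
        return adj_norm @ self.w2(h)           # 8-dim embeddings

    def forward(self, x, adj_norm):
        z = self.encode(x, adj_norm)
        return torch.sigmoid(z @ z.t())        # reconstructed adjacency

model = GCNAutoEncoder(in_dim=250)
x, adj_norm = torch.randn(100, 250), torch.eye(100)  # illustrative inputs
recon = model(x, adj_norm)   # train with BCE against the true adjacency
print(recon.shape)           # torch.Size([100, 100])
```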
4. Learning Objectives, Training Protocols, and Hyperparameter Regimes
- Losses and Optimization:
  - Supervised bi-task (Sadek et al., 9 Dec 2025): Binary cross-entropy over directional movement and “significant-up” price change, summed into the total loss (sketched after this list).
  - Unsupervised auto-encoding (Turner, 2021): Binary cross-entropy between the real and reconstructed adjacency matrices (edge prediction), with regularization.
  - Hybrid (Wu et al., 2019): A matrix-factorization loss for the static embeddings of the news co-occurrence matrix, plus MSE between predicted and realized returns in the LSTM stage.
- Hyperparameters:
  - GraphSAGE: Embedding dims of historical (64), company (64), industry (64), and news (768) features; GNN hidden dim 128; batch size 256; AdamW optimizer (Sadek et al., 9 Dec 2025).
  - GCN auto-encoder: Hidden dims (64, 8), learning rate 0.01, trained for up to 300 epochs with early stopping, Adam optimizer (Turner, 2021).
  - Attention models: Embedding dim 32–256, temporal window 2–60 days, LSTM hidden size 2–20, Adam optimizer with learning rate in {0.001, 0.01}, batch size in {128, 256} (Wu et al., 2019).
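A minimal sketch of the bi-task objective described above, assuming one logit per head and equal weighting of the two terms (the text says the losses are summed); all names are illustrative.

```python
import torch
import torch.nn.functional as F

def bitask_loss(logits_dir, logits_sigup, y_dir, y_sigup):
    """Bi-task objective: BCE on the directional-movement head plus BCE
    on the 'significant-up' head, summed (equal weighting assumed)."""
    loss_dir = F.binary_cross_entropy_with_logits(logits_dir, y_dir)
    loss_sigup = F.binary_cross_entropy_with_logits(logits_sigup, y_sigup)
    return loss_dir + loss_sigup

# Usage with random tensors standing in for model outputs and labels.
logits = torch.randn(256), torch.randn(256)
labels = torch.randint(0, 2, (256,)).float(), torch.randint(0, 2, (256,)).float()
print(bitask_loss(logits[0], logits[1], labels[0], labels[1]))
```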
Models are typically implemented in PyTorch Geometric, DGL, or similar frameworks.
5. Experimental Findings and Benchmarking
- Prediction performance (Sadek et al., 9 Dec 2025):
  - The GraphSAGE GNN outperforms an LSTM baseline on the binary-direction task (53% vs. 52%) and the significant-up task (precision 0.55 vs. 0.51) on the US equities dataset (325 firms, 82k articles).
  - Graph density positively correlates with company-level prediction accuracy; more “article→company” edges yield higher accuracy.
  - The Sigma Transformer outperforms FinBERT for news encoding by ~0.5% on both targets; headlines alone provide stronger cues than full articles.
- Clustering and unsupervised benchmarks (Turner, 2021):
  - Jointly modeling news co-occurrence and return features improves cluster purity (64%) and NMI (47%) over single-stream or featureless variants (both metrics sketched after this list).
  - Edge-prediction precision reaches 78% on held-out test edges.
  - Clustering with k-means on Equity2Vec embeddings aligns with Bloomberg sector labels better than alternatives.
- Temporal cross-sectional modeling (Wu et al., 2019):
  - Inclusion of static embeddings, daily news features, and attention-based cross-stock propagation yields higher out-of-sample forecasting accuracy than non-graph or non-integrated models.
  - Time-wise retraining, hyperparameter cross-validation, and multi-module pipelines are documented for practical trading applications.
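For reference, the purity and NMI metrics reported above can be computed as in the following sketch; the toy labels are stand-ins, not data from (Turner, 2021).

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def cluster_purity(labels_true, labels_pred):
    """Purity: assign each cluster its majority true label and report
    the fraction of points matching that assignment."""
    cm = contingency_matrix(labels_true, labels_pred)
    return cm.max(axis=0).sum() / cm.sum()

# Hypothetical sector labels vs. embedding-derived cluster labels.
true = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 0, 1, 2, 2, 2])
print(cluster_purity(true, pred), normalized_mutual_info_score(true, pred))
```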
6. Downstream Tasks and Financial Applications
Equity2Vec embeddings serve multiple quantitative finance applications:
- Risk Modeling: Cosine or Euclidean distances in embedding space identify diversified or similar equities (Turner, 2021); see the sketch after this list.
- Clustering and Sector Discovery: Unsupervised clustering of embeddings reconstructs sectoral structure, with purity and NMI evaluated against industry labels and conventional spectral-clustering benchmarks (Turner, 2021).
- Forecasting and Classification: Company-level embeddings feed into time-series models for price-direction, volatility, and other target variables (Sadek et al., 9 Dec 2025, Wu et al., 2019).
- Portfolio Construction: Cluster-based or embedding-driven grouping directly informs capital allocation strategies.
- Anomaly Detection: Distance from historic cluster centers flags atypical equity behaviors or market regimes (Turner, 2021).
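A short sketch covering the first two applications: cosine similarity for risk screening and k-means for sector discovery over a hypothetical embedding matrix (random stand-ins, using the 8-dimensional auto-encoder setting; the company count and cluster count are arbitrary).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical embedding matrix: one row per company, 8-dim as in the
# auto-encoder variant; random values stand in for trained embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(325, 8))

# Risk screening: high cosine similarity flags near-duplicate exposures;
# low similarity suggests diversification candidates.
sim = cosine_similarity(embeddings)
most_similar_to_0 = np.argsort(sim[0])[::-1][1]  # skip self at index 0

# Sector discovery: k-means over embeddings, to be compared against
# e.g. Bloomberg sector labels via purity/NMI as in Section 5.
sectors = KMeans(n_clusters=11, n_init=10, random_state=0).fit_predict(embeddings)
print(most_similar_to_0, np.bincount(sectors))
```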
7. Comparative Analysis and Insights
Empirical analyses report that multi-modal fusion of relational (news), temporal (price), and text-based (headline embeddings) information delivers statistically significant gains over baselines. GraphSAGE mean aggregation is favored in heterophilic, multi-type news–equity graphs over GAT architectures (Sadek et al., 9 Dec 2025). The efficacy of concise news representations (headlines) is substantiated, supporting the use of distilled textual features in short-term financial forecasting (Sadek et al., 9 Dec 2025).
A plausible implication is that model sophistication (heterogeneous graphs, advanced NLP encoders) yields diminishing returns beyond certain data density and feature thresholds, as the architecture–data interplay is a performance bottleneck. Further, news density and the structure of cross-sectional information flow (e.g., co-mention versus direct mapping) critically mediate embedding quality and downstream task performance.