NewsNet-SDF: Stochastic Discount Factor Estimation with Pretrained Language Model News Embeddings via Adversarial Networks (2505.06864v1)

Published 11 May 2025 in q-fin.PM and cs.LG

Abstract: Stochastic Discount Factor (SDF) models provide a unified framework for asset pricing and risk assessment, yet traditional formulations struggle to incorporate unstructured textual information. We introduce NewsNet-SDF, a novel deep learning framework that seamlessly integrates pretrained LLM embeddings with financial time series through adversarial networks. Our multimodal architecture processes financial news using GTE-multilingual models, extracts temporal patterns from macroeconomic data via LSTM networks, and normalizes firm characteristics, fusing these heterogeneous information sources through an innovative adversarial training mechanism. Our dataset encompasses approximately 2.5 million news articles and 10,000 unique securities, addressing the computational challenges of processing and aligning text data with financial time series. Empirical evaluations on U.S. equity data (1980-2022) demonstrate NewsNet-SDF substantially outperforms alternatives with a Sharpe ratio of 2.80. The model shows a 471% improvement over CAPM, over 200% improvement versus traditional SDF implementations, and a 74% reduction in pricing errors compared to the Fama-French five-factor model. In comprehensive comparisons, our deep learning approach consistently outperforms traditional, modern, and other neural asset pricing models across all key metrics. Ablation studies confirm that text embeddings contribute significantly more to model performance than macroeconomic features, with news-derived principal components ranking among the most influential determinants of SDF dynamics. These results validate the effectiveness of our multimodal deep learning approach in integrating unstructured text with traditional financial data for more accurate asset pricing, providing new insights for digital intelligent decision-making in financial technology.

Summary

The paper presents a novel deep learning framework that fuses pretrained news embeddings with macroeconomic and firm data to estimate stochastic discount factors.
It extracts features from three sources: PLM-based text embeddings aggregated with attention, an LSTM over macroeconomic indicators, and normalized firm characteristics, which are then concatenated for asset pricing.
Empirical results demonstrate significant improvements with Sharpe Ratios up to 2.80 and lower pricing errors compared to traditional models.

This paper, "NewsNet-SDF: Stochastic Discount Factor Estimation with Pretrained Language Model News Embeddings via Adversarial Networks" (2505.06864), introduces a deep learning framework that integrates unstructured financial news text with traditional financial data for improved asset pricing. The core idea is to enhance the standard Stochastic Discount Factor (SDF) framework by combining information from diverse sources in a multimodal architecture and training it with an adversarial network setup based on the Generalized Method of Moments (GMM).

The fundamental problem addressed is the challenge of leveraging the rich, forward-looking information in financial news text within a theoretically consistent asset pricing model. Traditional SDF models and even many machine learning approaches primarily rely on structured numerical data, missing the timely and nuanced signals present in textual narratives.

NewsNet-SDF proposes a solution with three main components: multi-source feature extraction, an adversarial network for SDF estimation, and an adversarial training mechanism.

Implementation Details and Architecture

Multi-Source Feature Extraction:
- Financial News Text: The framework processes news articles related to each firm.
  - Preprocessing involves cleaning and tokenizing using a WordPiece tokenizer (specifically from GTE-multilingual-base, chosen for its large context window and semantic similarity performance).
  - Pretrained Language Model (PLM) Embeddings: Each news sentence is passed through a Transformer encoder (e.g., from GTE-multilingual-base) to generate dense vector embeddings $\mathbf{e}_{t,i,k} \in \mathbb{R}^{768}$.
  - Attention-Based Aggregation: A learned self-attention mechanism with weights $\alpha_{t,i,k}$ aggregates the sentence embeddings for a given firm at a given time into a single representation $\bar{\mathbf{e}}_{t,i}$, weighting sentences by learned importance.
  - Dimensionality Reduction: Principal Component Analysis (PCA) is applied to $\bar{\mathbf{e}}_{t,i}$ to project it to a lower-dimensional news feature vector $\mathbf{N}_{t,i}$.
- Macroeconomic Data: A set of macroeconomic time series indicators is processed by a unidirectional LSTM network over a rolling window. The LSTM distills temporal patterns and history into a fixed-length representation $\tilde{\mathbf{I}}_t$.
- Firm Characteristics: Standard firm-specific characteristics (size, momentum, etc.) are included. These are normalized by cross-sectional ranking to the range $[-1, 1]$, yielding the feature vector $\hat{\mathbf{F}}_{t,i}$.
- Feature Fusion: The processed features from all three modalities are concatenated into a single fused feature vector $\mathbf{x}_{t,i} = [\, \tilde{\mathbf{I}}_t \parallel \hat{\mathbf{F}}_{t,i} \parallel \mathbf{N}_{t,i}\, ]$.
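
The aggregation, normalization, and fusion steps above can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: the attention logits are passed in rather than learned, PCA and the LSTM are omitted, and the function names `attention_pool`, `rank_normalize`, and `fuse` are hypothetical.

```python
import math

def attention_pool(embeddings, scores):
    """Softmax-weighted average of sentence embeddings.

    embeddings: list of K equal-length vectors (one per sentence).
    scores: K unnormalized attention logits (assumed given here;
            in the paper they come from a learned mechanism).
    """
    m = max(scores)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]            # attention weights sum to 1
    dim = len(embeddings[0])
    return [sum(a * e[d] for a, e in zip(alphas, embeddings)) for d in range(dim)]

def rank_normalize(values):
    """Map cross-sectional ranks to [-1, 1].

    Ties are broken by input order, which is an assumption of this sketch.
    """
    n = len(values)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = 2.0 * rank / (n - 1) - 1.0
    return out

def fuse(macro_state, firm_chars, news_feats):
    """Concatenate the three modalities into one feature vector x_{t,i}."""
    return list(macro_state) + list(firm_chars) + list(news_feats)

# Toy usage: two 2-dimensional "sentence embeddings" with equal attention logits.
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
sizes = rank_normalize([10, 30, 20])          # one characteristic across three firms
x = fuse([0.1], [sizes[0]], pooled)           # fused vector for the first firm
```

Rank normalization is applied across firms for each characteristic, so the smallest firm maps to -1 and the largest to 1 regardless of the raw scale.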
Adversarial Network for SDF Estimation:
- The core of the model is an adversarial network consisting of two parallel neural networks, both taking the fused feature vector $\mathbf{x}_{t,i}$ as input.
- SDF Network ($f_\phi$): A feedforward neural network that maps $\mathbf{x}_{t,i}$ to scalar SDF weights $w_{t,i}$ for each asset $i$ at time $t$. The SDF $M_{t+1}$ is then constructed as $M_{t+1} = 1 - \sum_{i=1}^N w_{t,i} R^e_{t+1,i}$, where $R^e_{t+1,i}$ is the excess return of asset $i$.
- Conditional Network ($g_\psi$): Another feedforward network that maps $\mathbf{x}_{t,i}$ to a vector of instruments $\mathbf{g}_{t,i}$. These instruments serve to challenge the SDF network by highlighting pricing anomalies.
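
As a small illustration of the SDF construction $M_{t+1} = 1 - \sum_i w_{t,i} R^e_{t+1,i}$, the sketch below computes the SDF for one period and the per-asset pricing residuals $M_{t+1} R^e_{t+1,i}$. The weights are supplied by hand here, whereas in the model they would come from the SDF network; the function names are hypothetical.

```python
def sdf_value(weights, excess_returns):
    """M_{t+1} = 1 - sum_i w_{t,i} * R^e_{t+1,i} for a single period."""
    if len(weights) != len(excess_returns):
        raise ValueError("weights and excess_returns must have the same length")
    return 1.0 - sum(w * r for w, r in zip(weights, excess_returns))

def moment_residuals(weights, excess_returns):
    """Per-asset realizations of M_{t+1} * R^e_{t+1,i}.

    Averaged over time, these should be close to zero for a valid SDF.
    """
    m = sdf_value(weights, excess_returns)
    return [m * r for r in excess_returns]

# Toy usage with two assets (numbers are illustrative, not from the paper).
w = [0.5, -0.2]
r = [0.02, 0.01]
m_next = sdf_value(w, r)            # 1 - (0.5*0.02 - 0.2*0.01)
residuals = moment_residuals(w, r)
```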
Adversarial Training Mechanism:
- Training is framed as a minimax game that directly implements the GMM orthogonality conditions $\mathbb{E}_t[M_{t+1}R_{t+1,i}^e] = 0$. The objective is to find parameters $\phi$ for the SDF network and $\psi$ for the conditional network that solve:

  $\min_{\phi} \max_{\psi} \frac{1}{N} \sum_{j=1}^{N} \left\| \mathbb{E}\left[M_{t+1} R^e_{t+1,j} \, \mathbf{g}_{t,j}\right] \right\|_2^2 + \lambda\left( \|\phi\|^2_2 + \|\psi\|^2_2 \right)$

- In practice, the expectation is replaced by a sample average over mini-batches of assets and time periods. Training alternates between two gradient updates:
  - The conditional network parameters $\psi$ are updated via gradient ascent to maximize the pricing errors $\| \mathbb{E}[\dots] \|^2$, identifying where the current SDF fails.
  - The SDF network parameters $\phi$ are updated via gradient descent to minimize the pricing errors, learning an SDF that satisfies the moment conditions for the instruments generated by the conditional network.
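
The alternating updates can be sketched on a deliberately tiny problem. Everything below is an assumption made for illustration: each network is reduced to a single scalar parameter ($w_{t,i} = \phi\, x_{t,i}$ and $g_{t,i} = \tanh(\psi\, x_{t,i})$), gradients are taken by finite differences instead of backpropagation, the regularization term is dropped, and the data are synthetic.

```python
import math
import random

def make_toy_data(T=60, N=4, seed=0):
    """Synthetic scalar features x[t][i] and excess returns r[t][i]."""
    rng = random.Random(seed)
    x = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(T)]
    r = [[0.02 * x[t][i] + rng.gauss(0.005, 0.05) for i in range(N)]
         for t in range(T)]
    return x, r

def pricing_loss(phi, psi, x, r):
    """(1/N) * sum_j ( sample mean over t of M_{t+1} * R_{t+1,j} * g_{t,j} )^2."""
    T, N = len(x), len(x[0])
    moments = [0.0] * N
    for t in range(T):
        # Toy SDF network: weight is phi * feature.
        m = 1.0 - sum(phi * x[t][i] * r[t][i] for i in range(N))
        for j in range(N):
            g = math.tanh(psi * x[t][j])      # toy conditional network (one instrument)
            moments[j] += m * r[t][j] * g / T
    return sum(mj * mj for mj in moments) / N

def num_grad(f, v, eps=1e-5):
    """Central finite-difference derivative of a scalar function."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)

def train(x, r, steps=200, lr_phi=0.5, lr_psi=0.5):
    phi, psi = 0.0, 0.5
    for _ in range(steps):
        # Adversary: gradient ascent on psi to expose pricing errors.
        psi += lr_psi * num_grad(lambda p: pricing_loss(phi, p, x, r), psi)
        # SDF: gradient descent on phi to shrink those pricing errors.
        phi -= lr_phi * num_grad(lambda p: pricing_loss(p, psi, x, r), phi)
    return phi, psi

x, r = make_toy_data()
phi, psi = train(x, r, steps=50)
final_loss = pricing_loss(phi, psi, x, r)
```

In a real implementation both networks would be multi-layer models trained with automatic differentiation on mini-batches, and the two learning rates would likely need separate tuning.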

Practical Applications and Results

The NewsNet-SDF framework was empirically evaluated on U.S. equity data from 1980-2022, the sample period stated in the abstract (with 2000-2022 as the out-of-sample test set).

Superior Performance: NewsNet-SDF significantly outperforms traditional factor models (CAPM, Fama-French) and other machine learning/text-augmented baselines. It achieves a Sharpe Ratio of 2.80 on the test set, substantially higher than competitors (e.g., 0.49 for CAPM, 0.50 for FF5, 1.56 for IPCA-SDF, 0.65 for GAN-SDF, 1.81 for BERT-SDF). It also achieves the lowest Mean Squared Pricing Error (0.56), indicating better pricing accuracy.
Value of News: Ablation studies show that removing news text features leads to a much larger degradation in performance (41% drop in SR, 377% increase in MSPE) compared to removing macroeconomic features (31% drop in SR, 179% increase in MSPE). This confirms that financial news provides unique and valuable predictive signals beyond traditional numerical data.
Feature Importance: Analysis of feature sensitivity reveals that news-derived principal components are among the most influential features driving the SDF dynamics, alongside traditional firm characteristics and macroeconomic indicators. This validates the effective integration of text information.
Risk Pricing and Predictability: Portfolios sorted based on the model's predicted risk exposure (beta) exhibit a clear, monotonic relationship with realized returns. Higher predicted beta portfolios consistently earn higher returns, confirming that the learned SDF effectively captures systematic risk that is priced in the market.
Robustness in Volatility: During the COVID-19 pandemic, the model demonstrated the ability to adaptively reweight the importance of news across industries (e.g., high importance for healthcare news early on). News narratives were shown to precede shifts in traditional metrics by 2-3 weeks, highlighting the anticipatory power of text data captured by the model. NewsNet-SDF reduced pricing errors by 18-32% during market transitions compared to text-agnostic models.
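
The relative improvements quoted for the Sharpe ratio follow from simple arithmetic on the figures listed above. The short sketch below recomputes them; the helper name `relative_improvement_pct` is hypothetical, and only the Sharpe ratios reported in this summary are used.

```python
def relative_improvement_pct(new, baseline):
    """Percentage improvement of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0

newsnet_sr = 2.80
baseline_sr = {
    "CAPM": 0.49,
    "FF5": 0.50,
    "IPCA-SDF": 1.56,
    "GAN-SDF": 0.65,
    "BERT-SDF": 1.81,
}

improvements = {
    name: relative_improvement_pct(newsnet_sr, sr)
    for name, sr in baseline_sr.items()
}

for name, pct in improvements.items():
    print(f"{name}: {pct:.0f}% higher Sharpe ratio")

# The CAPM entry is (2.80 - 0.49) / 0.49, about 471%,
# which matches the figure quoted in the abstract.
```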

Implementation Considerations

Data Processing Pipeline: Implementing the multi-modal feature extraction requires robust pipelines for handling large volumes of text data, aligning it temporally with structured financial data, and performing efficient processing steps like PLM inference, attention aggregation, and PCA. The paper mentions processing 2.5 million news articles, implying significant computational resources for embedding generation.
PLM Choice: GTE-multilingual-base is chosen for practical reasons such as context length and semantic similarity performance. The base PLM should therefore be selected with care, and domain-specific models (like FinBERT) could be explored depending on data availability and task.
Adversarial Training: Implementing the alternating gradient steps for the minimax optimization requires careful tuning of learning rates and training schedules for both the SDF and Conditional networks. The formulation relies on computing gradients of the sample moments with respect to network parameters.
Computational Resources: Training deep learning models on large-scale, high-dimensional financial data, especially with PLMs, is computationally intensive, requiring significant GPU resources.
Temporal Alignment: Ensuring that news articles are processed before the returns they are intended to predict are observed is critical to avoid look-ahead bias, as described in the paper's data splitting strategy.
Generalization: While evaluated on U.S. equities and English news, extending the model to international markets or alternative data sources would require access to corresponding data and potentially multilingual or domain-specific PLMs.
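
The temporal alignment point can be made concrete with a small sketch. It assumes a monthly frequency, which this summary does not specify, and pairs each article published in month $t$ only with the return realized in month $t+1$, so no article is matched to a return observed before or during its publication month. The function names and the data layout are hypothetical.

```python
from datetime import date

def month_index(d):
    """Number of months since year 0, used to step across year boundaries."""
    return d.year * 12 + (d.month - 1)

def align_news_to_next_month(articles, returns):
    """Pair each article with the following month's return for the same firm.

    articles: list of (publication_date, firm, payload) tuples.
    returns:  dict mapping (year, month, firm) -> excess return in that month.
    Articles without a next-month return are dropped.
    """
    pairs = []
    for pub_date, firm, payload in articles:
        nxt = month_index(pub_date) + 1
        key = (nxt // 12, nxt % 12 + 1, firm)   # (year, month, firm) of month t+1
        if key in returns:
            pairs.append((payload, returns[key]))
    return pairs

# Toy usage: a December 2020 article is paired with the January 2021 return.
articles = [
    (date(2020, 12, 15), "AAA", "headline"),
    (date(2021, 3, 1), "BBB", "other"),       # no matching return, so dropped
]
returns = {(2021, 1, "AAA"): 0.03}
aligned = align_news_to_next_month(articles, returns)
```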

In summary, NewsNet-SDF offers a practical deep learning approach to asset pricing by integrating financial news embeddings from pretrained language models within a theoretically grounded adversarial SDF framework. Its strong empirical performance, particularly during periods of market stress, highlights the value of unstructured text data and multimodal learning for financial decision-making and risk management in practice.
