Investor Knowledge Representations

Updated 23 December 2025

Investor knowledge representations are formal models that encode investor beliefs and cognitive processes using structured frameworks such as knowledge graphs, probabilistic models, and logical models.
They support rigorous financial analysis by integrating multi-modal data such as texts, tables, events, and expert opinions for applications like trend detection and portfolio optimization.
These representations fuse empirical, expert, and model-based insights, leveraging techniques including vector embeddings and dynamic feature fusion to drive automated decision making.

Investor knowledge representations formally model, encode, and operationalize the beliefs, information states, and cognitive processes of investors, enabling both quantitative analysis and decision automation in finance. These representations span structured knowledge graphs, probabilistic and logical frameworks, embedding-based vector encodings, and dynamic fusion mechanisms. Each integrates heterogeneous data (text, tables, events, news, expert judgments) and supports rigorous reasoning or prediction. Approaches range from algebraic belief frameworks to LLM-driven extraction and graph-based retrieval, as well as principled fusion of empirical, expert, and model-based knowledge.

1. Knowledge Graph–Based Representations

Structured knowledge graphs are central to modern investor knowledge engineering, capturing both factual and contextual information relevant to financial decision-making.

QuantMind defines the formal backbone for investor knowledge as a typed, attributed, and provenance-annotated knowledge graph $G = (V, E, \tau, \rho, \alpha)$ , where:

$V$ : nodes representing entities (e.g., companies, metrics, events) and literals (quantities, temporal markers),
$E \subseteq V \times R \times V$ : directed, labeled edges whose labels are drawn from the set of relation types $R$ ,
$\tau: V \rightarrow T$ : node typing,
$\alpha: V \rightarrow \Omega$ : attribute dictionaries (e.g., time-stamps, currency),
$\rho: E \rightarrow \{1, ..., n\}$ : edge-level provenance to support auditability.
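The tuple above maps naturally onto a small container type. The following is a minimal sketch, not QuantMind's actual implementation: the class name `KnowledgeGraph`, its method names, and the example entities are all hypothetical, and provenance is reduced to an integer source identifier as in $\rho$.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class KnowledgeGraph:
    """Hypothetical container for G = (V, E, tau, rho, alpha)."""
    node_type: dict[str, str] = field(default_factory=dict)              # tau: V -> T
    node_attrs: dict[str, dict[str, Any]] = field(default_factory=dict)  # alpha: V -> Omega
    edges: set[tuple[str, str, str]] = field(default_factory=set)        # E, a subset of V x R x V
    provenance: dict[tuple[str, str, str], int] = field(default_factory=dict)  # rho: E -> {1..n}

    def add_node(self, node: str, type_: str, **attrs: Any) -> None:
        self.node_type[node] = type_
        self.node_attrs[node] = dict(attrs)

    def add_edge(self, head: str, relation: str, tail: str, source_id: int) -> None:
        # Every edge must connect known nodes and carry a provenance id.
        if head not in self.node_type or tail not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        edge = (head, relation, tail)
        self.edges.add(edge)
        self.provenance[edge] = source_id

    def neighbors(self, head: str) -> list[tuple[str, str, int]]:
        """Outgoing (relation, tail, source_id) triples, sorted for determinism."""
        return sorted(
            (r, t, self.provenance[(h, r, t)]) for (h, r, t) in self.edges if h == head
        )


# Illustrative data only; the entity names are made up.
kg = KnowledgeGraph()
kg.add_node("ACME", "Company")
kg.add_node("ACME_revenue_FY2024", "Metric", currency="USD", as_of="2024-12-31")
kg.add_edge("ACME", "reports", "ACME_revenue_FY2024", source_id=1)
```

Keeping provenance in a separate edge-keyed map mirrors the formal definition and means that any retrieved triple can be traced back to its source document.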

Extraction leverages multi-modal operators (text, tables, formulas), with LLMs or open IE yielding structured triples. Tagging and embedding models then index nodes/triples into a searchable space driven by a financial taxonomy (e.g., FactorModel, RiskMetric). Retrieval integrates semantic similarity, domain-aware tags, and multi-hop reasoning via graph paths scored by a combination of embedding similarity and provenance confidence, supporting both point-in-time lookups and complex, auditable research (Wang et al., 25 Sep 2025).
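The path-scoring step can be illustrated with a toy scoring rule. The source does not give the exact formula, so everything here is an assumption: the function name `path_score`, the use of the mean per-edge similarity, the use of the weakest per-edge confidence, and the blending `weight` are illustrative choices only.

```python
def path_score(edge_similarities, edge_confidences, weight=0.5):
    """Blend mean embedding similarity with the weakest provenance confidence on a path.

    Assumed rule (not from the source): weight * mean(similarity) + (1 - weight) * min(confidence).
    """
    if not edge_similarities or len(edge_similarities) != len(edge_confidences):
        raise ValueError("need one similarity and one confidence per edge")
    mean_similarity = sum(edge_similarities) / len(edge_similarities)
    weakest_confidence = min(edge_confidences)
    return weight * mean_similarity + (1.0 - weight) * weakest_confidence


# Two hypothetical candidate paths: (per-edge similarities, per-edge confidences).
candidate_paths = {
    "two_hop_filing_path": ([0.9, 0.7], [1.0, 0.8]),
    "one_hop_news_path": ([0.95], [0.4]),
}
ranked = sorted(
    candidate_paths,
    key=lambda name: path_score(*candidate_paths[name]),
    reverse=True,
)
```

Taking the minimum confidence along a path is one way to make a multi-hop answer only as trustworthy as its least reliable edge, so a highly similar but poorly sourced path ranks below a well-sourced one.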

FinReflectKG extends these paradigms with agentic, reflection-based pipelines for large-scale KG construction from regulated sources (e.g., SEC 10-Ks), enforcing a business-centric schema with SME-validated entity/relation types. A critic–corrector feedback loop ensures schema compliance and business relevance; triples are further validated with rule-based and LLM-as-a-Judge metrics to maximize both precision and coverage, supporting advanced analytics and investor queries spanning risk, ESG, and scenario dimensions (Arun et al., 25 Aug 2025).

FinDKG demonstrates dynamic knowledge graph generation from financial news, with LLMs (ICKG) extracting timestamped quintuples (entity types, relations) and a GNN-based KGTransformer encoding temporal and structural node representations. This enables link prediction, trend detection, and thematic portfolio construction via learned investor-relevant signals (e.g., centralities, “AI impact” link scores), directly operationalizing extracted knowledge for investment decisions (Li et al., 2024).

2. Probabilistic and Logical Representations

Linear Belief Functions (LBFs), as developed in (Liu et al., 2012), offer a general framework for financial knowledge under uncertainty:

Each belief function encodes constraints/brackets on a vector of continuous variables $X$ via moment matrices, unifying:
- Statistical observations: $M = \left( x \; 0 \right)$ ,
- Distributional assumptions: $M = (\mu \Sigma^{-1}\; -\Sigma^{-1})$ ,
- Subjective speculations: $M = (\mu_{\mathrm{spec}} \sigma_{\mathrm{spec}}^{-2}\; -\sigma_{\mathrm{spec}}^{-2})$ ,
- Linear relations or factor models,
- Ignorance (vacuous) as $\Sigma^{-1}\to 0$ .
Computationally, Dempster’s rule combines beliefs by addition of the swept (canonical) parameters, supporting sequential updating as new market data (hard or soft) arrives.
The framework accommodates joint integration of empirical, expert-driven, and model-based sources, supporting posterior inference and dynamic portfolio evaluation.
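For a single variable, combination by addition of swept parameters reduces to a few lines. The sketch below assumes the scalar case and follows the sign convention used above, $(\mu\sigma^{-2},\, -\sigma^{-2})$; the function names and the numerical inputs are hypothetical.

```python
def sweep(mean, variance):
    """Moment form (mu, sigma^2) -> fully swept form (mu / sigma^2, -1 / sigma^2)."""
    return (mean / variance, -1.0 / variance)


def unsweep(swept):
    """Fully swept form -> moment form (mu, sigma^2)."""
    first, second = swept
    if second == 0.0:
        raise ValueError("the vacuous belief has no finite moment form")
    variance = -1.0 / second
    return (first * variance, variance)


def combine(*beliefs):
    """Combine scalar beliefs by adding their swept parameters componentwise."""
    return (sum(b[0] for b in beliefs), sum(b[1] for b in beliefs))


VACUOUS = (0.0, 0.0)  # ignorance: the limit in which the inverse variance goes to 0

# Hypothetical inputs: a model-based return belief and an analyst's speculation.
model_belief = sweep(mean=0.05, variance=0.04)
analyst_belief = sweep(mean=0.10, variance=0.01)
posterior_mean, posterior_variance = unsweep(combine(model_belief, analyst_belief))
```

With these inputs the posterior mean is 0.09 and the posterior variance 0.008: the more precise analyst view dominates, and adding `VACUOUS` to any combination leaves the result unchanged.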

Epistemic Modal Logic, as formulated in the IE framework (Adachi, 2015), represents investor belief and knowledge in a time-indexed, stochastic context:

Syntax comprises terms naming stochastic processes and modal formulae for knowledge ($K_i\,\varphi$), belief ($B_i\,\varphi$), and their common versions ($CK_G$, $CB_G$).
Semantics are given by conditional expectations under sub-$\sigma$-algebras, mapping formulae to $[0,1]$-valued stochastic processes.
Belief operators and group modalities capture not only idiosyncratic and aggregate knowledge but also the fixed-point structures of consensus (e.g., crisis belief thresholds).
This supports fine-grained modeling of financial events such as two-agent disagreement, common belief in default, or crash anticipation.
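A finite, single-period toy version makes the semantics concrete. It assumes each agent's sub-$\sigma$-algebra is generated by a partition of a four-state space, so that belief in an event is its conditional probability given the agent's partition cell. The time index is dropped, and the state names, partitions, and helper `belief` are hypothetical.

```python
from fractions import Fraction


def belief(event, partition, prob):
    """Map each state to P(event | partition cell containing that state)."""
    values = {}
    for cell in partition:
        cell_mass = sum(prob[w] for w in cell)  # assumed strictly positive
        hit_mass = sum(prob[w] for w in cell if w in event)
        for w in cell:
            values[w] = hit_mass / cell_mass
    return values


states = ["s1", "s2", "s3", "s4"]
prob = {w: Fraction(1, 4) for w in states}
default = {"s3", "s4"}  # the event "the issuer defaults"

# Agent A can separate {s1, s2} from {s3, s4}; agent B cannot separate s2 from s3.
belief_a = belief(default, [{"s1", "s2"}, {"s3", "s4"}], prob)
belief_b = belief(default, [{"s1"}, {"s2", "s3"}, {"s4"}], prob)

# Knowledge corresponds to belief with value 1.
a_knows_default = {w for w, v in belief_a.items() if v == 1}
```

In state `s3` agent A assigns the default event probability 1 while agent B assigns it 1/2, a minimal instance of two-agent disagreement about the same event.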

3. Embedding and Representation Learning Approaches

Investor preferences and cognitive patterns are increasingly represented by high-dimensional vector encodings, supporting matching, search, and explainability.

Investor–Company Embeddings (Kaur et al., 2021) rest on three components:

Embeddings are built by concatenating attribute-level components (funding status, Transformer/BERT encodings of descriptions, industries, locations).
Hybrid matching functions combine content-based similarity (cosine similarity between company vectors, drawing on the companies an investor has historically backed) with collaborative similarity (SVD decomposition of the investor–company link matrix) through a thresholded combination.
Parameterized template-based explanations enable transparent match justification, facilitating adoption in regulated environments.
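The hybrid matching step might look roughly as follows. This is a sketch under several assumptions: the collaborative score is taken as a precomputed input (for example, from a low-rank factorization of the link matrix), the content score is the best cosine match against past investments, and the linear blend, `weight`, and `threshold` are illustrative choices rather than details confirmed by the source.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def content_score(candidate, past_investments):
    """Best cosine match between a candidate company and the investor's past investments."""
    return max((cosine(candidate, p) for p in past_investments), default=0.0)


def hybrid_match(candidate, past_investments, collaborative_score,
                 weight=0.5, threshold=0.6):
    """Return (blended score, whether it clears the threshold)."""
    score = (weight * content_score(candidate, past_investments)
             + (1.0 - weight) * collaborative_score)
    return score, score >= threshold


# Toy 2-dimensional company vectors; real embeddings would be much longer.
score, is_match = hybrid_match([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], collaborative_score=0.4)
```

Here the candidate coincides with one past investment, so the content score is 1.0 and the blended score of 0.7 clears the threshold; the same collaborative score with no similar past investment would not.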

External Knowledge Graph Embeddings for Stock Prediction (Dukkipati et al., 14 Apr 2025) use timestamped, multi-relational KGs incorporating stocks, macro indicators, events, and sentiment:

Dynamic node representations are learned via a Hawkes-process-augmented GNN (HPGE + TA-HKGE), modeling excitation in event sequences.
Final stock embedding $H_i^T$ combines sequence dynamics (Transformer over prices), event-process embeddings, and graph structure, driving ranking and prediction.
Ablation studies confirm the value of KG knowledge infusion, which yields higher IRR, risk-adjusted performance, and ranking accuracy.
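The notion of excitation in event sequences can be illustrated with the textbook univariate Hawkes intensity with an exponential kernel, $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$. This is the standard form, not necessarily the kernel used in HPGE or TA-HKGE, and the parameter values are arbitrary.

```python
import math


def hawkes_intensity(t, event_times, base_rate=0.1, alpha=0.5, beta=1.0):
    """Intensity at time t: base rate plus exponentially decaying boosts from past events."""
    excitation = sum(
        alpha * math.exp(-beta * (t - s)) for s in event_times if s < t
    )
    return base_rate + excitation


# Hypothetical timestamps of news events touching one stock.
events = [1.0, 1.5, 4.0]
shortly_after_cluster = hawkes_intensity(1.6, events)
long_after_cluster = hawkes_intensity(3.9, events)
```

The intensity is highest right after the cluster of events at 1.0 and 1.5 and decays back toward the base rate before the next event, which is the self-exciting behavior a Hawkes-augmented encoder is meant to capture.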

4. Cognitive and Multi-Source Feature Representations

Dynamic Stacking and Feature Fusion approaches (Gao et al., 16 Dec 2025) formalize how rational, heterogeneous investors process multi-source information:

Investor knowledge representations are defined as low-dimensional feature vectors extracted by specialist sub-networks:
- Multi-Branch CNN for global indices (regional context),
- Spectral-clustered CNN for industry indices (sector rotation),
- RNN/Evidential Reasoning for news (per-provider reliability).
Stage-2 dynamic stacking ensembles select among meta-classifiers (e.g., LR, SVM, RF, ANN) per rolling window, adapting to temporally changing regimes and investor focus.
Ablation confirms non-redundancy: each source-specific representation adds incremental predictive power, with fused models yielding superior accuracy and Sharpe ratios.
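The per-window selection in stage 2 can be sketched with plain lists. The rule assumed here (pick the meta-classifier with the highest accuracy over the previous `window` observations, ties broken alphabetically) is an illustrative stand-in; the source does not specify the exact selection criterion, and the predictions below are made-up data.

```python
def dynamic_stack(predictions, labels, window=3):
    """Select, at each step, the model with the best accuracy over the trailing window.

    predictions: dict mapping model name -> list of 0/1 predictions
    labels: list of realized 0/1 outcomes, aligned with the predictions
    Returns a list of (chosen_model, its_prediction) for each t >= window.
    """
    names = sorted(predictions)
    chosen = []
    for t in range(window, len(labels)):
        def recent_accuracy(name):
            hits = sum(predictions[name][s] == labels[s] for s in range(t - window, t))
            return hits / window

        best = max(names, key=recent_accuracy)  # first maximum wins, so ties go alphabetically
        chosen.append((best, predictions[best][t]))
    return chosen


labels = [1, 0, 1, 1, 0, 0]
predictions = {
    "LR": [1, 0, 1, 0, 1, 1],  # accurate early, then degrades
    "RF": [0, 1, 0, 1, 0, 0],  # poor early, then improves
}
selected = dynamic_stack(predictions, labels, window=3)
```

In this toy run the selector stays with `LR` for two steps and then switches to `RF` once its trailing accuracy overtakes, which is the kind of regime adaptation the rolling-window design is intended to provide.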

Heterogeneous-Agent and Experience-Based Models recognize that investor “knowledge” is fundamentally shaped by belief diversity.

In (Biondi et al., 2011), investor knowledge includes common knowledge of fundamentals (public signals $F_t$ ), privately perceived weights ( $\varphi_i F_t$ ), social mood ( $m_{t,k}^j$ via Galam-type updating), and trend-learning (with revision parameters $\beta_i^j$ ). The clearing price $p_{t+1}^*$ reconciles these via intersection of individual and group-level opinions, so that bubbles and exuberance emerge from the dynamic coupling between slow-moving information and fast-moving sentiment.
Experience-based learning (EBL) models (Malmendier et al., 2016) encode investor knowledge as cohort-specific recency-weighted averages of observed dividends, formalizing their expectations $\theta_t^n$ . Cross-cohort heterogeneity yields predictable patterns in holdings, return predictability, excess volatility, and trading volume, aligning with empirical observations.
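A recency-weighted experienced average can be sketched as below. The weighting scheme is an assumption: weights proportional to $(\mathrm{age} + 1 - k)^{\lambda}$ for an observation $k$ periods old are one parameterization used in the experience-effects literature, and the source does not state that this exact form is the one used. The dividend values are made up.

```python
def experience_weights(age, lam):
    """Weights over lags k = 0..age, proportional to (age + 1 - k) ** lam (assumed form)."""
    raw = [(age + 1 - k) ** lam for k in range(age + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def experienced_mean(dividends, lam):
    """Cohort belief from its lifetime dividend observations, listed oldest first."""
    age = len(dividends) - 1
    weights = experience_weights(age, lam)
    most_recent_first = dividends[::-1]
    return sum(w * d for w, d in zip(weights, most_recent_first))


dividend_history = [1.0, 2.0, 3.0]  # oldest to newest
old_cohort_belief = experienced_mean(dividend_history, lam=1.0)        # has seen all three
young_cohort_belief = experienced_mean(dividend_history[-1:], lam=1.0)  # has seen only the latest
```

With `lam = 0` the weights are equal and the belief is a plain average; with `lam > 0` recent dividends count more. The two cohorts disagree (7/3 versus 3.0) purely because their experienced histories differ, which is the source of cross-cohort heterogeneity in these models.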

5. Applications and Decision Support

Investor knowledge representations underpin a range of downstream tasks:

Auditable multi-hop research and fact attribution (QuantMind, FinReflectKG).
Real-time news/event-driven signal generation and trend detection (FinDKG; Li et al., 2024).
Investor–company matching and explainable recommendation (Kaur et al., 2021).
Portfolio allocation, risk management, and scenario analysis (LBFs (Liu et al., 2012), KG-based ranking (Dukkipati et al., 14 Apr 2025), dynamic stacking (Gao et al., 16 Dec 2025)).
Macro-level modeling of market regimes, bubbles, and crisis anticipation via explicit modal, agentic, and experience-based constructs (Adachi, 2015, Biondi et al., 2011, Malmendier et al., 2016).

In summary, investor knowledge representations have evolved into a rich intersection of graph theory, statistical learning, belief fusion, modal logic, and agent-based economics, enabling expressive, systematic, and actionable modeling of the investor information state for both human and automated decision frameworks.