Financial Knowledge Graph Dataset

Updated 30 August 2025

Financial Knowledge Graph Datasets are structured collections of entities, relationships, and metadata that model complex financial systems for applications such as anti-money laundering (AML) and fraud detection.
They integrate diverse data sources, including transaction ledgers, corporate filings, and financial news, and are typically built with automated extraction pipelines and analyzed with graph neural networks.
Benchmarks such as the Elliptic and DGraph datasets are used to evaluate transaction classification and anomaly detection methods that inform regulatory and risk assessments.

A financial knowledge graph dataset is a structured collection of entities, relationships, and associated metadata that models complex financial systems, transactions, company hierarchies, risk signals, and market events as graph data with semantic annotations. These datasets underlie critical tasks such as financial forensics, fraud detection, event extraction, risk modeling, similarity quantification, knowledge-driven reasoning, and regulatory compliance. Recent advances have seen financial KGs constructed from large-scale sources including transaction ledgers, corporate disclosures, financial news, and multimodal data, serving as both research benchmarks and production resources for graph-based machine learning.

1. Structural and Semantic Properties

Financial knowledge graph datasets differ in entity types, relationship schemas, temporal characteristics, and granularity. A representative example is the Elliptic Data Set (Weber et al., 2019), which models the Bitcoin payment network for anti-money laundering (AML) research. It consists of 203,769 nodes (Bitcoin transactions), 234,355 directed payment flow edges, and 166 node features. Features include:

Local features (first 94 out of 166): transaction-specific attributes such as time step, number of inputs and outputs, transaction fee, output volume, and aggregated summary statistics.
Aggregated features (remaining 72): neighborhood statistics (minimum, maximum, and standard deviation) of local features across one-hop neighbors, both forward (outputs) and backward (inputs).

In this time-series graph, every transaction node is associated with one of 49 time steps spaced roughly two weeks apart, providing a snapshot of cryptoeconomic activity over time. Approximately 2% of nodes are labeled as illicit and 21% as licit; the remaining nodes, about 77%, are unlabeled (unknown).
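The feature layout and time-step structure described above can be sketched in a few lines. This is a minimal illustration on synthetic stand-in data, not a loader for the real dataset: the function names, the cutoff step, and the toy labels are all assumptions made for the example.

```python
from collections import Counter

N_LOCAL = 94    # local features, per the description above
N_TOTAL = 166   # total features per node

def split_features(row):
    """Split one 166-value feature row into (local, aggregated) parts."""
    if len(row) != N_TOTAL:
        raise ValueError(f"expected {N_TOTAL} values, got {len(row)}")
    return row[:N_LOCAL], row[N_LOCAL:]

def temporal_split(time_steps, last_train_step):
    """Return node indices at or before the cutoff step, and those after it."""
    train = [i for i, t in enumerate(time_steps) if t <= last_train_step]
    test = [i for i, t in enumerate(time_steps) if t > last_train_step]
    return train, test

def label_shares(labels):
    """Fraction of nodes carrying each label, including 'unknown'."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}

# Synthetic stand-in data (hypothetical, not the real dataset).
row = [0.0] * N_TOTAL
steps = [1, 1, 2, 3, 10, 20, 34, 35, 40, 49]
labels = ["unknown"] * 7 + ["licit"] * 2 + ["illicit"]

local, aggregated = split_features(row)
train_idx, test_idx = temporal_split(steps, last_train_step=34)
shares = label_shares(labels)
print(len(local), len(aggregated), train_idx, test_idx, shares)
```

Splitting by time step rather than at random keeps later transactions out of the training set, which matters when the graph is meant to be read as a sequence of snapshots.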

Key structural and labeling characteristics of financial KGs can be summarized as follows:

Dataset	Nodes / Entities	Edges / Relations	Temporal Dynamics	Labeled Classes	Example Features
Elliptic Data Set	203,769 (transactions)	234,355 (payment flows)	49 time steps	licit/illicit/unknown	166 (local + aggregated) per node
DGraph	~3.7M (users)	~4.3M (emergency contacts)	time-stamped edges	fraudster/normal/background	17 features, missing value mask

This diversity in schema enables modeling of transactions (directed multi-relational), company interconnections (heterogeneous, hierarchical), and temporal events (dynamic graphs with time-indexed edges or nodes). Recent work pursues higher-order semantics, context-aware edge weighting, and fine-grained entity typing, as seen in FinDKG (Li et al., 15 Jul 2024), which incorporates 12 meta-entity types and 15 relation types and represents each fact as a quadruple (s, r, o, t) of subject, relation, object, and timestamp.
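As a rough illustration of the quadruple representation, the sketch below stores (s, r, o, t) facts and groups them into per-timestamp edge lists. The class name, the use of ISO date strings for t, and the example entities and relations are all hypothetical; they are not taken from FinDKG.

```python
from collections import defaultdict
from typing import NamedTuple

class Quadruple(NamedTuple):
    subject: str
    relation: str
    obj: str
    time: str  # ISO date string; the granularity is an assumption

def snapshots(facts):
    """Group (s, r, o, t) facts into one edge list per timestamp, oldest first."""
    by_time = defaultdict(list)
    for fact in facts:
        by_time[fact.time].append((fact.subject, fact.relation, fact.obj))
    # ISO date strings sort chronologically when sorted as text.
    return dict(sorted(by_time.items()))

# Hypothetical facts for illustration only.
facts = [
    Quadruple("Central Bank A", "RAISES", "Policy Rate", "2023-02-01"),
    Quadruple("Company B", "ACQUIRES", "Company C", "2023-01-15"),
    Quadruple("Company B", "ISSUES", "Bond D", "2023-02-01"),
]
graph_by_time = snapshots(facts)
for time, edges in graph_by_time.items():
    print(time, edges)
```

Keeping the timestamp on every fact means a dynamic graph can be replayed as a sequence of snapshots without changing the underlying storage.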

2. Construction Methodologies

Construction of financial KGs is driven by the nature of the originating data and downstream analytical objectives.

Transaction-Derived Graphs: For blockchain/cryptocurrency forensics, nodes are instantiated as transactions and edges as money flows (parent → child, by hash pointers), with temporal ordering preserved. Labeled fraud and licit classes rely on expert analysis or tagged criminal investigations.
Document-Derived Graphs: Large-scale filings (e.g., SEC 10-Ks) are parsed using intelligent document parsing layers, table-aware chunking, and schema-constrained LLM-based extraction (Arun et al., 25 Aug 2025). Entity-relation tuples are formed as (Head Entity, Head Type, Relationship, Tail Entity, Tail Type), and extraction correctness is checked and improved through iterative validation (reflection-driven critic–corrector agent loops).
Event Extraction Augmentation: In regulatory contexts, entity-based Directed Acyclic Graph (EDAG) structures are built for each extracted event, mapping entity roles and interrelations using both document-based embeddings and precomputed graph embeddings from external KGs (Guo et al., 2021).
Hybrid Human–LLM Workflows: Recent frameworks use schema alignment, prompt engineering, and iterative LLM-based extraction with integrated reflection, normalization, and expert-designed evaluation policies (Arun et al., 25 Aug 2025).
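To make the schema-constrained tuple format concrete, the sketch below checks extracted (Head Entity, Head Type, Relationship, Tail Entity, Tail Type) tuples against a small allow-list. The schema entries, rule wording, and function names are hypothetical and are not the rules used in the cited work; in a critic-corrector loop, the rejected tuples and their reasons would presumably be fed back to the extractor.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str

# Hypothetical schema: allowed (head_type, relation, tail_type) combinations.
SCHEMA = {
    ("Company", "HAS_SUBSIDIARY", "Company"),
    ("Company", "EMPLOYS", "Person"),
    ("Company", "DISCLOSES", "RiskFactor"),
}

def violations(t):
    """Return a list of rule violations for one extracted tuple."""
    problems = []
    if not t.head.strip() or not t.tail.strip():
        problems.append("empty entity")
    if (t.head_type, t.relation, t.tail_type) not in SCHEMA:
        problems.append("type/relation combination not in schema")
    return problems

def validate(tuples):
    """Split tuples into accepted ones and (tuple, reasons) rejections."""
    accepted, rejected = [], []
    for t in tuples:
        issues = violations(t)
        if issues:
            rejected.append((t, issues))
        else:
            accepted.append(t)
    return accepted, rejected

extracted = [
    Triple("Acme Corp", "Company", "HAS_SUBSIDIARY", "Acme Labs", "Company"),
    Triple("Acme Corp", "Company", "EMPLOYS", "Acme Labs", "Company"),
    Triple("", "Company", "EMPLOYS", "Jane Doe", "Person"),
]
accepted, rejected = validate(extracted)
print(len(accepted), [reasons for _, reasons in rejected])
```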

Construction complexity is increased by multimodality (e.g., incorporating images, tables, and audio), time-awareness (dynamic edges/entities), and the need for regulatory-compliant schema fidelity.

3. Methods for Learning and Inference

Financial KGs enable the application of various graph and hybrid learning architectures for classification, ranking, and reasoning:

Graph Neural Networks (GNNs): Node classification and anomaly detection tasks use GCNs to aggregate neighbor information through message-passing updates such as $H^{(l+1)} = \sigma(\hat{A} H^{(l)} W^{(l)})$, where $\hat{A}$ is the normalized adjacency matrix, $H^{(l)}$ holds the node representations at layer $l$, $W^{(l)}$ is a learned weight matrix, and $\sigma$ is a nonlinearity (Weber et al., 2019). Attention-based GNNs (e.g., KGTransformer) integrate entity meta-type information for improved link prediction in dynamic KGs (Li et al., 15 Jul 2024).
Contrastive and Two-Stage Training: Self-supervised contrastive loss on "tribe-style graphs" (where each firm and its shareholders are a mini-graph) distinguishes risk patterns across corporate clusters (Bi et al., 2023). Robust two-stage methods for label noise, such as KeGCN_R, combine knowledge graph embeddings (TransE) with GCNs, employing neighbor-dependent transition matrices to mitigate "hidden fraud" effects (Wang et al., 26 Feb 2025).
Hybrid ML Pipelines: Ensemble methods such as Random Forests (RF) with tabular and aggregated neighborhood features outperform isolated GCNs for imbalanced illicit transaction detection, yet GCNs provide additional representational power by leveraging network topology (Weber et al., 2019).
Evaluation and Reflection: Extraction quality and KG coverage are evaluated using both rule-based and LLM-as-judge methods, employing a compliance score $CR(t) = (1/R)\sum_{i=1}^R \phi_i(t)$ that averages $R$ rule checks $\phi_i$ over an extracted triple $t$, together with coverage ratios (Arun et al., 25 Aug 2025).
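The GCN update $H^{(l+1)} = \sigma(\hat{A} H^{(l)} W^{(l)})$ can be written out directly with NumPy. The sketch below assumes the common symmetric normalization with self-loops, $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$, and a ReLU nonlinearity; the text above does not specify either choice, and the toy graph, random weights, and function names are illustrative only.

```python
import numpy as np

def normalize_adjacency(a: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}: symmetric normalization with self-loops."""
    a_tilde = a + np.eye(a.shape[0])
    deg = a_tilde.sum(axis=1)            # degrees including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One propagation step: ReLU(A_hat @ H @ W)."""
    return np.maximum(a_hat @ h @ w, 0.0)

# Toy undirected path graph 0 - 1 - 2 (a stand-in, not a real transaction graph).
a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
h0 = np.eye(3)                            # one-hot input features
rng = np.random.default_rng(0)
w0 = rng.normal(size=(3, 2))              # untrained weights, for shape only

a_hat = normalize_adjacency(a)
h1 = gcn_layer(a_hat, h0, w0)
print(a_hat.round(3))
print(h1.shape)
```

A payment-flow graph is directed, so a real pipeline would have to decide whether to symmetrize the adjacency matrix or use a directed normalization; this sketch sidesteps that by using an undirected toy graph.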

4. Challenges: Data Imbalance, Noise, and Explainability

Financial KGs exhibit several distinct challenges:

Extreme Class Imbalance: Fraud or illicit activity typically makes up only a small share of the graph, on the order of 2% of nodes or less in Elliptic and DGraph (Weber et al., 2019, Huang et al., 2022). Rare-event detection therefore needs specialized balancing and calibration strategies.
Information Overload and Noisy Support Nodes: In relational company graphs, meta-nodes (directors, related transactions) can outnumber target company nodes 20–28x, diluting message passing in GNNs. Pretraining embeddings to distill support signals is one proposed remedy (Wang et al., 26 Feb 2025).
Hidden Label Noise: Due to regulatory or investigative lag, fraud labels may reflect cases that are only detected years later, introducing asymmetric label noise; robust two-stage training approaches correct for this using instance- and neighbor-dependent noise transition models (Wang et al., 26 Feb 2025).
Explainability: Given regulatory scrutiny, explainable AI for financial KGs is critical. Visualization tools such as Chronograph (Weber et al., 2019) or event-based graph layouts (e.g., with UMAP for dynamic localization) are employed. Model performance is interpreted both in algorithmic terms (feature aggregation, classification metrics) and visually (cluster tightness, error localization) with attention to the temporal evolution of suspicious activity.
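Two of the challenges above lend themselves to a small numeric sketch: reweighting a rare class, and modeling asymmetric label noise with a transition matrix. The code below uses inverse-frequency class weights and a fixed class-conditional transition matrix, which is a deliberate simplification and not the instance- and neighbor-dependent model of KeGCN_R. The 30% hidden-fraud rate, the class order, and the function names are assumptions chosen for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def observed_label_probs(clean_probs, transition):
    """Map clean-class probabilities to observed-label probabilities.

    transition[i][j] = P(observed label j | true class i).
    """
    k = len(clean_probs)
    return [sum(clean_probs[i] * transition[i][j] for i in range(k))
            for j in range(k)]

# Toy label set with 2% positives (synthetic).
labels = ["normal"] * 98 + ["fraud"] * 2
weights = balanced_class_weights(labels)

# Classes ordered [normal, fraud]. Asymmetric noise: some true fraud is
# recorded as normal (hidden fraud), but normal is never recorded as fraud.
T = [[1.0, 0.0],
     [0.3, 0.7]]   # the 0.3 hidden-fraud rate is an illustrative figure
p_obs = observed_label_probs([0.2, 0.8], T)
print(weights, p_obs)
```

Training against the observed-label probabilities rather than the clean ones is one way a model can be fit to noisy labels while still producing clean-class predictions at inference time.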

5. Evaluation and Benchmarking Metrics

The benchmarking of KG-based systems in finance revolves around both graph-level and downstream task metrics:

Node Classification: Accuracy, precision, recall, and F1-score for the licit and illicit classes.
AUC and AP: For highly imbalanced anomaly detection (e.g., DGraph, TGN frameworks), the area under the ROC curve (AUC) is the primary discriminative measure, reported alongside average precision (AP) (Kim et al., 27 Mar 2024).
Coverage and Diversity: Semantic diversity (Shannon/Rényi entropy), entity coverage ratio (ECR), and type/relationship coverage are used to assess knowledge graph extraction performance (Arun et al., 25 Aug 2025).
Compliance Scoring: Rule-based policy adherence for triple correctness, entity normalization, and schema fidelity (CheckRules, reflection compliance).
Expert Judgment: LLM-as-a-Judge and human-expert labeling augment quantitative metrics by assessing comprehensiveness, faithfulness, and relevance.
Visualization-based Assessment: Tools that enable dynamic exploration and error tracing provide qualitative signals for both model debugging and regulatory reporting.
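The classification metrics listed above can be computed from first principles. The sketch below implements precision, recall, and F1 for the positive class, plus ROC AUC as the fraction of positive-negative pairs ranked correctly (ties count as half). The labels, scores, and the 0.5 decision threshold are synthetic and chosen only for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (e.g., illicit) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic example: 2 positives among 10 nodes.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.1, 0.2, 0.4, 0.3, 0.1, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, recall, f1, auc)
```

In this toy case plain accuracy is 0.8 even though half of the positives are missed, which is why the positive-class metrics and AUC are preferred under heavy imbalance.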

A summary table organizes representative evaluation metrics:

Metric	Domain	Application	Example Paper
F1-score	Node Classification	Illicit transaction detection	(Weber et al., 2019)
AUC	Anomaly Detection	Fraudster/Normal discrimination	(Huang et al., 2022, Kim et al., 27 Mar 2024)
ECR/TCR	KG Extraction	Entity/relation coverage	(Arun et al., 25 Aug 2025)
Entropy	Semantic Diversity	Richness of extracted concepts	(Arun et al., 25 Aug 2025)
LLM-Judge	Extraction Quality	Triple-level accuracy and faithfulness	(Arun et al., 25 Aug 2025)
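The extraction-side metrics in the table can be sketched as follows. The compliance score follows the formula $CR(t) = (1/R)\sum_{i=1}^R \phi_i(t)$ given earlier, and the entropy is the standard Shannon entropy in bits. The coverage function is an assumed reading of an entity coverage ratio (share of reference entities that appear in the extraction), since the exact definition is not given here; the rules, triples, and entity names are hypothetical.

```python
import math
from collections import Counter

def compliance_score(triple, rules):
    """CR(t) = (1/R) * sum_i phi_i(t), with each phi_i returning pass/fail."""
    return sum(1 if rule(triple) else 0 for rule in rules) / len(rules)

def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def coverage_ratio(extracted, reference):
    """Assumed definition: share of reference items found in the extraction."""
    reference = set(reference)
    return len(set(extracted) & reference) / len(reference)

# Hypothetical rule checks on a (head, relation, tail) triple.
rules = [
    lambda t: all(part.strip() for part in t),   # no empty fields
    lambda t: t[1].isupper(),                    # relation in upper case
    lambda t: t[0] != t[2],                      # no self-loops
]
triple = ("Acme Corp", "has_subsidiary", "Acme Labs")
cr = compliance_score(triple, rules)

relations = ["OWNS", "OWNS", "EMPLOYS", "EMPLOYS"]
entropy_bits = shannon_entropy(relations)

ecr = coverage_ratio(
    extracted=["Acme Corp", "Acme Labs"],
    reference=["Acme Corp", "Acme Labs", "Jane Doe", "Initech"],
)
print(cr, entropy_bits, ecr)
```

Here the triple fails only the upper-case relation rule, so it passes two of three checks; a higher entropy over relation types would indicate a more varied extraction.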

6. Broader Implications and Societal Impact

Financial KGs play a dual role in safeguarding systemic integrity while enabling broader financial inclusion:

AML and Forensics: Public KGs such as Elliptic facilitate forensic analysis, help reduce false positive rates, and support more nuanced anti-money laundering operations (Weber et al., 2019).
Risk Management and Regulatory Compliance: By explicitly capturing complex relationships—corporate ownership, related-party transactions, cascading risk signals across the supply chain—financial KGs offer a foundation for early warning systems, compliance reporting, and targeted interventions (Bi et al., 2023, Wang et al., 26 Feb 2025).
Fairness and Financial Inclusion: Reducing overreach (e.g., unnecessary investigation of licit parties) through more accurate detection and explainable models can counteract the exclusionary effects of stringent compliance regimes.
Research and Benchmarking: Large, open-source datasets (as in FinReflectKG (Arun et al., 25 Aug 2025)) catalyze further methodological advances, furnishing reproducible evaluation platforms and supporting the development of graph-based, interpretable financial AI systems.

A plausible implication is that as the regulatory landscape evolves, dynamic and semantically-enriched KGs—capable of integrating new entity types, relations, and event-driven updates—will become essential not just for technical performance, but for satisfying transparency, auditability, and fairness requirements in financial technology.

7. Future Directions

The evolution of financial KGs is trending toward:

Higher-Order and Multimodal Graphs: Integration of tables, free text, news, and images; entity typing at granular levels; cross-document and cross-modal entity linking.
Agentic, Reflection-Driven Extraction: Multi-pass, feedback-driven information extraction loops with both rule-based and LLM-based critique and correction, aiming for continued improvements in coverage, accuracy, and compliance (Arun et al., 25 Aug 2025).
Dynamic Thematic Graphs: Temporal graphs capable of supporting thematic investing, trend detection, and real-time scenario analysis (e.g., via time-evolving quadruples and meta-entity–aware attention models) (Li et al., 15 Jul 2024).
Hybrid Modeling: Fusion of classical ensemble models (for tabular, explainable outputs) with graph-based representations to combine precision, interpretability, and relational learning (Weber et al., 2019).
Robustness Against Data Issues: Advanced approaches for label noise correction, handling of missing values, and explicit incorporation of uncertain or unlabeled background nodes to maintain graph connectivity and semantic integrity (Wang et al., 26 Feb 2025, Huang et al., 2022).

Financial knowledge graph datasets, grounded in large-scale, multi-modal, and multi-relational data, now underpin critical advances in financial machine learning, regulatory compliance, and the broader mission of transparent, reliable financial analytics.