FinLake-Bench: Financial Data Lake Benchmark
- FinLake-Bench is a financial table-centric benchmark that formalizes unionability, joinability, and subset detection through precise mathematical definitions.
- It standardizes preprocessing protocols and ground-truth label generation to ensure consistent evaluation of financial data integration across diverse sources.
- The benchmark supports fine-tuning tabular LLMs and evaluating them with standard metrics such as precision, recall, and F₁-score, with the aim of improving financial data management.
FinLake-Bench is a financial table-centric benchmark for evaluating data discovery capabilities over financial data lakes, directly extending the modular methodology established by LakeBench. It isolates and formalizes the core tasks of unionability, joinability, and subset detection on financial datasets, such as time-series exports from the European Central Bank (ECB) and related public finance data sources. FinLake-Bench provides granular task definitions, detailed preprocessing protocols, robust ground-truth generation, and standardized evaluation metrics, giving a replicable framework for benchmarking tabular foundation models and LLM systems in financial data management and integration.
1. Task Definitions and Formalization
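FinLake-Bench formalizes three central data-discovery tasks for financial table pairs using precise mathematical notation, where $C(T)$ denotes the column set of a table $T$:
- Unionability: Two tables $T_1$ and $T_2$ are unionable if there exists a one-to-one column mapping $\phi : C(T_1) \to C(T_2)$ such that for every $c \in C(T_1)$, the columns $c$ and $\phi(c)$ are type-compatible according to a predicate $\mathrm{compat}$. Formally:
  $$\mathrm{Unionable}(T_1, T_2) \iff \exists\, \phi : C(T_1) \to C(T_2) \text{ one-to-one such that } \forall c \in C(T_1):\ \mathrm{compat}(c, \phi(c))$$
- Joinability: Defined via sufficient overlap in join key value distributions. With key sets $K_1$, $K_2$ and an overlap threshold $\tau$, label $(T_1, T_2)$ as joinable if:
  $$\frac{|K_1 \cap K_2|}{|K_1 \cup K_2|} \ge \tau$$
- Subset Detection: Table $T_1$ is a subset of $T_2$ if $C(T_1) \subseteq C(T_2)$ and each record in $T_1$ is present in the projection of $T_2$ onto $C(T_1)$:
  $$\forall r \in T_1:\ r \in \pi_{C(T_1)}(T_2)$$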
These formalizations ensure precise, rigorous definitions crucial for evaluating automated table pairing and integration in financial data lakes (Srinivas et al., 2023).
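The three definitions can be illustrated with a minimal Python sketch. It assumes a table is a plain dictionary mapping column names to lists of values, and it uses a deliberately coarse type-compatibility predicate (numeric versus string); the function names, the table layout, and the predicate are illustrative assumptions, not part of the benchmark.

```python
from itertools import permutations

def column_type(values):
    """Coarse type tag for a column: 'num', 'str', or 'empty' (assumed predicate)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "empty"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        return "num"
    return "str"

def compatible(col_a, col_b):
    """Type-compatibility predicate: both columns carry the same coarse type."""
    return column_type(col_a) == column_type(col_b)

def unionable(t1, t2):
    """True if some one-to-one mapping from t1's columns into t2's columns
    pairs only type-compatible columns. Brute force, so small schemas only."""
    cols1 = list(t1)
    return any(
        all(compatible(t1[a], t2[b]) for a, b in zip(cols1, candidate))
        for candidate in permutations(list(t2), len(cols1))
    )

def jaccard(keys_a, keys_b):
    """Jaccard similarity of two collections of key values."""
    a, b = set(keys_a), set(keys_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def joinable(keys1, keys2, tau):
    """True if the key overlap reaches the threshold tau."""
    return jaccard(keys1, keys2) >= tau

def is_subset(t1, t2):
    """True if t1's columns all occur in t2 and every t1 record appears
    in the projection of t2 onto t1's columns."""
    cols = list(t1)
    if not set(cols) <= set(t2):
        return False
    rows1 = set(zip(*(t1[c] for c in cols)))
    rows2 = set(zip(*(t2[c] for c in cols)))
    return rows1 <= rows2
```

The brute-force search over mappings mirrors the existential quantifier in the unionability definition; a practical implementation would likely replace it with bipartite matching.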
2. Data Selection and Preprocessing Protocols
FinLake-Bench sources financial tables from ECB SDMX CSV feeds, including interest-rate, balance-sheet, and exchange-rate time series. These are augmented with SEC/XBRL filings, commercial banking CSVs, and market-data snapshots. To construct a diverse benchmark, table pairs are stratified into “low-overlap” and “high-overlap” groups by categorical domain (e.g., pairs drawn from within the yield-curve category versus random cross-category pairs). Preprocessing involves:
- Normalizing date formats to ISO-8601.
- Standardizing currency codes to ISO-4217.
- Casting all numeric columns to uniform precision.
- Flattening multi-level headers and removing empty/singleton columns.
This preprocessing ensures consistency across financial tables, reducing spurious correlations and supporting reliable downstream evaluation (Srinivas et al., 2023).
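The four preprocessing steps might look as follows in a standard-library Python sketch. The accepted date formats, the currency alias table, the default precision of four decimal places, and the reading of "singleton" as "a single distinct value" are all assumptions made for illustration; the source does not specify them.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_EVEN

# Assumed input formats; day-first is assumed for slash- and dot-separated dates.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%Y%m%d")
# Illustrative alias table only; a real pipeline would use a full ISO 4217 list.
CURRENCY_ALIASES = {"€": "EUR", "EURO": "EUR", "$": "USD", "US DOLLAR": "USD"}

def to_iso_date(text):
    """Return the date as an ISO 8601 string, or the input unchanged if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return text

def to_iso_currency(text):
    """Map a currency label to an upper-case three-letter code where an alias is known."""
    key = text.strip().upper()
    return CURRENCY_ALIASES.get(key, key)

def to_fixed_precision(value, places=4):
    """Round a numeric value to a uniform number of decimal places."""
    quantum = Decimal(1).scaleb(-places)  # e.g. Decimal('0.0001') for places=4
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN))

def flatten_header(levels):
    """Join a multi-level header such as ('Rates', 'EUR', '') into 'Rates_EUR'."""
    return "_".join(str(part).strip() for part in levels if str(part).strip())

def drop_uninformative_columns(table):
    """Remove columns that are entirely empty or hold a single distinct value."""
    kept = {}
    for name, values in table.items():
        distinct = {v for v in values if v not in (None, "")}
        if len(distinct) > 1:
            kept[name] = values
    return kept
```

For example, `to_iso_date("05/01/2024")` yields `"2024-01-05"` under the day-first assumption, which is exactly the kind of choice a preprocessing protocol has to fix explicitly.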
3. Ground-Truth Label Generation
FinLake-Bench adapts LakeBench's combination of automatic heuristics and manual validation:
- Subset labeling uses column inclusion and record overlap thresholds.
- Joinability evaluation samples all column-pair overlaps and labels a pair positive if any Jaccard score exceeds an upper threshold $\tau_{+}$ and negative if all scores are below a lower threshold $\tau_{-}$; ambiguous pairs are discarded.
- Unionability leverages ECB "concept" identifiers; two tables that share a concept receive a positive label, while cross-concept pairs are negatives, supplemented with manual review for edge cases.
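The automatic part of these labeling heuristics can be sketched as below. The threshold values are not given in the text, so they are left as parameters; the function names, the dictionary table layout, and the use of `None` to mark a discarded pair are assumptions for illustration, and the manual review step is not modeled.

```python
def jaccard(values_a, values_b):
    """Jaccard similarity of two collections of values."""
    a, b = set(values_a), set(values_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def label_joinability(t1, t2, tau_pos, tau_neg):
    """Return 1 (joinable), 0 (not joinable), or None (ambiguous, to be discarded)."""
    scores = [jaccard(c1, c2) for c1 in t1.values() for c2 in t2.values()]
    if any(s > tau_pos for s in scores):
        return 1
    if all(s < tau_neg for s in scores):
        return 0
    return None

def label_unionability(concept1, concept2):
    """Positive label when both tables carry the same concept identifier."""
    return int(concept1 == concept2)

def label_subset(t1, t2, min_record_overlap):
    """Positive label when t1's columns are included in t2's and the share of
    t1 records found in t2's projection reaches min_record_overlap."""
    cols = list(t1)
    if not cols or not set(cols) <= set(t2):
        return 0
    rows1 = set(zip(*(t1[c] for c in cols)))
    rows2 = set(zip(*(t2[c] for c in cols)))
    if not rows1:
        return 0
    return int(len(rows1 & rows2) / len(rows1) >= min_record_overlap)
```

Keeping a gap between the two joinability thresholds trades dataset size for label quality, since every pair that falls between them is dropped rather than guessed.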
All labeled pairs are split into train/validation/test according to an 80/10/10 ratio, with strict non-overlap of tables between splits, eliminating information leakage (Srinivas et al., 2023).
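One way to obtain table-disjoint splits is to assign tables, rather than pairs, to splits and keep only pairs whose two tables land in the same split. The sketch below takes that approach; it is an assumption about how the non-overlap constraint could be met, and it means the 80/10/10 ratio holds over tables while the pair counts are only approximate. Pairs are assumed to be `(table_id_1, table_id_2, label)` tuples.

```python
import random

def split_pairs_by_table(pairs, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign each table to one split, then keep pairs lying fully inside a split."""
    tables = sorted({t for pair in pairs for t in pair[:2]})
    rng = random.Random(seed)  # fixed seed for a reproducible assignment
    rng.shuffle(tables)

    cut1 = int(len(tables) * ratios[0])
    cut2 = cut1 + int(len(tables) * ratios[1])
    assignment = {}
    for i, table in enumerate(tables):
        if i < cut1:
            assignment[table] = "train"
        elif i < cut2:
            assignment[table] = "validation"
        else:
            assignment[table] = "test"

    splits = {"train": [], "validation": [], "test": []}
    for pair in pairs:
        left, right = assignment[pair[0]], assignment[pair[1]]
        if left == right:  # pairs that straddle two splits are dropped
            splits[left].append(pair)
    return splits
```

Dropping cross-split pairs costs data but guarantees that no table seen during training reappears at validation or test time.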
4. Evaluation Metrics and Performance Characterization
FinLake-Bench employs standard binary classification metrics for each task:
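- Precision: $\mathrm{Precision} = \frac{TP}{TP + FP}$
- Recall: $\mathrm{Recall} = \frac{TP}{TP + FN}$
- F₁-score: $F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
- Accuracy: $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
Here $TP$, $FP$, $TN$, and $FN$ denote the counts of true positives, false positives, true negatives, and false negatives, respectively.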
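For regression-oriented tasks (e.g., ecb-union), FinLake-Bench tracks Mean Squared Error (MSE) and Mean Absolute Error (MAE) over $n$ examples with targets $y_i$ and predictions $\hat{y}_i$:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$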
Together, these metrics cover both the classification and the regression settings of the benchmark (Srinivas et al., 2023).
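As a reference point, the metrics above can be computed in a few lines of plain Python. The function names and the convention of returning 0.0 when a denominator is zero are assumptions of this sketch; labels are assumed to be encoded as 0 and 1.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

def regression_metrics(y_true, y_pred):
    """Mean squared error and mean absolute error."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return {"mse": mse, "mae": mae}
```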
5. Model Training and Experimental Setup
Tabular LLMs and specialized models (e.g., TaPEx, Kipoi-Tab, table-fine-tuned Flan-T5) are fine-tuned on the FinLake-Bench train split, either jointly with a multi-task head or individually per task. Key hyperparameters include a base learning rate $\eta$ with linear decay over 5 epochs, a batch size of 16, the AdamW optimizer (weight decay 0.01), and early stopping on validation F₁ with two-epoch patience. Computation is performed on a single NVIDIA A100 GPU in mixed-precision mode. Financial modeling nuances are addressed by pretraining numeric-aware adapters and including currency normalization prompts in the model context window (Srinivas et al., 2023).
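The two schedule-related hyperparameters, linear learning-rate decay and early stopping with two-epoch patience, can be sketched independently of any deep-learning framework. The base learning rate is passed in as a parameter because its value is not stated here; the function and class names are hypothetical, and decaying to zero at the final step is an assumption.

```python
def linear_decay_lr(base_lr, step, total_steps):
    """Learning rate that falls linearly from base_lr at step 0 to 0 at total_steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps

class EarlyStopping:
    """Stop once validation F1 has failed to improve for `patience` epochs in a row."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def update(self, val_f1):
        """Record one epoch's validation F1; return True when training should stop."""
        if val_f1 > self.best:
            self.best = val_f1
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `linear_decay_lr` would be evaluated once per optimizer step and `EarlyStopping.update` once per epoch after validation.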
6. Empirical Observations and Domain-Specific Adjustments
LakeBench’s findings reveal that tabular LLMs rely heavily on column-name cues and degrade substantially when column names are obfuscated. Numeric and date columns challenge zero-shot LLMs, and overlap heuristics outperform naive prompting for joinability. Subset detection performance decreases with record cardinality imbalance and rounding fluctuations.
Recommendations for FinLake-Bench include:
- Embedding a “currency conversion” subtask for unionability, harmonizing columns with heterogeneous denominations.
- Tightening subset overlap thresholds to accommodate round-off issues in time-series.
- Allowing multi-key joins and labeling accordingly (e.g., date + country code).
- Enriching model prompts with explicit financial-domain instructions and quantifying their benefit over generic prompts.
- Considering mixed-precision numeric encoders to more accurately represent distributional differences in large time-series (Srinivas et al., 2023).
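Two of these recommendations, multi-key joins and handling round-off in subset checks, lend themselves to a short sketch. Both functions below are illustrative assumptions rather than the benchmark's procedure: composite keys are modeled as tuples of column values, and round-off is handled by rounding floats to a fixed number of places before comparing records, which can still separate two values that straddle a rounding boundary.

```python
def composite_keys(table, key_columns):
    """Set of composite key tuples, e.g. (date, country code), for a table."""
    return set(zip(*(table[c] for c in key_columns)))

def multi_key_jaccard(t1, keys1, t2, keys2):
    """Jaccard overlap of composite keys; keys1 and keys2 must align positionally."""
    k1, k2 = composite_keys(t1, keys1), composite_keys(t2, keys2)
    union = k1 | k2
    return len(k1 & k2) / len(union) if union else 0.0

def rounded_row(row, places):
    """Round float cells so that small round-off differences do not break equality."""
    return tuple(round(v, places) if isinstance(v, float) else v for v in row)

def subset_with_rounding(t1, t2, places=2):
    """Subset check in which records are compared after rounding float cells."""
    cols = list(t1)
    if not set(cols) <= set(t2):
        return False
    rows1 = {rounded_row(r, places) for r in zip(*(t1[c] for c in cols))}
    rows2 = {rounded_row(r, places) for r in zip(*(t2[c] for c in cols))}
    return rows1 <= rows2
```

A composite key is stricter than either of its parts: two tables can share every date and still overlap on only half of their (date, country) combinations.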
7. Benchmark Positioning and Research Implications
FinLake-Bench’s scope is distinct from benchmarks such as XFinBench, which primarily assess LLM reasoning over complex financial problems, multimodal integration, and knowledge retrieval (Zhang et al., 20 Aug 2025). Recommendations drawn from XFinBench that could carry over to FinLake-Bench include diversifying financial topics (e.g., risk management, fixed-income modeling), introducing mixed-difficulty tiers, integrating multimodal inputs (e.g., news text + numerical data), augmenting metric selection, and deploying retrieval-augmented generation pipelines. A plausible implication is that FinLake-Bench can bridge evaluation gaps across both table-centric discovery and broader financial modeling, encouraging improvements in financial-data LLM architectures and multimodal data integration.
FinLake-Bench, when implemented according to LakeBench’s modular, transparent methodology, offers a robust, replicable, and extensible resource for the financial data science community focused on discovering, connecting, and integrating financial datasets at scale.