FLANG-BERT/FLANG-ELECTRA: Financial NLP Models

Updated 6 May 2026

FLANG-BERT and FLANG-ELECTRA are domain-adapted language models augmented with financial vocabulary and customized masking protocols.
They employ a two-stage masking approach, including targeted phrase masking and a span boundary objective, to better capture financial semantics.
Empirical evaluations on the FLUE benchmark demonstrate superior performance across financial sentiment, NER, and QA tasks compared to baseline models.

FLANG-BERT and FLANG-ELECTRA are domain-adapted pre-trained language models for the financial sector, extending the BERT-base and ELECTRA-base architectures respectively. Both incorporate explicit financial vocabulary expansion, finance-aware masking strategies, and additional objectives designed to exploit the structural and semantic properties of financial texts. They were introduced together with the Financial Language Understanding Evaluation (FLUE) benchmark suite, which provides multi-task evaluation for the financial domain. On FLUE, both models outperform their respective base models and FinBERT across sentiment analysis, headline classification, NER, boundary detection, and QA (Shah et al., 2022).

1. Architecture and Model Modifications

FLANG-BERT

FLANG-BERT adopts the BERT-base configuration, comprising 12 Transformer encoder layers, 12 self-attention heads, a hidden size of 768, and approximately 110 million parameters. The major architectural adaptation is the vocabulary expansion: the standard WordPiece vocabulary is augmented by ~8.2k words and phrases targeting financial discourse. Pre-training introduces a two-stage masking procedure—first targeting single-word tokens and then contiguous financial phrases.
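As a rough check on the "approximately 110 million parameters" figure, the count can be derived from the configuration numbers above. The sketch below assumes the standard BERT-base values not stated here (a 30,522-entry WordPiece vocabulary, 3,072-wide feed-forward layers, 512 positions, a pooler layer) and further assumes, purely for illustration, that each of roughly 8,200 added financial terms receives its own embedding row.

```python
def bert_param_count(vocab_size=30522, hidden=768, layers=12,
                     intermediate=3072, max_positions=512, type_vocab=2):
    """Count parameters of a BERT-style encoder (embeddings + layers + pooler)."""
    # Word, position, and segment embeddings plus one LayerNorm (scale + bias).
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + 2 * hidden
    # Q, K, V, and output projections (weights + biases) plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Two feed-forward projections (weights + biases) plus LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden) + 2 * hidden)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


base = bert_param_count()
# Assumption: ~8.2k added vocabulary entries, one embedding row each.
expanded = bert_param_count(vocab_size=30522 + 8200)
print(base)             # 109482240
print(expanded - base)  # 6297600
```

Under these assumptions the vocabulary expansion adds about 6.3 million embedding parameters and leaves the encoder layers unchanged.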

FLANG-ELECTRA

FLANG-ELECTRA builds on ELECTRA-base, with distinct architectures for the generator and discriminator. The generator utilizes 12 layers (hidden size 256, 4 attention heads, ~14 million parameters), while the discriminator matches ELECTRA-base (12 layers, hidden size 768, 12 heads, ~110 million parameters). Modifications include:

Preferential masking—masking of tokens/phrases selected for financial salience.
Addition of a Span Boundary Objective to the generator (details below).
Adoption of the two-stage masking protocol paralleling FLANG-BERT.

2. Pre-training Objectives and Loss Formulations

Both models integrate three core loss components, detailed below using the original mathematical formulations:

2.1 Financial Keyword and Phrase Masking

Let $N$ denote the sequence length, $D_f$ denote indices $i$ where token $w_i$ is in a financial-term dictionary, and $M$ denote positions selected for masking. The procedure is:

Mask 15% of tokens per sequence.
30% of masked positions $M_f$ are reserved for financial terms ( $|M_f| = 0.15 N \times 0.3$ ).
70% are drawn at random from non-financial terms ( $|M_r| = 0.15 N \times 0.7$ ).

Probability a token $w_i$ is masked:

$P(i\in M) = \begin{cases} \tfrac{0.15\,N\times0.3}{|D_f|}, & i\in D_f \\ \tfrac{0.15\,N\times0.7}{N - |D_f|}, & i\notin D_f \end{cases}$
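The masking budget and the resulting per-token probabilities can be illustrated with a small sketch. The function names, the rounding of the budget to whole tokens, and the lower-cased dictionary lookup are assumptions made for this example, not details from the paper.

```python
import random


def mask_probabilities(n_tokens, n_financial, mask_rate=0.15, fin_share=0.3):
    """Return (P(mask | financial), P(mask | non-financial)).

    Assumes both token groups are non-empty; probabilities are capped at 1.
    """
    budget = mask_rate * n_tokens  # expected number of masked tokens
    p_fin = min(1.0, budget * fin_share / n_financial)
    p_other = min(1.0, budget * (1 - fin_share) / (n_tokens - n_financial))
    return p_fin, p_other


def select_mask_positions(tokens, financial_terms, mask_rate=0.15,
                          fin_share=0.3, seed=0):
    """Pick positions to mask, reserving fin_share of the budget for financial terms."""
    rng = random.Random(seed)
    fin_idx = [i for i, t in enumerate(tokens) if t.lower() in financial_terms]
    other_idx = [i for i, t in enumerate(tokens) if t.lower() not in financial_terms]
    budget = round(mask_rate * len(tokens))
    # Never request more positions than a group actually contains.
    n_fin = min(round(budget * fin_share), len(fin_idx))
    n_other = min(budget - n_fin, len(other_idx))
    return sorted(rng.sample(fin_idx, n_fin) + rng.sample(other_idx, n_other))


# Worked example: N = 200 tokens, of which 20 are financial terms.
p_fin, p_other = mask_probabilities(n_tokens=200, n_financial=20)
print(f"{p_fin:.3f} {p_other:.3f}")  # 0.450 0.117
```

In this example a financial token is almost four times as likely to be masked as an ordinary token, which is the intended effect of reserving 30% of the budget for a small set of positions.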

During the phrase-masking stage, each multi-token financial phrase is masked as a contiguous span (single [MASK] token) with probability 0.3.
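A minimal sketch of the phrase-masking step, assuming whitespace-level tokens and a hypothetical list of multi-token phrases; matching longest phrases first is a design choice of this example rather than a documented detail.

```python
import random


def mask_phrases(tokens, phrases, p_mask=0.3, seed=0, mask_token="[MASK]"):
    """Replace each matched multi-token phrase by one mask token with probability p_mask."""
    rng = random.Random(seed)
    # Keep only multi-token phrases and try longer ones first.
    ordered = sorted((tuple(p) for p in phrases if len(p) >= 2),
                     key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        match = next((p for p in ordered
                      if tuple(tokens[i:i + len(p)]) == p), None)
        if match is None:
            out.append(tokens[i])
            i += 1
        elif rng.random() < p_mask:
            out.append(mask_token)  # whole phrase collapses to a single mask
            i += len(match)
        else:
            out.extend(match)       # phrase kept intact this time
            i += len(match)
    return out


sentence = "the net interest margin rose".split()
print(mask_phrases(sentence, [("net", "interest", "margin")], p_mask=1.0))
# ['the', '[MASK]', 'rose']
```

Setting `p_mask=1.0` in the usage line forces the phrase to be masked so the output is deterministic; with the stated probability of 0.3 the outcome depends on the random seed.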

2.2 Masked Language Modeling (Infilling) Loss

The model predicts masked tokens from their context:

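$\mathcal{L}_{\mathrm{MLM}} = -\sum_{i \in M} \log P\left(w_i \mid \tilde{w}\right)$

where $\tilde{w}$ is the input sequence with the positions in $M$ masked.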
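The infilling objective is the usual masked-language-modeling cross-entropy restricted to the masked positions. The sketch below is a plain-Python illustration under that assumption; the argument names are hypothetical, and the loss is summed over masked positions, whereas many training frameworks average instead.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized score vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mlm_loss(logits_per_position, target_ids, masked_positions):
    """Summed negative log-likelihood of the original tokens at masked positions."""
    loss = 0.0
    for i in masked_positions:
        loss -= log_softmax(logits_per_position[i])[target_ids[i]]
    return loss


# Toy example: 3 positions, vocabulary of 4, positions 0 and 2 masked.
logits = [[0.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
targets = [1, 0, 3]
print(mlm_loss(logits, targets, masked_positions=[0, 2]))  # 2 * ln(4)
```

Unmasked positions (position 1 here) contribute nothing to the loss, regardless of how confident the model is there.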

2.3 Span Boundary Objective (SBO)

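Specific to FLANG-ELECTRA, the generator reconstructs each masked token $w_i$ using only the boundary tokens of its masked span. For a span $(w_s, \ldots, w_e)$, compute

$y_i = f\left(h_{s-1}, h_{e+1}, p_{i-s+1}\right)$

where $h_{s-1}$ and $h_{e+1}$ are the encoder outputs at the tokens immediately to the left and right of the span, $p_{i-s+1}$ is an embedding of the target's position within the span, and $f$ is a two-layer feed-forward network:

$z_0 = [h_{s-1}; h_{e+1}; p_{i-s+1}], \quad z_1 = \mathrm{LayerNorm}(\mathrm{GeLU}(W_1 z_0)), \quad y_i = \mathrm{LayerNorm}(\mathrm{GeLU}(W_2 z_1))$

The span boundary loss: $\mathcal{L}_{\mathrm{SBO}} = -\sum_{i \in M} \log P\left(w_i \mid y_i\right)$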
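To predict a masked token from its span's boundaries, each masked position first has to be mapped to the two boundary indices and its own offset within the span. The helper below sketches that bookkeeping only, not the prediction network. It assumes that a span is a maximal run of contiguous masked positions and that spans never touch the sequence edges (for instance because of [CLS] and [SEP] tokens); the function name is hypothetical.

```python
def sbo_targets(masked_positions):
    """Map each masked index i to (left boundary, right boundary, offset in span).

    For a span covering positions s..e the boundaries are s-1 and e+1,
    and the 1-based offset of position i is i-s+1.
    """
    spans = []
    for p in sorted(set(masked_positions)):
        if spans and p == spans[-1][1] + 1:
            spans[-1][1] = p      # extend the current contiguous span
        else:
            spans.append([p, p])  # start a new span
    targets = {}
    for s, e in spans:
        for i in range(s, e + 1):
            targets[i] = (s - 1, e + 1, i - s + 1)
    return targets


print(sbo_targets([3, 4, 5, 9]))
# {3: (2, 6, 1), 4: (2, 6, 2), 5: (2, 6, 3), 9: (8, 10, 1)}
```

Every token in the span 3..5 shares the boundary pair (2, 6) and differs only in its offset, which is why a position embedding is needed to tell the targets apart.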

2.4 Discriminator Loss (ELECTRA)

The discriminator distinguishes real tokens from generator replacements:

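$\mathcal{L}_{\mathrm{Disc}} = -\sum_{t=1}^{N} \left[ \mathbb{1}(\hat{w}_t = w_t)\,\log D(\hat{w}, t) + \mathbb{1}(\hat{w}_t \neq w_t)\,\log\bigl(1 - D(\hat{w}, t)\bigr) \right]$

where $\hat{w}$ is the sequence after the generator has filled the masked positions and $D(\hat{w}, t)$ is the discriminator's predicted probability that the token at position $t$ is original.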
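Replaced-token detection is a per-position binary classification, so its loss is a binary cross-entropy over all positions. The following sketch assumes the standard ELECTRA formulation; the argument names are illustrative, and the discriminator outputs are passed in directly as probabilities.

```python
import math


def discriminator_loss(original_ids, corrupted_ids, p_original):
    """Summed binary cross-entropy for replaced-token detection.

    p_original[t] is the discriminator's probability that the token at
    position t of the corrupted sequence is the original token.
    """
    loss = 0.0
    for orig, corr, p in zip(original_ids, corrupted_ids, p_original):
        if corr == orig:
            loss -= math.log(p)        # token untouched: reward high p
        else:
            loss -= math.log(1.0 - p)  # token replaced: reward low p
    return loss


original = [11, 42, 7]
corrupted = [11, 99, 7]  # the generator replaced position 1
print(discriminator_loss(original, corrupted, [0.9, 0.2, 0.8]))
```

Note that a generator sample that happens to equal the original token counts as "original", since the comparison is made on token identity.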

2.5 Total Loss

The overall loss combines these components:

$\mathcal{L} = \mathcal{L}_{\mathrm{MLM}} + \mathcal{L}_{\mathrm{SBO}} + \mathcal{L}_{\mathrm{Disc}}$

This composite objective explicitly incorporates domain relevance, infilling, and span boundary prediction (Shah et al., 2022).

3. Datasets and Pre-training Protocol

FLANG models are pre-trained using a mixture of general English and financial-domain corpora.

Pre-training Data Sources

Dataset	# Documents	Years	% Sampled/Epoch
BooksCorpus	—	—	—
Wikipedia	—	—	—
SEC 10-K filings	13,660	1993–2020	8 %
SEC 10-Q filings	36,402	1993–2020	5 %
Earnings call transcripts	151,359	2007–2019	1.5 %
Reuters financial news	106,521	2007	10 %
Bloomberg financial news	387,220	2009	5 %
Analyst reports (LexisNexis)	201	2017–2020	100 %
Investopedia concept pages	638	N/A	100 %

BooksCorpus (800M words) and Wikipedia (2.5B words) represent general English.
Financial corpora are subsampled per epoch as indicated.
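To get a feel for the per-epoch mixture, the sampling percentages can be turned into approximate document counts. This sketch assumes that "% Sampled/Epoch" means the share of each corpus's documents drawn in one epoch; the general-English corpora are omitted because the table gives no counts for them.

```python
# (documents, fraction sampled per epoch), copied from the table above.
FINANCIAL_CORPORA = {
    "SEC 10-K filings": (13_660, 0.08),
    "SEC 10-Q filings": (36_402, 0.05),
    "Earnings call transcripts": (151_359, 0.015),
    "Reuters financial news": (106_521, 0.10),
    "Bloomberg financial news": (387_220, 0.05),
    "Analyst reports (LexisNexis)": (201, 1.0),
    "Investopedia concept pages": (638, 1.0),
}


def expected_docs_per_epoch(corpora):
    """Approximate number of documents drawn from each corpus in one epoch."""
    return {name: round(n_docs * frac) for name, (n_docs, frac) in corpora.items()}


per_epoch = expected_docs_per_epoch(FINANCIAL_CORPORA)
for name, n_docs in per_epoch.items():
    print(f"{name}: ~{n_docs}")
print("total financial documents per epoch: ~", sum(per_epoch.values()))
```

Under this reading, the two news corpora supply most of the financial documents seen in an epoch, while the small analyst-report and Investopedia sets are used in full.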

Pre-training Regimen

Models are initialized from the Hugging Face BERT-base or ELECTRA-base checkpoints.
Four epochs of pre-training on the combined data.
- Epochs 1–2: single-token (word) financial masking.
- Epochs 3–4: token + multi-token phrase masking.
Masking rate: 15%.
Adam optimizer; learning rate warm-up as in standard Transformer pre-training.
Batch size and the specific learning-rate schedule are not specified here.

4. Financial Language Understanding Evaluation (FLUE) and Empirical Results

FLUE is an open-source benchmark suite covering five core financial NLP tasks:

Financial Sentiment Analysis (Financial PhraseBank, FiQA 2018 SA)
News Headline Classification (Gold Headlines dataset)
Named Entity Recognition (finance-domain NER set)
Structure Boundary Detection (FinSBD3)
Financial Question Answering (FiQA 2018 QA)

Summary Results (averaged over 3 seeds)

Model	FPB (Acc)	FiQA SA (MSE)	Headline (F₁)	NER (F₁)	SBD (F₁)	FiQA QA (nDCG)
BERT-base	85.6%	0.073	0.967	0.79	0.95	0.46
FinBERT	87.2%	0.070	0.968	0.80	0.89	0.42
FLANG-BERT	91.2%	0.054	0.972	0.83	0.96	0.51
ELECTRA-base	88.1%	0.066	0.966	0.78	0.94	0.52
FLANG-ELECTRA	91.9%	0.034	0.980	0.82	0.97	0.55

On every task, each FLANG model outperforms both its own base model and FinBERT. FLANG-ELECTRA achieves the best score on five of the six metrics; the exception is NER, where FLANG-BERT leads (0.83 vs. 0.82). Detailed ablation results are available in the paper's supplementary tables.
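The comparisons in the summary table can be checked mechanically. The sketch below copies the reported scores and treats FiQA SA (an MSE) as lower-is-better; the helper names are only for this example.

```python
# Scores copied from the summary table; FPB accuracy is in percent.
RESULTS = {
    "BERT-base":     {"FPB": 85.6, "FiQA SA": 0.073, "Headline": 0.967,
                      "NER": 0.79, "SBD": 0.95, "FiQA QA": 0.46},
    "FinBERT":       {"FPB": 87.2, "FiQA SA": 0.070, "Headline": 0.968,
                      "NER": 0.80, "SBD": 0.89, "FiQA QA": 0.42},
    "FLANG-BERT":    {"FPB": 91.2, "FiQA SA": 0.054, "Headline": 0.972,
                      "NER": 0.83, "SBD": 0.96, "FiQA QA": 0.51},
    "ELECTRA-base":  {"FPB": 88.1, "FiQA SA": 0.066, "Headline": 0.966,
                      "NER": 0.78, "SBD": 0.94, "FiQA QA": 0.52},
    "FLANG-ELECTRA": {"FPB": 91.9, "FiQA SA": 0.034, "Headline": 0.980,
                      "NER": 0.82, "SBD": 0.97, "FiQA QA": 0.55},
}
LOWER_IS_BETTER = {"FiQA SA"}  # mean squared error


def beats(model, baseline):
    """True if `model` is strictly better than `baseline` on every task."""
    return all(
        RESULTS[model][t] < RESULTS[baseline][t] if t in LOWER_IS_BETTER
        else RESULTS[model][t] > RESULTS[baseline][t]
        for t in RESULTS[model]
    )


def best_model(task):
    """Name of the model with the best score on `task`."""
    pick = min if task in LOWER_IS_BETTER else max
    return pick(RESULTS, key=lambda m: RESULTS[m][task])


print(beats("FLANG-BERT", "BERT-base"), beats("FLANG-ELECTRA", "ELECTRA-base"))
print({task: best_model(task) for task in RESULTS["BERT-base"]})
```

Comparing each FLANG model against its own base model isolates the effect of the domain adaptation from the difference between the BERT and ELECTRA architectures.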

5. Deployment and Usage Recommendations

Model access: FLANG-BERT and FLANG-ELECTRA are available for download on Hugging Face. Complete code, data, and model configurations are hosted at https://salt-nlp.github.io/FLANG/.
Fine-tuning: For classification tasks, a combined loss is recommended, adding a Supervised Contrastive Loss term to the cross-entropy loss:

$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \mathcal{L}_{\mathrm{SCL}}$

Refer to Section A.3 of (Shah et al., 2022) for the exact formula.

Usual hyperparameters: 2–5 epochs, batch size 16–32, weight decay 0.01; see (Shah et al., 2022) for the learning-rate range.
For regression and QA tasks, standard task heads from the Transformers library may be used (MSE loss for regression, bi-encoder or cross-encoder for QA).
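For the classification recipe, the supervised contrastive term pulls together embeddings of examples that share a label and pushes apart the rest. The sketch below uses the common formulation of that loss on L2-normalized embeddings; the temperature of 0.1 and the `scl_weight` parameter are assumptions made for illustration, and the exact combination used by the authors should be taken from the paper.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors with at least one positive."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Scaled cosine similarity of the anchor to every other example.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        m = max(sims.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total -= sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def combined_loss(ce_loss, scl_loss, scl_weight=1.0):
    """Cross-entropy plus a (hypothetically weighted) contrastive term."""
    return ce_loss + scl_weight * scl_loss


emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clustered = supervised_contrastive_loss(emb, labels=[0, 0, 1, 1])
scattered = supervised_contrastive_loss(emb, labels=[0, 1, 0, 1])
print(clustered < scattered)  # True
```

The contrastive term is small when same-label embeddings already cluster together and large when they are scattered, which is the signal it adds on top of cross-entropy.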

This suggests that FLANG-BERT and FLANG-ELECTRA can be integrated into existing NLP pipelines for financial tasks with minimal adaptation, while yielding substantial gains where financial vocabulary and structure are critical.

6. Distinctive Features and Performance Analysis

Distinctive aspects of FLANG-BERT and FLANG-ELECTRA include:

Systematic augmentation of vocabulary for financial precision (~8.2k terms/phrases).
Layered masking regime emphasizing domain relevancy, including span-based and phrase-level masking.
Integration of the span boundary objective, especially in FLANG-ELECTRA, to leverage local context for masked span reconstruction.
Empirical validation demonstrating consistent outperformance over BERT-base, ELECTRA-base, and domain-adapted baselines (FinBERT), especially on complex multi-sentence financial tasks and open-domain QA.
The FLUE benchmark suite is designed to be more comprehensive and challenging than previously available financial NLP benchmarks.

A plausible implication is that targeted adaptation through vocabulary design and masking objectives is critical for extending pre-trained LLMs to specialized technical domains such as finance. These adaptations achieve superior empirical results while preserving compatibility with standard Transformer and ELECTRA frameworks (Shah et al., 2022).


References (1)

Shah et al. (2022). When FLUE Meets FLANG: Benchmarks and Large Pre-trained Language Model for Financial Domain.
