
FLANG-BERT/FLANG-ELECTRA: Financial NLP Models

Updated 6 May 2026
  • FLANG-BERT and FLANG-ELECTRA are domain-adapted language models augmented with financial vocabulary and customized masking protocols.
  • They employ a two-stage masking approach, including targeted phrase masking and a span boundary objective, to better capture financial semantics.
  • Empirical evaluations on the FLUE benchmark demonstrate superior performance across financial sentiment, NER, and QA tasks compared to baseline models.

FLANG-BERT and FLANG-ELECTRA are domain-adapted pre-trained language models for the financial sector, extending the BERT-base and ELECTRA-base architectures respectively. Both incorporate explicit financial vocabulary expansion, finance-aware masking strategies, and additional objectives designed to exploit the structural and semantic properties of financial text. They were introduced together with the Financial Language Understanding Evaluation (FLUE) benchmark suite, which provides multi-task evaluation for the financial domain. Both models outperform prior approaches across the FLUE tasks (Shah et al., 2022).

1. Architecture and Model Modifications

FLANG-BERT

FLANG-BERT adopts the BERT-base configuration, comprising 12 Transformer encoder layers, 12 self-attention heads, a hidden size of 768, and approximately 110 million parameters. The major architectural adaptation is the vocabulary expansion: the standard WordPiece vocabulary is augmented by ~8.2k words and phrases targeting financial discourse. Pre-training introduces a two-stage masking procedure—first targeting single-word tokens and then contiguous financial phrases.
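As a rough check on these figures, the sketch below counts the parameters of a BERT-base encoder from its configuration and estimates the extra embedding rows implied by the vocabulary expansion. It assumes the `bert-base-uncased` vocabulary size (30,522), reads "~8.2k" as exactly 8,200 added entries, and assumes each added word or phrase gets one embedding row; none of these details are stated in the text above.

```python
def bert_encoder_params(vocab_size, hidden=768, layers=12, intermediate=3072,
                        max_positions=512, type_vocab=2):
    """Count parameters of a BERT-style encoder (embeddings + layers + pooler)."""
    layer_norm = 2 * hidden  # scale and bias
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V and output projections
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    per_layer = attention + layer_norm + feed_forward + layer_norm
    pooler = hidden * hidden + hidden
    return embeddings + layers * per_layer + pooler


BASE_VOCAB = 30_522   # assumption: bert-base-uncased WordPiece vocabulary
ADDED_TERMS = 8_200   # assumption: "~8.2k" read as exactly 8,200 new entries

base = bert_encoder_params(BASE_VOCAB)
expanded = bert_encoder_params(BASE_VOCAB + ADDED_TERMS)
print(base)             # 109482240
print(expanded - base)  # 6297600
```

Under these assumptions the expansion adds about 6.3 million embedding parameters on top of the roughly 110 million of BERT-base; the encoder layers themselves are unchanged.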

FLANG-ELECTRA

FLANG-ELECTRA builds on ELECTRA-base, with distinct architectures for the generator and discriminator. The generator utilizes 12 layers (hidden size 256, 4 attention heads, ~14 million parameters), while the discriminator matches ELECTRA-base (12 layers, hidden size 768, 12 heads, ~110 million parameters). Modifications include:

  • Preferential masking—masking of tokens/phrases selected for financial salience.
  • Addition of a Span Boundary Objective to the generator (details below).
  • Adoption of the two-stage masking protocol paralleling FLANG-BERT.

2. Pre-training Objectives and Loss Formulations

Both models integrate three core loss components, detailed below using the original mathematical formulations:

2.1 Financial Keyword and Phrase Masking

Let $N$ denote the sequence length, $D_f$ the set of indices $i$ for which token $w_i$ is in a financial-term dictionary, and $M$ the set of positions selected for masking. The procedure is:

  • Mask 15% of tokens per sequence.
  • 30% of masked positions, $M_f$, are reserved for financial terms ($|M_f| = 0.15\,N \times 0.3$).
  • The remaining 70%, $M_r$, are drawn at random from non-financial terms ($|M_r| = 0.15\,N \times 0.7$).

The probability that token $w_i$ is masked is:

$$P(i \in M) = \begin{cases} \dfrac{0.15\,N \times 0.3}{|D_f|}, & i \in D_f \\[6pt] \dfrac{0.15\,N \times 0.7}{N - |D_f|}, & i \notin D_f \end{cases}$$
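The selection rule can be sketched in a few lines of Python. This is an illustration only: the function names, the toy dictionary, and the way quotas are capped for short sequences are assumptions, not the authors' implementation.

```python
import random


def mask_probability(n, n_financial, is_financial, mask_rate=0.15, financial_share=0.3):
    """Per-token masking probability from the piecewise formula above."""
    if is_financial:
        return mask_rate * n * financial_share / n_financial
    return mask_rate * n * (1 - financial_share) / (n - n_financial)


def sample_mask_positions(tokens, financial_terms, mask_rate=0.15,
                          financial_share=0.3, rng=None):
    """Pick mask positions: a fixed share from financial tokens, the rest from other tokens."""
    rng = rng or random.Random()
    budget = round(mask_rate * len(tokens))
    fin_idx = [i for i, t in enumerate(tokens) if t in financial_terms]
    other_idx = [i for i, t in enumerate(tokens) if t not in financial_terms]
    # Cap each quota by what the sequence offers (an assumption for short inputs).
    n_fin = min(round(budget * financial_share), len(fin_idx))
    n_other = min(budget - n_fin, len(other_idx))
    return sorted(rng.sample(fin_idx, n_fin)), sorted(rng.sample(other_idx, n_other))


# Toy sequence: 40 tokens, 5 of which are in a hypothetical financial dictionary.
FIN_TERMS = {"ebitda", "dividend", "yield", "liquidity", "hedge"}
tokens = ["tok"] * 35 + sorted(FIN_TERMS)
fin_masked, other_masked = sample_mask_positions(tokens, FIN_TERMS, rng=random.Random(0))
print(len(fin_masked), len(other_masked))  # 2 4
print(mask_probability(40, 5, True))       # about 0.36
print(mask_probability(40, 5, False))      # about 0.12
```

In this toy case a financial token is about three times as likely to be masked as any other token, which is the intended bias toward domain vocabulary.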

During the phrase-masking stage, each multi-token financial phrase is masked as a contiguous span (single [MASK] token) with probability 0.3.
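A minimal sketch of the phrase-masking step follows. The helper name, the longest-match-first rule, and the example phrase list are hypothetical; the source only states that a matched phrase collapses to a single [MASK] with probability 0.3.

```python
import random


def mask_financial_phrases(tokens, phrases, prob=0.3, rng=None):
    """Replace each matched multi-token phrase with one [MASK] with probability `prob`."""
    rng = rng or random.Random()
    # Try longer phrases first so "net interest margin" wins over "net interest".
    ordered = sorted((tuple(p) for p in phrases), key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        match = next((p for p in ordered if tuple(tokens[i:i + len(p)]) == p), None)
        if match is None:
            out.append(tokens[i])
            i += 1
        elif rng.random() < prob:
            out.append("[MASK]")   # the whole phrase becomes a single mask token
            i += len(match)
        else:
            out.extend(match)      # phrase kept intact this time
            i += len(match)
    return out


sentence = "the bank raised its net interest margin guidance".split()
phrase_list = [["net", "interest", "margin"], ["net", "interest"]]
print(mask_financial_phrases(sentence, phrase_list, prob=1.0))
# ['the', 'bank', 'raised', 'its', '[MASK]', 'guidance']
```

Setting `prob=1.0` makes the example deterministic; with the stated value of 0.3 the same phrase is left intact in most passes over the data.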

2.2 Masked Language Modeling (Infilling) Loss

The model predicts masked tokens from their context:

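$$\mathcal{L}_{\mathrm{MLM}} = -\sum_{i \in M} \log P\left(w_i \mid w_{\setminus M}\right)$$

where $w_{\setminus M}$ is the input sequence with the positions in $M$ masked.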
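To make the infilling objective concrete, the sketch below sums the negative log-likelihood of the original tokens at the masked positions. The data layout (one probability dictionary per position) and the toy numbers are assumptions chosen for readability.

```python
import math


def mlm_loss(predicted_probs, masked_positions, original_tokens):
    """Sum of negative log-likelihoods of the original tokens at masked positions."""
    return sum(-math.log(predicted_probs[i][original_tokens[i]])
               for i in masked_positions)


original = ["quarterly", "dividend", "was", "raised"]
# Hypothetical model outputs for the two masked positions (1 and 3).
probs = {
    1: {"dividend": 0.5, "revenue": 0.5},
    3: {"raised": 0.25, "cut": 0.75},
}
loss = mlm_loss(probs, masked_positions=[1, 3], original_tokens=original)
print(loss)  # -log(0.5) - log(0.25) = 3 * log(2), about 2.079
```

Only masked positions contribute, so the financial-term bias in the masking step directly changes which tokens drive the gradient.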

2.3 Span Boundary Objective (SBO)

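Specific to FLANG-ELECTRA, the generator reconstructs each masked token $w_i$ using only the boundary tokens of its masked span. For a span $(w_s, \dots, w_e)$, compute

$$y_i = f\left(h_{s-1},\ h_{e+1},\ p_{i-s+1}\right)$$

where

$$f(\cdot) = \mathrm{FFN}\left(\left[\,h_{s-1};\ h_{e+1};\ p_{i-s+1}\,\right]\right)$$

with $h_{s-1}$ and $h_{e+1}$ the encoder representations of the tokens immediately outside the span and $p_{i-s+1}$ the embedding of the target's position relative to the span start.

The span boundary loss: $$\mathcal{L}_{\mathrm{SBO}} = -\sum_{i=s}^{e} \log P\left(w_i \mid y_i\right)$$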

2.4 Discriminator Loss (ELECTRA)

The discriminator distinguishes real tokens from generator replacements:

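$$\mathcal{L}_{\mathrm{Disc}} = -\sum_{i=1}^{N} \Big[\, \mathbb{1}\left(\tilde{w}_i = w_i\right) \log D(\tilde{w}, i) + \mathbb{1}\left(\tilde{w}_i \neq w_i\right) \log\left(1 - D(\tilde{w}, i)\right) \Big]$$

where $\tilde{w}$ is the sequence after generator replacement and $D(\tilde{w}, i)$ is the discriminator's probability that position $i$ holds the original token.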
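The replaced-token detection objective amounts to a per-position binary cross-entropy. The sketch below assumes the discriminator outputs, for each position, the probability that the token is the original one; the function names and toy values are illustrative.

```python
import math


def is_original(original, corrupted):
    """Label each position 1 if the generator left the token unchanged, else 0."""
    return [int(o == c) for o, c in zip(original, corrupted)]


def discriminator_loss(original_probs, labels):
    """Binary cross-entropy summed over all positions."""
    loss = 0.0
    for p, y in zip(original_probs, labels):
        loss += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return loss


original = ["rates", "rose", "sharply"]
corrupted = ["rates", "fell", "sharply"]   # generator replaced position 1
labels = is_original(original, corrupted)
d_out = [0.9, 0.2, 0.8]                    # hypothetical discriminator outputs
print(labels)                              # [1, 0, 1]
print(discriminator_loss(d_out, labels))   # -log(0.9 * 0.8 * 0.8), about 0.552
```

Unlike the infilling loss, every position contributes here, which is the main reason ELECTRA-style training extracts more signal per sequence.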

2.5 Total Loss

The overall loss combines these components:

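$$\mathcal{L} = \mathcal{L}_{\mathrm{MLM}} + \mathcal{L}_{\mathrm{SBO}} + \lambda\,\mathcal{L}_{\mathrm{Disc}}$$

where $\lambda$ weights the discriminator term.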

This composite objective explicitly incorporates domain relevance, infilling, and span boundary prediction (Shah et al., 2022).

3. Datasets and Pre-training Protocol

FLANG models are pre-trained using a mixture of general English and financial-domain corpora.

Pre-training Data Sources

| Dataset | # Documents | Years | % Sampled/Epoch |
|---|---|---|---|
| BooksCorpus | | | |
| Wikipedia | | | |
| SEC 10-K filings | 13,660 | 1993–2020 | 8% |
| SEC 10-Q filings | 36,402 | 1993–2020 | 5% |
| Earnings call transcripts | 151,359 | 2007–2019 | 1.5% |
| Reuters financial news | 106,521 | 2007 | 10% |
| Bloomberg financial news | 387,220 | 2009 | 5% |
| Analyst reports (LexisNexis) | 201 | 2017–2020 | 100% |
| Investopedia concept pages | 638 | N/A | 100% |

  • BooksCorpus (800M words) and Wikipedia (2.5B words) represent general English.
  • Financial corpora are subsampled per epoch as indicated.
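The sampling rates translate into per-epoch document counts as sketched below. This assumes the percentages apply to document counts (rather than tokens) and rounds to whole documents; the table does not say which unit is meant.

```python
# (documents, percent sampled per epoch), copied from the table above.
FINANCIAL_CORPORA = {
    "SEC 10-K filings": (13_660, 8.0),
    "SEC 10-Q filings": (36_402, 5.0),
    "Earnings call transcripts": (151_359, 1.5),
    "Reuters financial news": (106_521, 10.0),
    "Bloomberg financial news": (387_220, 5.0),
    "Analyst reports (LexisNexis)": (201, 100.0),
    "Investopedia concept pages": (638, 100.0),
}


def docs_per_epoch(corpora):
    """Expected number of documents drawn from each corpus in one epoch."""
    return {name: round(n_docs * pct / 100) for name, (n_docs, pct) in corpora.items()}


per_epoch = docs_per_epoch(FINANCIAL_CORPORA)
for name, count in per_epoch.items():
    print(f"{name:30s} {count:>6d}")
print("total", sum(per_epoch.values()))  # total 36035
```

Under this reading, each epoch sees roughly 36,000 financial documents, with Bloomberg and Reuters news supplying most of them despite their low sampling rates.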

Pre-training Regimen

  • Models are initialized from the Hugging Face BERT-base or ELECTRA-base checkpoints.
  • Four epochs of pre-training on the combined data.
    • Epochs 1–2: single-token (word) financial masking.
    • Epochs 3–4: token + multi-token phrase masking.
  • Masking rate: 15%.
  • Adam optimizer; learning rate warm-up as in standard Transformer pre-training.
  • Batch size and the exact learning-rate schedule are not specified.
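The two-stage protocol can be captured in a small configuration object. The class and field names below are hypothetical; only the epoch boundaries and the 15% rate come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainSchedule:
    epochs: int = 4
    mask_rate: float = 0.15
    phrase_stage_start: int = 3  # first epoch that adds multi-token phrase masking

    def masking_mode(self, epoch: int) -> str:
        """Return which masking stage applies in a given (1-indexed) epoch."""
        if not 1 <= epoch <= self.epochs:
            raise ValueError(f"epoch must be in 1..{self.epochs}, got {epoch}")
        return "word" if epoch < self.phrase_stage_start else "word+phrase"


schedule = PretrainSchedule()
print([schedule.masking_mode(e) for e in range(1, 5)])
# ['word', 'word', 'word+phrase', 'word+phrase']
```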

4. Financial Language Understanding Evaluation (FLUE) and Empirical Results

FLUE is an open-source benchmark suite covering five core financial NLP tasks:

  1. Financial Sentiment Analysis (Financial PhraseBank, FiQA 2018 SA)
  2. News Headline Classification (Gold Headlines dataset)
  3. Named Entity Recognition (finance-domain NER set)
  4. Structure Boundary Detection (FinSBD3)
  5. Financial Question Answering (FiQA 2018 QA)

Summary Results (averaged over 3 seeds)

| Model | FPB (Acc) | FiQA SA (MSE, lower is better) | Headline (F₁) | NER (F₁) | SBD (F₁) | FiQA QA (nDCG) |
|---|---|---|---|---|---|---|
| BERT-base | 85.6% | 0.073 | 0.967 | 0.79 | 0.95 | 0.46 |
| FinBERT | 87.2% | 0.070 | 0.968 | 0.80 | 0.89 | 0.42 |
| FLANG-BERT | 91.2% | 0.054 | 0.972 | 0.83 | 0.96 | 0.51 |
| ELECTRA-base | 88.1% | 0.066 | 0.966 | 0.78 | 0.94 | 0.52 |
| FLANG-ELECTRA | 91.9% | 0.034 | 0.980 | 0.82 | 0.97 | 0.55 |

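On every task, each FLANG model outperforms both its own base model and FinBERT. FLANG-ELECTRA achieves the best score on five of the six metrics, and FLANG-BERT leads on NER. The one cross-family exception is FiQA QA, where ELECTRA-base (0.52) narrowly exceeds FLANG-BERT (0.51). Detailed ablation results are available in the paper's supplementary tables.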
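For a sense of scale, the relative gains of each FLANG model over its base model can be computed directly from the summary table. The helper below is illustrative; it treats FiQA SA as lower-is-better and all other metrics as higher-is-better.

```python
RESULTS = {
    "BERT-base":     {"FPB": 85.6, "FiQA SA": 0.073, "Headline": 0.967, "NER": 0.79, "SBD": 0.95, "FiQA QA": 0.46},
    "FinBERT":       {"FPB": 87.2, "FiQA SA": 0.070, "Headline": 0.968, "NER": 0.80, "SBD": 0.89, "FiQA QA": 0.42},
    "FLANG-BERT":    {"FPB": 91.2, "FiQA SA": 0.054, "Headline": 0.972, "NER": 0.83, "SBD": 0.96, "FiQA QA": 0.51},
    "ELECTRA-base":  {"FPB": 88.1, "FiQA SA": 0.066, "Headline": 0.966, "NER": 0.78, "SBD": 0.94, "FiQA QA": 0.52},
    "FLANG-ELECTRA": {"FPB": 91.9, "FiQA SA": 0.034, "Headline": 0.980, "NER": 0.82, "SBD": 0.97, "FiQA QA": 0.55},
}
LOWER_IS_BETTER = {"FiQA SA"}  # MSE: a decrease counts as a gain


def relative_gain(model, baseline, metric):
    """Percentage improvement of `model` over `baseline` on one metric."""
    new, old = RESULTS[model][metric], RESULTS[baseline][metric]
    change = (old - new) if metric in LOWER_IS_BETTER else (new - old)
    return 100.0 * change / old


for model, baseline in [("FLANG-BERT", "BERT-base"), ("FLANG-ELECTRA", "ELECTRA-base")]:
    gains = {m: round(relative_gain(model, baseline, m), 1) for m in RESULTS[model]}
    print(model, "vs", baseline, gains)
```

The largest relative change is on FiQA sentiment regression, where FLANG-ELECTRA cuts the MSE of ELECTRA-base by about 48%; gains on headline classification are under 2% because the baselines are already near ceiling.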

5. Deployment and Usage Recommendations

  • Model access: FLANG-BERT and FLANG-ELECTRA are available for download on Hugging Face. Complete code, data, and model configurations are hosted at https://salt-nlp.github.io/FLANG/.
  • Fine-tuning: For classification tasks, it is recommended to use a combined loss—cross-entropy plus Supervised Contrastive Loss:

$$\mathcal{L} = (1 - \lambda)\,\mathcal{L}_{\mathrm{CE}} + \lambda\,\mathcal{L}_{\mathrm{SCL}}$$

Refer to Section A.3 of Shah et al. (2022) for the exact formula.
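A dependency-free sketch of such a combined objective is shown below. It uses a common formulation of supervised contrastive loss over L2-normalized embeddings, averaged over the batch, and mixes it with cross-entropy through a weight `lam`. The convex-combination form, the value of `lam`, the temperature, and all function names are assumptions; the paper's appendix remains the authority on the exact form.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-probability of the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def supervised_contrastive(embeddings, labels, temperature=0.3):
    """Pull same-label embeddings together and push other labels apart."""
    def normalise(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    z = [normalise(v) for v in embeddings]
    n, total = len(z), 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner contributes nothing
        sims = {k: math.exp(sum(a * b for a, b in zip(z[i], z[k])) / temperature)
                for k in range(n) if k != i}
        denom = sum(sims.values())
        total += -sum(math.log(sims[j] / denom) for j in positives) / len(positives)
    return total / n


def combined_loss(probs, embeddings, labels, lam=0.9, temperature=0.3):
    """Weighted mix of cross-entropy and supervised contrastive loss (weights assumed)."""
    return ((1 - lam) * cross_entropy(probs, labels)
            + lam * supervised_contrastive(embeddings, labels, temperature))


# Toy batch: two "negative" and two "positive" sentiment examples.
labels = [0, 0, 1, 1]
probs = [[0.8, 0.2], [0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]
emb = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
print(combined_loss(probs, emb, labels))
```

In a real fine-tuning loop the same computation would run on batched tensors from the encoder's pooled output; the pure-Python version is only meant to show how the two terms interact.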

  • Usual hyperparameters: 2–5 epochs, batch size 16–32, weight decay 0.01; see Shah et al. (2022) for the learning-rate range.
  • For regression and QA tasks, standard task heads from the Transformers library may be used (MSE loss for regression, bi-encoder or cross-encoder for QA).

This suggests that FLANG-BERT and FLANG-ELECTRA can be integrated into existing NLP pipelines for financial tasks with minimal adaptation, while yielding substantial gains where financial vocabulary and structure are critical.

6. Distinctive Features and Performance Analysis

Distinctive aspects of FLANG-BERT and FLANG-ELECTRA include:

  • Systematic augmentation of vocabulary for financial precision (~8.2k terms/phrases).
  • Layered masking regime emphasizing domain relevancy, including span-based and phrase-level masking.
  • Integration of the span boundary objective, especially in FLANG-ELECTRA, to leverage local context for masked span reconstruction.
  • Empirical validation demonstrating consistent outperformance over BERT-base, ELECTRA-base, and domain-adapted baselines (FinBERT), especially on complex multi-sentence financial tasks and open-domain QA.
  • The FLUE benchmark suite is designed to be more comprehensive and challenging than previously available financial NLP benchmarks.

A plausible implication is that targeted adaptation through vocabulary design and masking objectives is critical for extending pre-trained LLMs to specialized technical domains such as finance. These adaptations achieve superior empirical results while preserving compatibility with standard Transformer and ELECTRA frameworks (Shah et al., 2022).
