LOBFrame: Python Toolkit for LOB Forecasting

Updated 19 December 2025

LOBFrame is an open-source, modular Python framework that processes high-frequency Limit Order Book data for mid-price forecasting.
It integrates end-to-end modules for data ingestion, preprocessing, balanced batching, and deep learning models like DeepLOB.
The framework provides rigorous evaluation using both traditional metrics and a novel transaction-level practicability simulator.

LOBFrame is an open-source, modular Python codebase designed to support efficient processing, feature engineering, and deep learning-based analysis of large-scale high-frequency Limit Order Book (LOB) data, particularly for mid-price change forecasting on NASDAQ equities. Developed within the context of contemporary research into high-frequency financial market microstructure, LOBFrame emphasizes both algorithmic rigor and operational assessment of forecasting strategies, providing traditional machine learning performance metrics alongside a novel transaction-level practicability evaluation. The framework is built atop PyTorch and targets end-to-end workflows for ingestion, transformation, balanced batching, model training, validation/testing, and quantitative performance assessment (Briola et al., 2024).

1. System Architecture and Data Flow

LOBFrame comprises a set of interlocking modules, each responsible for a distinct part of the LOB processing and prediction lifecycle. The main modules include:

lobframe.ingest: Handles reading of “message” and “orderbook” files, including timestamp merging, filtering of auctions and crossed quotes, and day-range clipping (default: 9:40–15:50 ET).
lobframe.preprocess: Processes data events at nanosecond granularity, collapses duplicate timestamps, constructs compressed book snapshots to depth $L$ , and applies rolling window z-score normalization (5-day feature-wise).
lobframe.dataset: Implements a PyTorch Dataset abstraction, storing sequences of $T$ consecutive LOB snapshots and generating mid-price change labels for multiple forecast horizons ( $\Delta\tau \in \{10,50,100\}$ ).
lobframe.dataloader: Provides custom PyTorch DataLoader logic with balanced mini-batch sampling (Down, Stable, Up classes) for training, and sequential, unbalanced sampling for validation and test.
lobframe.model: Wraps a portfolio of state-of-the-art architectures for LOB forecasting (e.g., DeepLOB: CNN → LSTM → FC), exposing a standardized interface for forward/inference, loss, and prediction routines.
lobframe.trainer: Orchestrates the high-performance training loop, supports early stopping (patience = 15), logs epoch-level metrics, and manages optimizer configuration (AdamW with specified hyperparameters).
lobframe.evaluator: Computes confusion matrices, Matthews correlation coefficient (MCC), F1, accuracy as functions of probability threshold, and executes a transaction-oriented practicability simulation ( $p_T$ ).
lobframe.utils: Supplies utilities for plotting, reproducibility (seed), checkpointing, GPU/multi-node detection, and high-performance cluster execution.

Internal data representations include the LOBRecord ( $\tau$ ), a $4L$-vector encoding prices and volumes on both sides to depth $L$ , and the SequenceSample, composed of an input tensor $X \in \mathbb{R}^{T \times 4L}$ and a label $y \in \{-1,0,1\}$ determined by mid-price change at $\tau+\Delta\tau$ .

2. Data Ingestion, Preprocessing, and Label Construction

The ingestion pipeline processes raw LOBSTER-format message and orderbook files, filters out auction activity and time ranges outside standard hours, removes crossed quotes, and collapses duplicate timestamps. Book snapshots are constructed as

$\mathbb{L}(\tau) = \left\{\,p^{\text{ask}}_\ell(\tau), v^{\text{ask}}_\ell(\tau), p^{\text{bid}}_\ell(\tau), v^{\text{bid}}_\ell(\tau)\right\}_{\ell=1}^L,$

where %%%%10%%%% defaults to 10. Rolling z-score normalization is applied, with mean and standard deviation computed over a 5-day window for each feature: $\tilde x_{t,f} = \frac{ x_{t,f} - \mu_{t,f} }{ \sigma_{t,f} }.$ For supervised learning, input sequences comprise $T=100$ consecutive normalized LOB records and a target label reflecting the movement of the mid-price over horizon $\Delta\tau$ : $m_\tau = \frac{p^{\text{ask}}_1(\tau) + p^{\text{bid}}_1(\tau)}{2},$ with the target assigned via thresholding as

${\rm label}(\tau,\Delta) = \begin{cases} -1 & m_{\tau+\Delta} - m_\tau \leq -\theta\ 0 & |m_{\tau+\Delta} - m_\tau| < \theta\ +1 & m_{\tau+\Delta} - m_\tau \geq +\theta \end{cases}$

where $\theta$ is a tunable parameter.

Balanced sampling ensures that during training, each mini-batch is uniformly stratified across label classes, capped at 5000 samples/day/class.

3. Modeling, Training, and Configuration

LOBFrame supports rapid specification and instantiation of state-of-the-art deep learning models, notably DeepLOB (convolutional layers followed by LSTM and fully-connected classifier). Models accept a standardized input tensor of shape $[N, T, 4L]$ and emit logits over three target classes. The object-oriented design exposes key methods:

DeepLOBModel.forward(x: Tensor[N, T, 4L]) → logits: Tensor[N, 3]
Trainer.fit(), Trainer.test()
Confusion matrix and metric plotting utilities.

Training employs AdamW optimization with default hyperparameters (learning rate $6 \times 10^{-5}$ , betas $(0.9, 0.95)$ , weight decay $\approx 10^{-2}$ ), up to 100 epochs with early stopping on validation loss plateau (patience = 15).

Best-practice configuration and scaling include batched multi-worker data loading ( $\geq 8$ workers), optional parallelization via SLURM or Ray, storage of preprocessed data in HDF5 or Parquet, use of mixed-precision (AMP) for efficient GPU utilization, and checkpointing/gradient accumulation as required by system constraints.

4. Evaluation: Traditional Metrics and Practicability Simulator

LOBFrame provides both conventional machine learning diagnostics and a transaction-centric practicability measure. After completion of training and testing, users may invoke:

trainer.plot_confusion_matrix(mode='H10')
trainer.plot_mcc_curve(horizon=50)
trainer.plot_f1_curve()
trainer.plot_accuracy_curve()

Standard reporting encompasses confusion matrices, averaged by tick class, and MCC/F1/accuracy curves versus decision threshold.

The practicability simulator supplements these metrics by quantifying the probability $p_T$ of correctly forecasting "complete transactions" (i.e., true open–close transitions), defined as: $p_T = \frac{ |CT| }{ |PT| + |TT| - |CT| },$ where $PT$ is the set of true transaction transitions, $TT$ is the set predicted by the model, and $CT = TT \cap PT$ . Evaluator methods include:

sim = StrategySimulator(preds, targets, prob_threshold=0.5, horizon=50)
p_T = sim.probability_correct_transaction()
sim.plot_pT_vs_threshold()

This operational metric reflects the model's true utility for trading applications, addressing regimes where statistical accuracy may not translate into actionable signals (Briola et al., 2024).

5. Usage Patterns, Extensibility, and Deployment

To prepare and analyze a new LOB dataset using LOBFrame, practitioners export message/book CSVs in LOBSTER format, specify location and preprocessing choices in a configuration file, and invoke the main pipeline:

git clone https://github.com/FinancialComputingUCL/LOBFrame.git
cd LOBFrame
pip install -r requirements.txt
python run.py --config my_config.yaml --model DeepLOB

Typical defaults are

L=10

T=100

, balanced sampling for training, rolling z-score normalization, and multiple forecast horizons (

\Delta\tau \in [10, 50, 100]

). Extensibility to new datasets, custom horizons, or alternative model architectures is facilitated by the modular codebase and standardized interface for sequence batching, model integration, and evaluation.

Scaling is supported via efficient serialization, multi-threaded loading, and cluster execution frameworks. This enables deployment on large-scale NASDAQ datasets, supporting research into deep learning models for order book forecasting with both statistical and operational assessment.

6. Context, Significance, and Research Implications

LOBFrame is released in conjunction with a comprehensive empirical study that reveals significant heterogeneity in deep learning forecasting performance across stocks with different microstructural properties. The results underscore two key points: (1) The predictive efficacy of deep neural architectures is not uniform across LOB regimes; (2) Classical accuracy metrics do not suffice for gauging practical utility in trading contexts. The introduction of the practicability simulator addresses this evaluation gap, providing a methodology for transaction-level assessment of model predictions. LOBFrame thus establishes a robust, reproducible platform for future research, enabling academics and practitioners to systematically benchmark, deploy, and understand the limitations of deep learning in LOB prediction tasks (Briola et al., 2024).

PDF Markdown Chat (Pro)

References (1)

Deep Limit Order Book Forecasting (2024)

Whiteboard

Generate a whiteboard explanation of this topic.

Topic to Video (Beta)

Generate a video overview of this topic.

Follow Topic

Get notified by email when new papers are published related to LOBFrame.