LOBFrame: Python Toolkit for LOB Forecasting
- LOBFrame is an open-source, modular Python framework that processes high-frequency Limit Order Book data for mid-price forecasting.
- It integrates end-to-end modules for data ingestion, preprocessing, balanced batching, and deep learning models like DeepLOB.
- The framework provides rigorous evaluation using both traditional metrics and a novel transaction-level practicability simulator.
LOBFrame is an open-source, modular Python codebase designed to support efficient processing, feature engineering, and deep learning-based analysis of large-scale high-frequency Limit Order Book (LOB) data, particularly for mid-price change forecasting on NASDAQ equities. Developed within the context of contemporary research into high-frequency financial market microstructure, LOBFrame emphasizes both algorithmic rigor and operational assessment of forecasting strategies, providing traditional machine learning performance metrics alongside a novel transaction-level practicability evaluation. The framework is built atop PyTorch and targets end-to-end workflows for ingestion, transformation, balanced batching, model training, validation/testing, and quantitative performance assessment (Briola et al., 2024).
1. System Architecture and Data Flow
LOBFrame comprises a set of interlocking modules, each responsible for a distinct part of the LOB processing and prediction lifecycle. The main modules include:
- lobframe.ingest: Handles reading of “message” and “orderbook” files, including timestamp merging, filtering of auctions and crossed quotes, and day-range clipping (default: 9:40–15:50 ET).
- lobframe.preprocess: Processes data events at nanosecond granularity, collapses duplicate timestamps, constructs compressed book snapshots to depth $L$, and applies rolling-window z-score normalization (5-day, feature-wise).
- lobframe.dataset: Implements a PyTorch Dataset abstraction, storing sequences of consecutive LOB snapshots and generating mid-price change labels for multiple forecast horizons $H$ (e.g., $H=10$ and $H=50$).
- lobframe.dataloader: Provides custom PyTorch DataLoader logic with balanced mini-batch sampling (Down, Stable, Up classes) for training, and sequential, unbalanced sampling for validation and test.
- lobframe.model: Wraps a portfolio of state-of-the-art architectures for LOB forecasting (e.g., DeepLOB: CNN → LSTM → FC), exposing a standardized interface for forward/inference, loss, and prediction routines.
- lobframe.trainer: Orchestrates the high-performance training loop, supports early stopping (patience = 15), logs epoch-level metrics, and manages optimizer configuration (AdamW with specified hyperparameters).
- lobframe.evaluator: Computes confusion matrices, Matthews correlation coefficient (MCC), F1, and accuracy as functions of probability threshold, and executes a transaction-oriented practicability simulation (quantified by the probability $p_T$ of correctly forecasting complete transactions).
- lobframe.utils: Supplies utilities for plotting, reproducibility (seed), checkpointing, GPU/multi-node detection, and high-performance cluster execution.
Internal data representations include the LOBRecord, a $4L$-vector encoding prices and volumes on both sides of the book to depth $L$, and the SequenceSample, composed of an input tensor of $T$ consecutive records and a label determined by the mid-price change at horizon $H$.
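A minimal sketch of how these two representations might be realized in code; the field names and class layout below are illustrative assumptions, not LOBFrame's actual definitions:

```python
from dataclasses import dataclass

import torch


@dataclass
class LOBRecord:
    """One book snapshot: prices/volumes on both sides to depth L (4L values)."""
    timestamp_ns: int
    features: torch.Tensor  # shape (4 * L,)


@dataclass
class SequenceSample:
    """T consecutive snapshots plus a label for the mid-price move at horizon H."""
    inputs: torch.Tensor  # shape (T, 4 * L)
    label: int            # 0 = Down, 1 = Stable, 2 = Up
```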
2. Data Ingestion, Preprocessing, and Label Construction
The ingestion pipeline processes raw LOBSTER-format message and orderbook files, filters out auction activity and time ranges outside standard hours, removes crossed quotes, and collapses duplicate timestamps. Book snapshots are constructed as

$$\mathbf{x}_t = \left(p^{a}_{1}(t), v^{a}_{1}(t), p^{b}_{1}(t), v^{b}_{1}(t), \dots, p^{a}_{L}(t), v^{a}_{L}(t), p^{b}_{L}(t), v^{b}_{L}(t)\right) \in \mathbb{R}^{4L},$$

where the depth $L$ defaults to 10. Rolling z-score normalization is applied, with mean $\mu_i$ and standard deviation $\sigma_i$ computed over a 5-day window for each feature $i$:

$$\tilde{x}_{t,i} = \frac{x_{t,i} - \mu_i}{\sigma_i}.$$

For supervised learning, input sequences comprise $T$ consecutive normalized LOB records, and the target label reflects the movement of the mid-price $m_t = \tfrac{1}{2}\left(p^{a}_{1}(t) + p^{b}_{1}(t)\right)$ over horizon $H$,

$$\ell_t = \frac{m_{t+H} - m_t}{m_t},$$

with the target assigned via thresholding as

$$y_t = \begin{cases} \text{Up} & \text{if } \ell_t > \theta, \\ \text{Stable} & \text{if } -\theta \le \ell_t \le \theta, \\ \text{Down} & \text{if } \ell_t < -\theta, \end{cases}$$

where $\theta$ is a tunable parameter.
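The normalization and labeling steps can be sketched as follows; the function names, the statistics-table layout, and the event-indexed horizon are illustrative assumptions rather than LOBFrame's actual internals:

```python
import numpy as np
import pandas as pd


def zscore_normalize(features: pd.DataFrame, stats: pd.DataFrame) -> pd.DataFrame:
    """Feature-wise z-score using mean/std precomputed over the prior 5 trading days.

    `stats` is indexed by feature name with columns "mean" and "std".
    """
    return (features - stats["mean"]) / stats["std"]


def make_labels(mid: np.ndarray, horizon: int, theta: float) -> np.ndarray:
    """Threshold the relative mid-price change over `horizon` events into 3 classes."""
    change = (mid[horizon:] - mid[:-horizon]) / mid[:-horizon]
    labels = np.ones_like(change, dtype=np.int64)  # 1 = Stable (default)
    labels[change > theta] = 2                     # 2 = Up
    labels[change < -theta] = 0                    # 0 = Down
    return labels
```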
Balanced sampling ensures that during training, each mini-batch is uniformly stratified across label classes, capped at 5000 samples/day/class.
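One way to realize this cap is to pre-select a class-balanced index set per trading day and shuffle it before batching; this is a minimal sketch under that assumption, not LOBFrame's actual sampler:

```python
import numpy as np


def balanced_indices(labels: np.ndarray, days: np.ndarray,
                     cap: int = 5000, seed: int = 0) -> np.ndarray:
    """Keep at most `cap` samples per (day, class) so that, when every
    (day, class) bucket is full, classes are equally represented and
    shuffled batches are approximately stratified."""
    rng = np.random.default_rng(seed)
    keep = []
    for day in np.unique(days):
        for cls in (0, 1, 2):  # Down, Stable, Up
            idx = np.where((days == day) & (labels == cls))[0]
            if len(idx) > cap:
                idx = rng.choice(idx, size=cap, replace=False)
            keep.append(idx)
    return rng.permutation(np.concatenate(keep))
```

The resulting index array can be passed to a standard PyTorch `DataLoader` via a `SubsetRandomSampler`; validation and test loaders would skip this step and iterate sequentially, as described above.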
3. Modeling, Training, and Configuration
LOBFrame supports rapid specification and instantiation of state-of-the-art deep learning models, notably DeepLOB (convolutional layers followed by an LSTM and a fully-connected classifier). Models accept a standardized input tensor of shape $(N, T, 4L)$ and emit logits over three target classes. The object-oriented design exposes key methods:
- `DeepLOBModel.forward(x: Tensor[N, T, 4L]) → logits: Tensor[N, 3]`
- `Trainer.fit()`, `Trainer.test()`
- Confusion matrix and metric plotting utilities.
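The following is a compressed, illustrative skeleton of a DeepLOB-style network (convolutions across the book dimension, an LSTM over time, a 3-class head) matching the $(N, T, 4L)$ interface above; the exact layer counts and shapes in LOBFrame's implementation differ:

```python
import torch
from torch import nn


class DeepLOBStyle(nn.Module):
    """Simplified CNN -> LSTM -> FC classifier for (N, T, 4L) inputs (L = depth)."""

    def __init__(self, depth: int = 10, hidden: int = 64, n_classes: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 2), stride=(1, 2)), nn.LeakyReLU(),
            nn.Conv2d(16, 16, kernel_size=(1, 2), stride=(1, 2)), nn.LeakyReLU(),
            nn.Conv2d(16, 16, kernel_size=(1, depth)), nn.LeakyReLU(),
        )
        self.lstm = nn.LSTM(input_size=16, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, T, 4L) -> (N, 1, T, 4L) so convolutions slide over the book axis
        z = self.conv(x.unsqueeze(1))       # (N, 16, T, 1)
        z = z.squeeze(-1).permute(0, 2, 1)  # (N, T, 16)
        out, _ = self.lstm(z)               # (N, T, hidden)
        return self.head(out[:, -1])        # logits: (N, 3)
```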
Training employs AdamW optimization with the framework's default hyperparameters for learning rate, betas, and weight decay, running for up to 100 epochs with early stopping on validation-loss plateau (patience = 15).
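A minimal training loop reflecting this regime (AdamW, cross-entropy over the three classes, early stopping after 15 stale epochs); it is a sketch of the behavior described above, not `Trainer.fit()`'s actual implementation:

```python
import copy

import torch


def fit(model, train_loader, val_loader, max_epochs=100, patience=15):
    """Train with AdamW; stop early when validation loss stops improving."""
    opt = torch.optim.AdamW(model.parameters())  # hyperparameters from the config
    loss_fn = torch.nn.CrossEntropyLoss()
    best_loss, best_state, stale = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x), y).item() for x, y in val_loader)
        if val_loss < best_loss:
            best_loss, best_state, stale = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:  # early stopping: 15 epochs without improvement
                break
    model.load_state_dict(best_state)
    return model
```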
Best-practice configuration and scaling include batched multi-worker data loading (multiple `DataLoader` workers), optional parallelization via SLURM or Ray, storage of preprocessed data in HDF5 or Parquet, use of mixed precision (AMP) for efficient GPU utilization, and checkpointing/gradient accumulation as required by system constraints.
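For reference, a standard PyTorch AMP training step looks as follows; this is generic PyTorch usage rather than LOBFrame-specific code:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, opt, loss_fn, x, y):
    """One mixed-precision step: forward/loss under autocast, scaled backward pass."""
    opt.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()  # scale to avoid fp16 gradient underflow
    scaler.step(opt)
    scaler.update()
    return loss.item()
```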
4. Evaluation: Traditional Metrics and Practicability Simulator
LOBFrame provides both conventional machine learning diagnostics and a transaction-centric practicability measure. After completion of training and testing, users may invoke:
```
trainer.plot_confusion_matrix(mode='H10')
trainer.plot_mcc_curve(horizon=50)
trainer.plot_f1_curve()
trainer.plot_accuracy_curve()
```
Standard reporting encompasses confusion matrices, averaged by tick class, and MCC/F1/accuracy curves versus decision threshold.
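How the decision threshold enters these curves is an evaluator detail; the sketch below assumes one common convention, namely restricting the metrics to predictions whose maximum class probability exceeds the threshold:

```python
import numpy as np
from sklearn.metrics import f1_score, matthews_corrcoef


def metrics_vs_threshold(probs: np.ndarray, targets: np.ndarray,
                         thresholds=np.linspace(0.34, 0.95, 20)):
    """Return (threshold, MCC, macro-F1) rows, computed only on samples whose
    predicted-class probability exceeds each threshold."""
    preds, conf = probs.argmax(axis=1), probs.max(axis=1)
    rows = []
    for t in thresholds:
        mask = conf >= t
        if mask.sum() == 0:
            continue  # no prediction is confident enough at this threshold
        rows.append((t,
                     matthews_corrcoef(targets[mask], preds[mask]),
                     f1_score(targets[mask], preds[mask], average="macro")))
    return rows
```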
The practicability simulator supplements these metrics by quantifying the probability of correctly forecasting "complete transactions" (i.e., true open–close transitions), defined as

$$p_T = \frac{|\mathcal{T} \cap \hat{\mathcal{T}}|}{|\mathcal{T}|},$$

where $\mathcal{T}$ is the set of true transaction transitions, $\hat{\mathcal{T}}$ is the set predicted by the model, and $p_T \in [0, 1]$. Evaluator methods include:
```
sim = StrategySimulator(preds, targets, prob_threshold=0.5, horizon=50)
p_T = sim.probability_correct_transaction()
sim.plot_pT_vs_threshold()
```
This operational metric reflects the model's true utility for trading applications, addressing regimes where statistical accuracy may not translate into actionable signals (Briola et al., 2024).
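Under the set-based definition of $p_T$ given above, the core computation reduces to a set intersection; this is a minimal sketch assuming transitions are encoded as hashable items (e.g., (open_index, close_index) pairs), not the simulator's actual internals:

```python
def probability_correct_transaction(true_transitions: set, pred_transitions: set) -> float:
    """p_T = |T ∩ T_hat| / |T|: the fraction of true open-close transitions
    that the model also predicts (0 <= p_T <= 1)."""
    if not true_transitions:
        return 0.0  # no true transactions in the evaluation window
    return len(true_transitions & pred_transitions) / len(true_transitions)


# Example: 2 of 3 true transitions recovered -> p_T ≈ 0.67
print(probability_correct_transaction({(0, 5), (9, 14), (20, 31)},
                                      {(0, 5), (9, 14), (40, 44)}))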
5. Usage Patterns, Extensibility, and Deployment
To prepare and analyze a new LOB dataset using LOBFrame, practitioners export message/book CSVs in LOBSTER format, specify location and preprocessing choices in a configuration file, and invoke the main pipeline:
```
git clone https://github.com/FinancialComputingUCL/LOBFrame.git
cd LOBFrame
pip install -r requirements.txt
python run.py --config my_config.yaml --model DeepLOB
```
Scaling is supported via efficient serialization, multi-threaded loading, and cluster execution frameworks. This enables deployment on large-scale NASDAQ datasets, supporting research into deep learning models for order book forecasting with both statistical and operational assessment.
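As an illustration of the serialization point, one day of preprocessed snapshots can be persisted as compressed Parquet and reloaded cheaply at training time; the file path, column names, and compression choice below are hypothetical:

```python
import numpy as np
import pandas as pd


def save_day(features: np.ndarray, path: str) -> None:
    """Persist one day of normalized (4L-wide) snapshots as compressed Parquet."""
    cols = [f"f{i}" for i in range(features.shape[1])]
    pd.DataFrame(features, columns=cols).to_parquet(path, compression="zstd")


# Example: 100k events at depth L = 10 -> 40 features per snapshot
save_day(np.random.randn(100_000, 40).astype("float32"),
         "data/AAPL_20240102.parquet")
day = pd.read_parquet("data/AAPL_20240102.parquet")  # fast columnar reload
```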
6. Context, Significance, and Research Implications
LOBFrame is released in conjunction with a comprehensive empirical study that reveals significant heterogeneity in deep learning forecasting performance across stocks with different microstructural properties. The results underscore two key points: (1) The predictive efficacy of deep neural architectures is not uniform across LOB regimes; (2) Classical accuracy metrics do not suffice for gauging practical utility in trading contexts. The introduction of the practicability simulator addresses this evaluation gap, providing a methodology for transaction-level assessment of model predictions. LOBFrame thus establishes a robust, reproducible platform for future research, enabling academics and practitioners to systematically benchmark, deploy, and understand the limitations of deep learning in LOB prediction tasks (Briola et al., 2024).