
Order Book Filtration and Directional Signal Extraction at High Frequency (2507.22712v1)

Published 30 Jul 2025 in q-fin.TR, q-fin.CP, q-fin.GN, q-fin.ST, and stat.ME

Abstract: With the advent of electronic capital markets and algorithmic trading agents, the number of events in tick-by-tick market data has exploded. A large fraction of these orders is transient. Their ephemeral character degrades the informativeness of directional alphas derived from the limit order book (LOB) state. We investigate whether directional signals such as order book imbalance (OBI) can be improved by structurally filtering high-frequency LOB data. We evaluate three real-time, observable filtration schemes, based on order lifetime, update count, and inter-update delay, and use them to recompute OBI on structurally filtered event streams. To assess the effect of filtration, we implement a three-layer diagnostic framework: contemporaneous correlation with returns, explanatory power under discretized regime counts, and causal coherence via Hawkes excitation norms. Empirical results show that structural filtration improves directional signal clarity in correlation and regime-based metrics, but leads to only limited gains in causal excitation strength. In contrast, OBI computed using trade events exhibits stronger causal alignment with future price movements. These findings highlight the importance of differentiating between associative and causal diagnostics when designing high-frequency directional signals.

Summary

  • The paper demonstrates that structural filtration of high-frequency order book data raises the Pearson correlation between order book imbalance signals and traded returns, by about 11.3% under the modification-time filter.
  • It introduces and benchmarks three filtration schemes—lifetime-based, modification count-based, and modification time-based filtering—highlighting their roles in enhancing signal clarity and causal structure.
  • The study reveals that coupling trade-based order book imbalance with filtration yields significantly higher causal excitation scores, emphasizing the importance of selecting the right event stream for directional signal extraction.

Order Book Filtration and Directional Signal Extraction at High Frequency

This paper introduces a methodology to enhance directional signals derived from the limit order book (LOB) by structurally filtering high-frequency data. It investigates whether directional signals, such as order book imbalance (OBI), can be improved by filtering LOB data based on order lifetime, update count, and inter-update delay. The effectiveness of these filtration schemes is assessed using a three-layer diagnostic framework: contemporaneous correlation with returns, explanatory power under discretized regime counts, and causal coherence via Hawkes excitation norms.

Background and Motivation

Modern electronic markets produce a high volume of order book updates, reflecting the activity of algorithmic traders and market makers. However, much of this activity is transient and may not convey genuine directional intent. The paper addresses the issue of noise in high-frequency LOB data, which can degrade the informativeness of directional alphas like OBI. OBI, a widely used indicator, is sensitive to transient distortions such as flickering liquidity and rapid order cancellations, which can weaken its directional informativeness. The paper hypothesizes that by filtering out these transient events, the clarity of directional signals can be improved.

Methodology: Filtration Schemes and Diagnostic Framework

The paper employs three real-time, observable filtration schemes to recompute OBI:

  • Lifetime-based filtering: Discards orders that survive for less than a threshold duration.
  • Modification count filtering: Excludes orders subject to frequent updates.
  • Modification time filtering: Retains only those orders whose time between successive updates exceeds a minimum threshold.

The directional informativeness of the resulting signals is evaluated using a three-layer diagnostic framework:

  1. Contemporaneous Correlation Analysis: Assesses the Pearson correlation between raw OBI signals and traded returns.
  2. Explanatory Power under Discretized Regimes: Discretizes both OBI and returns into categorical regimes and evaluates their relationship using correlation between regime counts and multivariate regression.
  3. Causal Coherence: Uses a multivariate Hawkes process to estimate cross-excitation between OBI and return regimes.
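The first diagnostic layer can be illustrated with a minimal sketch of a sample Pearson correlation over per-window values. The function name `pearson` and the example numbers are hypothetical and are not taken from the paper; the sketch only assumes one OBI value and one realized return per evaluation window.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)

# Hypothetical per-window values (illustrative only, not data from the paper).
obi = [0.20, -0.10, 0.40, -0.30, 0.05]
ret = [0.0002, -0.0001, 0.0003, -0.0002, 0.0000]
print(pearson(obi, ret))
```

Running the same function on filtered and unfiltered OBI series against the same returns would give the kind of side-by-side comparison the first layer relies on.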

Figure 1: Visual depiction of the evaluation window $(\tau - h, \tau]$ used for computing both order book imbalance and realized return. The imbalance is based on the net directional event counts, while the return is computed using the first and last trade prices within the same window.

Figure 2: Discretization of OBI and returns into regime bins. Each evaluation window is mapped to a count vector over OBI regimes and a one-hot encoded return regime, forming the basis for correlation and regression analysis under a point-process framework.
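The discretization step described in the Figure 2 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the bin edges, the three-regime layout, and the idea that OBI is observed several times inside each window are all hypothetical choices made for the example.

```python
from bisect import bisect_right

def regime_index(value, edges):
    """Map a value to a regime index given sorted interior bin edges.

    With k edges there are k + 1 regimes; index 0 is the lowest regime.
    """
    return bisect_right(edges, value)

def obi_regime_counts(obi_values, edges):
    """Count how many OBI observations in one window fall in each regime."""
    counts = [0] * (len(edges) + 1)
    for v in obi_values:
        counts[regime_index(v, edges)] += 1
    return counts

def one_hot_return_regime(ret, edges):
    """One-hot encode the regime of a single window's realized return."""
    vec = [0] * (len(edges) + 1)
    vec[regime_index(ret, edges)] = 1
    return vec

# Hypothetical edges: three regimes (sell-heavy, neutral, buy-heavy).
obi_edges = [-0.33, 0.33]
ret_edges = [-0.0002, 0.0002]

print(obi_regime_counts([-0.8, -0.1, 0.1, 0.9, 0.5], obi_edges))  # [1, 2, 2]
print(one_hot_return_regime(0.0004, ret_edges))                   # [0, 0, 1]
```

Stacking these vectors across windows gives the count and indicator series on which a regime-level correlation or regression could then be run.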

Key Contributions

The paper makes several key contributions:

  1. It demonstrates that filtration enhances the Pearson correlation between raw OBI signals and traded returns, improving directional signal clarity.
  2. It shows that filtration enhances explanatory power under discretized regimes, as quantified by an autoregression-adjusted $R^2$.
  3. It reveals a cleaner causal structure between filtered imbalance and price movements using a multivariate Hawkes process.
  4. It implements and benchmarks multiple filtration schemes, finding that different filters improve different aspects of signal quality.

The paper's empirical analysis uses high-frequency LOB data from BANKNIFTY index futures. The filtering approach targets structurally transient activity by removing orders that reflect fleeting intent. In addition to filtered and unfiltered imbalance measures, the paper computes an alternative form of imbalance based exclusively on executed trades, referred to as OBI from trade events.

Formal Definitions and Problem Statement

The paper provides formal definitions for cumulative event count, event count over a lookback window, directional event count, order book imbalance, and realized return from trade event filtration. It defines the problem statement as evaluating filtration-driven signal strength, aiming to determine whether the strength of association between filtered order book imbalance and realized return is improved under various filtration schemes.

Scoring Framework and Evaluation Philosophy

The scoring framework includes three layers: raw-value correlation, regime-based alignment, and regime-to-regime excitation. Raw-value correlation captures contemporaneous co-movement between imbalance and return. Regime-based alignment assesses whether structured regime transitions align in a statistically meaningful way. Hawkes excitation scores test for lagged excitation from imbalance regimes to return regimes, reflecting causal structure.

The central objective is to assess whether structural filtration enhances the directional informativeness of flow-based indicators. The scoring functionals operate at varying levels of structural abstraction, from raw value-based correlation to regime-based count modeling and causal excitation metrics.
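For the third layer, it may help to recall the textbook form of a multivariate Hawkes process. The block below is a generic formulation, not the paper's specification: the kernel shape, the symbols $\mu_i$, $\phi_{ij}$, $\alpha_{ij}$, $\beta_{ij}$, and the way per-pair norms are aggregated into a single excitation score are assumptions, since this summary does not state them.

```latex
% Conditional intensity of event type i (e.g. a return regime),
% excited by past events of every type j (e.g. OBI regimes):
\lambda_i(t) = \mu_i + \sum_{j} \int_{-\infty}^{t} \phi_{ij}(t - s)\, \mathrm{d}N_j(s)

% L1 norm of the kernel: expected number of type-i events
% directly triggered by a single type-j event.
\lVert \phi_{ij} \rVert_1 = \int_{0}^{\infty} \phi_{ij}(u)\, \mathrm{d}u

% Example with an exponential kernel (an assumed choice):
\phi_{ij}(u) = \alpha_{ij}\, \beta_{ij}\, e^{-\beta_{ij} u}
\quad\Longrightarrow\quad
\lVert \phi_{ij} \rVert_1 = \alpha_{ij}
```

Under this reading, a larger norm from an imbalance regime to a return regime indicates stronger lagged excitation, which is the sense in which the third layer is described as causal rather than associative.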

Filtration Schemes

The paper defines three filtration schemes: lifetime-based filtration, modification count-based filtration, and modification time-based filtration. Each scheme removes structurally unreliable events based on real-time observable properties.

  • Lifetime-based filtration: Removes events corresponding to orders with lifetime below a threshold $\bar{\mathcal{T}}$.
  • Modification count-based filtration: Removes events tied to orders with modification count exceeding a threshold $\bar{M}$.
  • Modification time-based filtration: Removes all events corresponding to orders with tightly clustered final modifications, i.e., $\mathcal{M}_j < \bar{\mathcal{M}}$.
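The three rules above can be sketched as simple predicates over per-order statistics. This is a simplified offline illustration, not the paper's real-time procedure: the `OrderStats` record, its field names, the threshold values, and the convention of using infinity for orders with too few modifications are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class OrderStats:
    order_id: int
    lifetime: float       # seconds the order rested in the book
    mod_count: int        # number of modifications applied to the order
    final_mod_gap: float  # M_j: spacing of final modifications, in seconds;
                          # math.inf when undefined (assumed convention)

def lifetime_filter(orders, min_lifetime):
    """Drop orders whose lifetime is below the threshold."""
    return [o for o in orders if o.lifetime >= min_lifetime]

def mod_count_filter(orders, max_mods):
    """Drop orders whose modification count exceeds the threshold."""
    return [o for o in orders if o.mod_count <= max_mods]

def mod_time_filter(orders, min_gap):
    """Drop orders whose final modifications are tightly clustered (M_j < threshold)."""
    return [o for o in orders if o.final_mod_gap >= min_gap]

# Hypothetical orders and thresholds, for illustration only.
orders = [
    OrderStats(1, lifetime=0.002, mod_count=0, final_mod_gap=math.inf),
    OrderStats(2, lifetime=4.0, mod_count=12, final_mod_gap=0.001),
    OrderStats(3, lifetime=9.5, mod_count=2, final_mod_gap=1.5),
]

print([o.order_id for o in lifetime_filter(orders, 0.1)])   # [2, 3]
print([o.order_id for o in mod_count_filter(orders, 5)])    # [1, 3]
print([o.order_id for o in mod_time_filter(orders, 0.01)])  # [1, 3]
```

Only the events belonging to surviving orders would then be passed on to the imbalance computation.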

Evaluation Indicators: Order Book Imbalance (OBI and OBI$^{(T)}$)

The Order Book Imbalance (OBI) measure captures the directional skew in event arrivals over a backward-looking window $(\tau - h, \tau]$. In addition to standard OBI, the paper introduces an alternative imbalance measure derived solely from signed trade activity, denoted as OBI$^{(T)}$. This trade-based OBI captures realized directional pressure by tallying buyer-initiated versus seller-initiated transactions within each window.
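A windowed imbalance and its matching return can be sketched as below. The normalization by total event count and the use of a simple (rather than log) return are assumptions made for the example; this summary does not give the paper's exact formulas. The function names and sample data are hypothetical.

```python
def window_obi(events, tau, h):
    """Imbalance of signed events in the half-open window (tau - h, tau].

    events: iterable of (timestamp, side), side = +1 for buy, -1 for sell.
    Returns (buys - sells) / (buys + sells), or 0.0 for an empty window.
    """
    buys = sells = 0
    for t, side in events:
        if tau - h < t <= tau:
            if side > 0:
                buys += 1
            elif side < 0:
                sells += 1
    total = buys + sells
    return 0.0 if total == 0 else (buys - sells) / total

def window_return(trades, tau, h):
    """Simple return from the first to the last trade price in (tau - h, tau].

    trades: list of (timestamp, price), sorted by timestamp.
    Returns None when the window holds fewer than two trades.
    """
    prices = [p for t, p in trades if tau - h < t <= tau]
    if len(prices) < 2:
        return None
    return prices[-1] / prices[0] - 1.0

# Hypothetical data; the event at t = 0.5 lies on the open left edge and is excluded.
events = [(0.5, +1), (1.0, +1), (1.5, -1), (2.0, +1), (2.5, -1)]
trades = [(1.0, 100.0), (1.8, 101.0), (2.5, 99.0)]

print(window_obi(events, tau=2.0, h=1.5))
print(window_return(trades, tau=2.0, h=1.5))
```

Feeding only signed trades into `window_obi` gives a trade-based variant in the spirit of the trade-event imbalance, while feeding filtered order events gives a filtered OBI.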

Experimental Setup and Results

The empirical study is based on tick-by-tick data for BANKNIFTY futures and selected equities sourced from the Indian National Stock Exchange (NSE). The experiments span three representative trading days, and the evaluation involves varying thresholds for lifetime, modification lag, and modification count.

The results demonstrate that filtering OBI based on observable event structure yields improvements in associative strength, particularly under Pearson correlation and regime-wise cross-correlation. The modification-time filter (MTF) is consistently dominant across these layers. However, gains in causal excitation, as measured by Hawkes norms, remain modest when applied to standard OBI. By contrast, when OBI is reconstructed from trade events alone, filtration yields markedly higher causal excitation scores, particularly under MTF and LF.

Discussion and Implications

The findings suggest that filtration is highly effective in improving associative signal clarity, especially for correlation-based measures. However, when the objective is to extract causally coherent signals, the nature of the underlying event stream becomes critical. Order book state, even when filtered, may not provide sufficient causal alignment. In contrast, trade-based OBI, when coupled with structural filtration, reveals latent excitation structure more clearly.

One notable result is that the modification-time filter (MTF) yields the highest improvement in Pearson correlation (0.01133) over the unfiltered score (0.01018), a gain of approximately 11.3%. This highlights the potential of MTF for enhancing directional signal clarity.
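The quoted percentage is a relative gain, which can be checked directly from the two correlation values reported above. The variable names are illustrative.

```python
unfiltered = 0.01018  # Pearson correlation, unfiltered OBI (value quoted above)
mtf = 0.01133         # Pearson correlation under the modification-time filter

relative_gain = (mtf - unfiltered) / unfiltered
absolute_gain = mtf - unfiltered

print(f"relative gain: {relative_gain:.1%}")  # relative gain: 11.3%
print(f"absolute gain: {absolute_gain:.5f}")  # absolute gain: 0.00115
```

The absolute change in correlation is about 0.00115, so the 11.3% figure describes an improvement relative to a small baseline correlation.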

Another significant finding is the performance of trade-based OBI. The excitation norm under modification-time filtering (MTF) rises sharply to 24.7352 on January 23, compared to 9.6726 for standard OBI on the same day. This underscores the importance of using trade-based measures for causal inference.

Conclusion

The paper introduces a structured methodology for evaluating directional signal quality under real-time order flow filtration in high-frequency markets. The findings suggest that filtering order events based on observable structural properties improves the clarity of Order Book Imbalance (OBI) as a contemporaneous directional signal. While the empirical analysis is restricted to specific datasets, the methodology is extensible to broader settings and may serve as a template for future investigations into the structural drivers of high-frequency alpha performance. The paper highlights the importance of considering the nature of the underlying event stream and the choice of filtration schemes for enhancing both associative and causal properties of directional signals.
