- The paper demonstrates that structural filtration of high-frequency order book data raises the Pearson correlation between order book imbalance signals and traded returns, by about 11.3% under the modification-time filter.
 
- It introduces and benchmarks three filtration schemes (lifetime-based, modification count-based, and modification time-based) and highlights their roles in enhancing signal clarity and causal structure.
 
- The study reveals that coupling trade-based order book imbalance with filtration yields markedly higher causal excitation scores, emphasizing the importance of selecting the right event stream for directional signal extraction.
 
Order Book Filtration and Directional Signal Extraction at High Frequency
This paper introduces a methodology to enhance directional signals derived from the limit order book (LOB) by structurally filtering high-frequency data. It investigates whether directional signals, such as order book imbalance (OBI), can be improved by filtering LOB data based on order lifetime, update count, and inter-update delay. The effectiveness of these filtration schemes is assessed using a three-layer diagnostic framework: contemporaneous correlation with returns, explanatory power under discretized regime counts, and causal coherence via Hawkes excitation norms.
Background and Motivation
Modern electronic markets produce a high volume of order book updates, reflecting the activity of algorithmic traders and market makers. Much of this activity is transient and may not convey genuine directional intent. The paper addresses this noise in high-frequency LOB data, which can degrade the informativeness of directional alphas such as OBI. OBI, a widely used indicator, is sensitive to transient distortions such as flickering liquidity and rapid order cancellations. The paper hypothesizes that filtering out these transient events improves the clarity of directional signals.
Methodology: Filtration Schemes and Diagnostic Framework
The paper employs three real-time, observable filtration schemes to recompute OBI:
- Lifetime-based filtering: Discards orders that survive for less than a threshold duration.
 
- Modification count filtering: Excludes orders subject to frequent updates.
 
- Modification time filtering: Retains only those orders whose time between successive updates exceeds a minimum threshold.
 
The directional informativeness of the resulting signals is evaluated using a three-layer diagnostic framework:
- Contemporaneous Correlation Analysis: Assesses the Pearson correlation between raw OBI signals and traded returns.
 
- Explanatory Power under Discretized Regimes: Discretizes both OBI and returns into categorical regimes and evaluates their relationship using correlation between regime counts and multivariate regression.
 
- Causal Coherence: Uses a multivariate Hawkes process to estimate cross-excitation between OBI and return regimes.
 
Figure 1: Visual depiction of the evaluation window $(\tau - h, \tau]$ used for computing both order book imbalance and realized return. The imbalance is based on the net directional event counts, while the return is computed using the first and last trade prices within the same window.
Figure 2: Discretization of OBI and returns into regime bins. Each evaluation window is mapped to a count vector over OBI regimes and a one-hot encoded return regime, forming the basis for correlation and regression analysis under a point-process framework.
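The mapping described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bin edges `OBI_EDGES` and `RET_EDGES`, the function names, and the assumption that several OBI readings are sampled inside one evaluation window are all hypothetical choices made for the example.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical regime boundaries (assumptions, not taken from the paper).
OBI_EDGES = [-0.5, 0.0, 0.5]   # 4 OBI regimes
RET_EDGES = [-1e-4, 1e-4]      # 3 return regimes: down, flat, up


def regime_index(value: float, edges: Sequence[float]) -> int:
    """Index of the bin containing `value`; a value equal to an edge goes to the upper bin."""
    return bisect_right(edges, value)


def obi_regime_counts(obi_values: Sequence[float],
                      edges: Sequence[float] = OBI_EDGES) -> List[int]:
    """Count how many OBI readings in one window fall into each regime."""
    counts = [0] * (len(edges) + 1)
    for value in obi_values:
        counts[regime_index(value, edges)] += 1
    return counts


def return_one_hot(ret: float, edges: Sequence[float] = RET_EDGES) -> List[int]:
    """One-hot encode the regime of the window's realized return."""
    vec = [0] * (len(edges) + 1)
    vec[regime_index(ret, edges)] = 1
    return vec


# One evaluation window: five OBI readings and one realized return.
window_counts = obi_regime_counts([-0.7, -0.2, 0.1, 0.3, 0.9])
window_return = return_one_hot(2e-4)
```

Stacking `window_counts` and `window_return` across windows gives the two matrices on which a regime-count correlation or regression could then be run.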
Key Contributions
The paper makes several key contributions:
- It demonstrates that filtration enhances the Pearson correlation between raw OBI signals and traded returns, improving directional signal clarity.
 
- It shows that filtration enhances explanatory power under discretized regimes, as quantified by an autoregression-adjusted $R^2$.
 
- It reveals a cleaner causal structure between filtered imbalance and price movements using a multivariate Hawkes process.
 
- It implements and benchmarks multiple filtration schemes, finding that different filters improve different aspects of signal quality.
 
The paper's empirical analysis uses high-frequency LOB data from BANKNIFTY index futures. The filtering approach targets structurally transient activity by removing orders that reflect fleeting intent. In addition to filtered and unfiltered imbalance measures, the paper computes an alternative form of imbalance based exclusively on executed trades, referred to as OBI from trade events.
The paper provides formal definitions for cumulative event count, event count over a lookback window, directional event count, order book imbalance, and realized return from trade event filtration. It defines the problem statement as evaluating filtration-driven signal strength, aiming to determine whether the strength of association between filtered order book imbalance and realized return is improved under various filtration schemes.
Scoring Framework and Evaluation Philosophy
The scoring framework includes three layers: raw-value correlation, regime-based alignment, and regime-to-regime excitation. Raw-value correlation captures contemporaneous co-movement between imbalance and return. Regime-based alignment assesses whether structured regime transitions align in a statistically meaningful way. Regime-to-regime excitation uses Hawkes excitation scores to test for lagged excitation from imbalance regimes to return regimes, reflecting causal structure.
The central objective is to assess whether structural filtration enhances the directional informativeness of flow-based indicators. The scoring functionals operate at varying levels of structural abstraction, from raw value-based correlation to regime-based count modeling and causal excitation metrics.
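For readers unfamiliar with the third layer, the standard form of a multivariate Hawkes intensity is sketched below. The exponential kernel and the symbols $\mu_i$, $\alpha_{ij}$, and $\beta$ are assumptions made for illustration; this summary does not state which kernel parameterization the paper uses. Under this form, the excitation norm from event type $j$ (for example an OBI regime) to event type $i$ (for example a return regime) is the integral of the kernel.

```latex
\lambda_i(t) = \mu_i + \sum_{j} \sum_{t^{j}_{k} < t} \phi_{ij}\!\left(t - t^{j}_{k}\right),
\qquad
\phi_{ij}(u) = \alpha_{ij}\,\beta\,e^{-\beta u}, \quad u > 0,
\qquad
\lVert \phi_{ij} \rVert_{1} = \int_{0}^{\infty} \phi_{ij}(u)\,du = \alpha_{ij}.
```

A larger $\lVert \phi_{ij} \rVert_{1}$ means that each event of type $j$ triggers more expected follow-on events of type $i$, which is the sense in which the excitation scores are read as lagged, directional influence.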
Filtration Schemes
The paper defines three filtration schemes: lifetime-based filtration, modification count-based filtration, and modification time-based filtration. Each scheme removes structurally unreliable events based on real-time observable properties.
- Lifetime-based filtration: Removes events corresponding to orders with lifetime below a threshold $\bar{T}$.
 
- Modification count-based filtration: Removes events tied to orders with modification count exceeding a threshold $\bar{M}$.
 
- Modification time-based filtration: Removes all events corresponding to orders with tightly clustered final modifications, i.e., $M_j < \bar{M}$.
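The three rules above can be sketched as per-order predicates. This is a minimal illustration under stated assumptions, not the paper's code: the `Order` record, its field names, and the parameter names `t_bar`, `m_bar`, and `gap_bar` are hypothetical. The modification-time rule is read here as "every gap between successive updates, counting the arrival as the first update, must exceed the threshold", which is one possible interpretation. Orders that are never modified pass that rule by construction.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Order:
    """Hypothetical per-order record; field names are illustrative only."""
    order_id: int
    t_new: float   # arrival time, in seconds
    t_end: float   # cancellation or full-fill time, in seconds
    mod_times: List[float] = field(default_factory=list)  # modification timestamps


def passes_lifetime(order: Order, t_bar: float) -> bool:
    """Keep orders that survive for at least t_bar seconds."""
    return (order.t_end - order.t_new) >= t_bar


def passes_mod_count(order: Order, m_bar: int) -> bool:
    """Keep orders modified at most m_bar times."""
    return len(order.mod_times) <= m_bar


def passes_mod_time(order: Order, gap_bar: float) -> bool:
    """Keep orders whose successive updates are all more than gap_bar seconds apart."""
    stamps = [order.t_new] + sorted(order.mod_times)
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return all(gap > gap_bar for gap in gaps)


def surviving_ids(orders: List[Order], keep: Callable[[Order], bool]) -> List[int]:
    """Return the ids of orders that pass the given filter predicate."""
    return [o.order_id for o in orders if keep(o)]


orders = [
    Order(1, t_new=0.0, t_end=5.0, mod_times=[1.0, 3.0]),   # stable, slowly updated
    Order(2, t_new=0.0, t_end=0.02),                        # fleeting, never modified
    Order(3, t_new=0.0, t_end=10.0,
          mod_times=[0.001, 0.002, 0.003, 0.004]),          # rapidly re-quoted
]

lifetime_ids = surviving_ids(orders, lambda o: passes_lifetime(o, t_bar=0.1))
mod_count_ids = surviving_ids(orders, lambda o: passes_mod_count(o, m_bar=3))
mod_time_ids = surviving_ids(orders, lambda o: passes_mod_time(o, gap_bar=0.5))
```

In this toy example the lifetime rule drops only the fleeting order, while the two modification rules drop only the rapidly re-quoted one, consistent with the observation that different filters target different kinds of transient activity.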
 
Evaluation Indicators: Order Book Imbalance (OBI and $\mathrm{OBI}^{(T)}$)
The Order Book Imbalance (OBI) measure captures the directional skew in event arrivals over a backward-looking window $(\tau - h, \tau]$. In addition to standard OBI, the paper introduces an alternative imbalance measure derived solely from signed trade activity, denoted as $\mathrm{OBI}^{(T)}$. This trade-based OBI captures realized directional pressure by tallying buyer-initiated versus seller-initiated transactions within each window.
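A windowed imbalance and return of this kind can be sketched as below. The normalization (net signed count divided by total count) and the use of a log return are assumptions made for illustration, since the summary only states that the imbalance uses net directional event counts and that the return uses the first and last trade prices in the window. The function names and the `(time, sign)` and `(time, price)` tuple layouts are hypothetical.

```python
import math
from typing import List, Tuple


def window_obi(events: List[Tuple[float, int]], tau: float, h: float) -> float:
    """Net signed event count over total event count in the window (tau - h, tau].

    `events` holds (time, sign) pairs with sign +1 for buy-side and -1 for sell-side.
    Returns 0.0 when the window is empty (an assumed convention).
    """
    signs = [sign for t, sign in events if tau - h < t <= tau]
    if not signs:
        return 0.0
    return sum(signs) / len(signs)


def window_return(trades: List[Tuple[float, float]], tau: float, h: float) -> float:
    """Log return between the first and last trade price in (tau - h, tau].

    `trades` holds (time, price) pairs sorted by time.
    Returns 0.0 when fewer than two trades fall in the window (an assumed convention).
    """
    prices = [price for t, price in trades if tau - h < t <= tau]
    if len(prices) < 2:
        return 0.0
    return math.log(prices[-1] / prices[0])


events = [(0.5, +1), (1.0, +1), (1.5, -1), (2.0, +1), (2.5, -1)]
trades = [(0.6, 100.0), (1.2, 101.0), (1.9, 102.0), (2.4, 99.0)]

obi = window_obi(events, tau=2.0, h=1.5)        # uses events at t = 1.0, 1.5, 2.0
ret = window_return(trades, tau=2.0, h=1.5)     # uses prices 100.0 and 102.0
```

Passing signed trades (buyer-initiated as +1, seller-initiated as -1) to `window_obi` instead of order book events gives a trade-based imbalance in the spirit of $\mathrm{OBI}^{(T)}$.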
Experimental Setup and Results
The empirical study is based on tick-by-tick data for BANKNIFTY futures and selected equities sourced from the Indian National Stock Exchange (NSE). The experiments span three representative trading days, and the evaluation involves varying thresholds for lifetime, modification lag, and modification count.
The results demonstrate that filtering OBI based on observable event structure yields improvements in associative strength, particularly under Pearson correlation and regime-wise cross-correlation. The modification-time filter (MTF) is consistently dominant across these layers. However, gains in causal excitation, as measured by Hawkes norms, remain modest when applied to standard OBI. By contrast, when OBI is reconstructed from trade events alone, filtration yields markedly higher causal excitation scores, particularly under MTF and the lifetime filter (LF).
Discussion and Implications
The findings suggest that filtration is highly effective in improving associative signal clarity, especially for correlation-based measures. However, when the objective is to extract causally coherent signals, the nature of the underlying event stream becomes critical. Order book state, even when filtered, may not provide sufficient causal alignment. In contrast, trade-based OBI, when coupled with structural filtration, reveals latent excitation structure more clearly.
One notable result is that the modification-time filter (MTF) yields the highest Pearson correlation, 0.01133, against an unfiltered score of 0.01018, a relative gain of approximately 11.3%. This highlights the potential of MTF for enhancing directional signal clarity.
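The reported gain follows directly from the two correlation values, read as a relative improvement over the unfiltered score:

```latex
\frac{0.01133 - 0.01018}{0.01018} = \frac{0.00115}{0.01018} \approx 0.113 = 11.3\%
```

Both correlations are small in absolute terms, so the 11.3% figure describes a relative change, not an absolute increase in correlation.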
Another significant finding is the performance of trade-based OBI. The excitation norm under modification-time filtering (MTF) rises sharply to 24.7352 on January 23, compared to 9.6726 for standard OBI on the same day. This underscores the importance of using trade-based measures for causal inference.
Conclusion
The paper introduces a structured methodology for evaluating directional signal quality under real-time order flow filtration in high-frequency markets. The findings suggest that filtering order events based on observable structural properties improves the clarity of Order Book Imbalance (OBI) as a contemporaneous directional signal. While the empirical analysis is restricted to specific datasets, the methodology is extensible to broader settings and may serve as a template for future investigations into the structural drivers of high-frequency alpha performance. The paper highlights the importance of considering the nature of the underlying event stream and the choice of filtration schemes for enhancing both associative and causal properties of directional signals.