- The paper demonstrates that the Empirical Covariance model achieved a 6.70% profit gain, outperforming traditional methods in Bitcoin LOBs.
- It systematically benchmarks 13 statistical and machine learning models using high-frequency Bitcoin exchange data processed through the AITA-OBS pipeline.
- Results indicate that while machine learning models like OC-SVM balance profit and efficiency, statistical methods offer robust, actionable trading signals.
Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books
Introduction
The paper "A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books" conducts a rigorous study of anomaly detection methodologies applied to cryptocurrency markets, specifically focusing on Bitcoin limit order books (LOBs). The distinctive characteristics of cryptocurrency markets, such as extreme volatility and liquidity fluctuations, call for sophisticated models to detect manipulative trading behaviors. The paper benchmarks a range of statistical and machine learning models for outlier detection within a unified testing environment, AITA Order Book Signal (AITA-OBS).
The instability and nascent regulatory structure prevalent in cryptocurrency markets present challenges that traditional financial instruments do not face. Anomalies in LOBs, such as spoofing and wash trading, disrupt market integrity, making real-time anomaly detection critical. Prior research by Koutmos and others highlighted the importance of order flow dynamics in cryptocurrencies, emphasizing deficiencies in traditional anomaly detection models within high-frequency trading contexts.
This paper extends existing research by implementing a systematic performance benchmark across diverse models, such as Empirical Covariance (EC), Histogram-Based Outlier Score (HBOS), and machine learning techniques like One-Class Support Vector Machine (OC-SVM) and Local Outlier Factor (LOF).
Experimental Design and Methodology
The paper utilizes a dataset of 26,204 records from a high-frequency Bitcoin exchange, processed within the AITA-OBS framework. Feature engineering is based on OHLC (Open, High, Low, Close) data, encompassing indicators like execution price deviations, bid-ask spreads, trade volumes, and market depth metrics. The paper evaluates thirteen models, categorized into statistical methods and unsupervised machine learning approaches.
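To make the feature families above concrete, the following minimal sketch derives a bid-ask spread, an execution price deviation, and a bid/ask imbalance from a single top-of-book record. The `BookSnapshot` class, its field names, and the exact formulas are illustrative assumptions; the summary does not specify how AITA-OBS computes its features.

```python
from dataclasses import dataclass


@dataclass
class BookSnapshot:
    """Hypothetical top-of-book record; field names are illustrative only."""
    best_bid: float
    best_ask: float
    bid_volume: float
    ask_volume: float
    last_price: float


def lob_features(snap: BookSnapshot) -> dict:
    """Derive a few simple order-book features from one snapshot."""
    mid = (snap.best_bid + snap.best_ask) / 2.0
    spread = snap.best_ask - snap.best_bid
    total = snap.bid_volume + snap.ask_volume
    # Imbalance lies in [-1, 1]; positive values mean more resting bid volume.
    imbalance = (snap.bid_volume - snap.ask_volume) / total if total else 0.0
    return {
        "spread": spread,
        "relative_spread": spread / mid,
        # Deviation of the last execution price from the mid price.
        "price_deviation": (snap.last_price - mid) / mid,
        "imbalance": imbalance,
    }


features = lob_features(BookSnapshot(100.0, 102.0, 30.0, 10.0, 101.5))
```

Each record processed this way yields one feature vector, which is the kind of input an outlier detector would score.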
Statistical models, including the Elliptic Envelope (EE) and the Minimum Covariance Determinant (MCD), identify anomalies from the estimated distribution of the data. Machine learning models, such as Isolation Forest (IsoF) and Cluster-Based Local Outlier Factor (CBLOF), capture complex, non-linear patterns for detection. Models were optimized via grid search and cross-validation, with parameters tailored to LOB application scenarios.
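As an illustration of distribution-based scoring, the sketch below computes the squared Mahalanobis distance of each observation from the sample mean, using the maximum-likelihood (empirical) covariance. This is the generic technique behind covariance-based detectors, not the paper's implementation. It is restricted to two features so that the matrix inverse can be written out by hand, and the function name and sample data are made up for the example.

```python
from statistics import mean


def empirical_covariance_scores(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Maximum-likelihood (divide by n) covariance entries.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    scores = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T * inverse(S) * d, with the 2x2 inverse written out explicitly.
        scores.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return scores


# Hypothetical feature pairs; the last point sits far from the others.
points = [(0.0, 0.0), (1.0, 0.2), (0.2, 1.0), (1.0, 1.0), (0.5, 0.5), (6.0, -5.0)]
scores = empirical_covariance_scores(points)
most_anomalous = scores.index(max(scores))
```

Higher scores mark observations that are less consistent with the fitted distribution. Robust variants such as MCD differ mainly in how the mean and covariance are estimated, so that outliers influence the fit less.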
From Outlier Score to Trading Signal
The paper converts raw anomaly scores into actionable trading signals through the AITA-OBS pipeline. Scores are normalized via Min-Max scaling so that they are comparable across models, and a dynamic thresholding method defines binary signals: trades are triggered when scores exceed the 95th percentile. The mean-reversion strategy trades against the prevailing momentum, with fixed fractional position sizing of 33.33% keeping the exposure per trade constant.
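The steps described above can be sketched as follows, under several assumptions: the percentile is computed once over the whole sample (the paper's dynamic threshold may instead be updated over time), momentum is supplied as a signed number per observation, and all function names and sample values are hypothetical.

```python
def min_max_scale(scores):
    """Rescale scores linearly to the interval [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def percentile(values, q):
    """Linear-interpolation percentile, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def trading_signals(scores, momentum, q=95.0, fraction=0.3333):
    """Return a signed position fraction per observation (0.0 means no trade)."""
    scaled = min_max_scale(scores)
    threshold = percentile(scaled, q)
    signals = []
    for s, m in zip(scaled, momentum):
        if s > threshold and m != 0:
            # Mean reversion: sell into upward momentum, buy into downward.
            direction = -1 if m > 0 else 1
            signals.append(direction * fraction)
        else:
            signals.append(0.0)
    return signals


# Hypothetical data: one pronounced outlier at index 3 during an up-move.
scores = [0.2, 0.1, 0.3, 5.0] + [0.2] * 16
momentum = [0.8] * 20
signals = trading_signals(scores, momentum)
```

In this example only the observation at index 3 exceeds the threshold, and because momentum is positive there, the signal is a short position of 33.33% of capital.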
Results and Discussion
The paper's empirical analysis highlights the Empirical Covariance (EC) model as the best performer among statistical methods, achieving a 6.70% profit gain. The equity curve illustrates steady growth, contrasted with underperformance by MCD and EE. Among machine learning models, CBLOF stands out, though high trade frequency limits practical feasibility due to transaction costs. Conversely, OC-SVM balances profit with efficiency, suggesting a favorable strategy for real-world application.
Figure 1: The bid/ask imbalance variability and the price momentum over the evaluation period.
Figure 2: Equity curves of statistical models. The dotted black line represents the daily buy-and-hold (B&H) benchmark.
Figure 3: Equity curves of machine learning models. The dotted black line represents the daily buy-and-hold (B&H) benchmark.
Conclusion
The research successfully benchmarks anomaly detection models for Bitcoin LOBs. Simple statistical approaches, like EC, demonstrate efficacy, while machine learning models advance detection capabilities despite transaction cost challenges. The paper illustrates the potential of outlier-driven strategies to outperform buy-and-hold benchmarks in volatile markets. Future research should explore adaptive ensemble models and broader market applications to strengthen the robustness and generalizability of anomaly detection frameworks.
Integration into the broader AITA framework opens further avenues for exploration, including optimization strategies and an expansion toward secure, high-performance AI-driven trading systems. Incorporating ethical considerations and multi-agent systems for adaptive learning will be essential as trading systems evolve amid complex market dynamics.