
A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books (2507.14960v1)

Published 20 Jul 2025 in q-fin.TR, cs.AI, cs.LG, math.ST, and stat.TH

Abstract: The detection of outliers within cryptocurrency limit order books (LOBs) is of paramount importance for comprehending market dynamics, particularly in highly volatile and nascent regulatory environments. This study conducts a comprehensive comparative analysis of robust statistical methods and advanced machine learning techniques for real-time anomaly identification in cryptocurrency LOBs. Within a unified testing environment, named AITA Order Book Signal (AITA-OBS), we evaluate the efficacy of thirteen diverse models to identify which approaches are most suitable for detecting potentially manipulative trading behaviours. An empirical evaluation, conducted via backtesting on a dataset of 26,204 records from a major exchange, demonstrates that the top-performing model, Empirical Covariance (EC), achieves a 6.70% gain, significantly outperforming a standard Buy-and-Hold benchmark. These findings underscore the effectiveness of outlier-driven strategies and provide insights into the trade-offs between model complexity, trade frequency, and performance. This study contributes to the growing corpus of research on cryptocurrency market microstructure by furnishing a rigorous benchmark of anomaly detection models and highlighting their potential for augmenting algorithmic trading and risk management.

Summary

The paper demonstrates that the Empirical Covariance model achieved a 6.70% profit gain on Bitcoin LOB data, outperforming a standard Buy-and-Hold benchmark.
It systematically benchmarks 13 statistical and machine learning models using high-frequency Bitcoin exchange data processed through the AITA-OBS pipeline.
Results indicate that while machine learning models like OC-SVM balance profit and efficiency, statistical methods offer robust, actionable trading signals.

Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books

Introduction

The paper "A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books" conducts a rigorous study of anomaly detection methods applied to cryptocurrency markets, focusing specifically on Bitcoin limit order books (LOBs). The extreme volatility and fluctuating liquidity of cryptocurrency markets call for sophisticated models to detect manipulative trading behaviors. The paper benchmarks statistical and machine learning models for outlier detection within a unified testing environment, AITA Order Book Signal (AITA-OBS).

The instability and nascent regulatory structure of cryptocurrency markets present challenges that traditional financial instruments do not face. Anomalies in LOBs, such as spoofing and wash trading, undermine market integrity, which makes real-time anomaly detection critical. Prior research by Koutmos and others highlighted the importance of order flow dynamics in cryptocurrencies and pointed to shortcomings of traditional anomaly detection models in high-frequency trading contexts.

This paper extends existing research with a systematic performance benchmark across thirteen diverse models, spanning statistical methods such as Empirical Covariance (EC) and Histogram-Based Outlier Score (HBOS) and machine learning techniques such as One-Class Support Vector Machine (OC-SVM) and Local Outlier Factor (LOF).

Experimental Design and Methodology

The paper uses a dataset of 26,204 high-frequency Bitcoin records from a major exchange, processed within the AITA-OBS framework. Feature engineering is based on OHLC (Open, High, Low, Close) data and includes indicators such as execution price deviations, bid-ask spreads, trade volumes, and market depth metrics. The thirteen models evaluated fall into two groups: statistical methods and unsupervised machine learning approaches.
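
The summary names the feature families but not their exact formulas. The sketch below shows one common way to derive such features from per-record LOB fields. The function name `lob_features`, its input fields, and every formula are illustrative assumptions, not the AITA-OBS definitions.

```python
import numpy as np

def lob_features(best_bid, best_ask, bid_depth, ask_depth, exec_price, volume):
    """Hypothetical feature builder: one row per LOB record, five columns.

    All inputs are equal-length 1-D sequences. Formulas are illustrative
    assumptions, not the definitions used in AITA-OBS.
    """
    best_bid = np.asarray(best_bid, dtype=float)
    best_ask = np.asarray(best_ask, dtype=float)
    bid_depth = np.asarray(bid_depth, dtype=float)
    ask_depth = np.asarray(ask_depth, dtype=float)
    exec_price = np.asarray(exec_price, dtype=float)
    volume = np.asarray(volume, dtype=float)

    mid = (best_bid + best_ask) / 2.0
    rel_spread = (best_ask - best_bid) / mid           # bid-ask spread relative to mid-price
    price_dev = (exec_price - mid) / mid               # execution price deviation from mid-price
    total_depth = bid_depth + ask_depth                # simple market depth metric
    imbalance = (bid_depth - ask_depth) / total_depth  # in [-1, 1]; > 0 means a bid-heavy book
    return np.column_stack([rel_spread, price_dev, volume, total_depth, imbalance])

X = lob_features([100.0], [101.0], [5.0], [3.0], [100.5], [2.0])
```

A relative (rather than absolute) spread and price deviation keep the features comparable as the Bitcoin price level drifts over the sample.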

Statistical models, including the Elliptic Envelope and the Minimum Covariance Determinant, identify anomalies from the estimated data distribution. Machine learning models, such as Isolation Forest (IsoF) and Cluster-Based Local Outlier Factor (CBLOF), can capture complex, non-linear patterns. Models were optimized via grid search and cross-validation, with hyperparameters tuned for the LOB setting.
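
For the distribution-based family, a standard covariance-based outlier score is the squared Mahalanobis distance of each record from the sample mean. The sketch below assumes that EC scores records this way using the maximum-likelihood (empirical) covariance; the paper's exact scoring rule is not stated here, and `empirical_covariance_scores` is a hypothetical name. scikit-learn's `EmpiricalCovariance().fit(X).mahalanobis(X)` computes the same quantity. The Minimum Covariance Determinant and Elliptic Envelope replace the empirical covariance with a robust estimate that is less distorted by the outliers themselves.

```python
import numpy as np

def empirical_covariance_scores(X_train, X_eval):
    """Squared Mahalanobis distance of each row of X_eval from the mean of X_train,
    using the empirical (maximum-likelihood) covariance of X_train.
    Larger scores indicate more anomalous records."""
    X_train = np.asarray(X_train, dtype=float)
    X_eval = np.asarray(X_eval, dtype=float)
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False, bias=True)  # bias=True divides by n (MLE estimate)
    prec = np.linalg.pinv(np.atleast_2d(cov))       # pseudo-inverse tolerates a singular covariance
    diff = X_eval - mu
    return np.einsum("ij,jk,ik->i", diff, prec, diff)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
scores = empirical_covariance_scores(X_train, np.vstack([np.zeros(3), np.full(3, 10.0)]))
```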

From Outlier Score to Trading Signal

The AITA-OBS pipeline converts raw anomaly scores into actionable trading signals. Each model's scores are Min-Max normalized so that models can be compared on a common scale, and a dynamic thresholding method defines binary signals: a trade is triggered when a score exceeds the 95th percentile. The mean-reversion strategy trades against the prevailing price momentum, and fixed fractional position sizing of 33.33% of equity keeps risk consistent across trades.
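
The score-to-signal steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the 95th-percentile threshold is computed over the whole score array (the paper's "dynamic" threshold may instead use a rolling or expanding window, which would avoid look-ahead), and momentum is any signed series whose sign gives the direction of the recent move.

```python
import numpy as np

def min_max(scores):
    """Rescale raw outlier scores to [0, 1]; a constant series maps to all zeros."""
    s = np.asarray(scores, dtype=float)
    lo, hi = s.min(), s.max()
    if hi == lo:
        return np.zeros_like(s)
    return (s - lo) / (hi - lo)

def outlier_signals(scores, momentum, pct=95.0):
    """Map scores to positions in {-1, 0, +1}.

    A record is flagged when its normalized score exceeds the pct-th percentile
    of the supplied scores (whole-sample threshold, an assumption). Flagged
    records trade against momentum: sell after up-moves, buy after down-moves.
    """
    norm = min_max(scores)
    threshold = np.percentile(norm, pct)
    flagged = norm > threshold
    direction = -np.sign(np.asarray(momentum, dtype=float))  # mean reversion: fade the move
    return np.where(flagged, direction, 0.0)

def position_size(equity, fraction=1 / 3):
    """Fixed fractional sizing: commit a constant share (33.33%) of current equity."""
    return equity * fraction

signals = outlier_signals(np.arange(100.0), np.ones(100))
```

Because Min-Max scaling is monotonic, it does not change which records cross a percentile threshold for a single model; its role is to put different models' scores on the same [0, 1] scale.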

Results and Discussion

The empirical analysis identifies the Empirical Covariance (EC) model as the best performer among statistical methods, with a 6.70% profit gain. Its equity curve shows steady growth, in contrast to the underperformance of the Minimum Covariance Determinant (MCD) and Elliptic Envelope (EE). Among machine learning models, CBLOF stands out, though its high trade frequency limits practical feasibility because of transaction costs. OC-SVM, by contrast, balances profit against trading efficiency, making it a favorable candidate for real-world application.
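
Why trade frequency matters can be shown with a small compounding calculation. The sketch assumes a fixed round-trip cost charged on the traded notional and the 33.33% fractional sizing described above; `compound_equity`, the 0.1% cost, and the per-trade returns are hypothetical inputs for illustration, not figures from the paper.

```python
def compound_equity(trade_returns, fraction=1 / 3, cost=0.0):
    """Final equity per unit of starting capital after a sequence of round trips.

    trade_returns: gross return of each round trip on the traded notional.
    fraction: share of current equity committed per trade (fixed fractional sizing).
    cost: assumed round-trip transaction cost as a fraction of traded notional.
    """
    equity = 1.0
    for r in trade_returns:
        equity *= 1.0 + fraction * (r - cost)  # only the committed fraction earns r and pays cost
    return equity

# Hypothetical: similar gross performance, very different trade counts.
frequent = compound_equity([0.001] * 200, cost=0.001)  # 200 small trades
selective = compound_equity([0.01] * 20, cost=0.001)   # 20 larger trades
```

Without costs, both paths end with nearly the same equity. With a 0.1% cost per round trip, the 200-trade path ends exactly flat because every trade's edge is consumed, while the 20-trade path keeps most of its gain.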

Figure 1: The bid/ask imbalance variability and the price momentum over the evaluation period.

Figure 2: Equity curves of statistical models. The dotted black line represents the daily Buy-and-Hold (B&H) benchmark.

Figure 3: Equity curves of machine learning models. The dotted black line represents the daily Buy-and-Hold (B&H) benchmark.

Conclusion

The research benchmarks anomaly detection models for Bitcoin LOBs. Simple statistical approaches such as EC prove effective, while machine learning models extend detection capabilities but face higher transaction costs when they trade frequently. The paper illustrates the potential of outlier-driven strategies to outperform buy-and-hold benchmarks in volatile markets. Future research should explore adaptive ensemble models and broader market applications to strengthen the robustness and generalizability of anomaly detection frameworks.

Integrating these models into the broader AITA framework opens further work on optimization strategies and on secure, high-performance AI-driven trading systems. Ethical considerations and multi-agent systems for adaptive learning will also be essential as trading systems evolve amid complex market dynamics.