IFD: A Large-Scale Benchmark for Insider Filing Violation Detection (2507.20162v1)

Published 27 Jul 2025 in cs.CE

Abstract: Insider trading violations, particularly delayed disclosures of Form 4 filings, remain a persistent challenge for financial market surveillance. Despite regulatory requirements such as the two-business-day rule of the Securities and Exchange Commission (SEC), enforcement is limited by the lack of large-scale, labeled datasets and task-specific benchmarks. In this paper, we introduce Insider Filing Delay (IFD), the first and largest publicly available dataset for insider disclosure behavior, comprising over one million Form 4 transactions spanning two decades (2002-2025), with structured annotations on delay status, insider roles, governance factors, and firm-level financial indicators. IFD enables the first large-scale formulation of strategic disclosure violation detection as a binary classification task grounded in regulatory compliance. To demonstrate the utility of IFD, we propose MaBoost, a hybrid framework combining a Mamba-based state space encoder with XGBoost, achieving high accuracy and interpretability in identifying high-risk behavioral patterns. Experiments across statistical baselines, deep learning models, and LLMs confirm that MaBoost outperforms prior approaches, achieving an F1-score of up to 99.47% under constrained regulatory settings. IFD provides a realistic, reproducible, and behavior-rich benchmark for developing AI models in financial compliance, regulatory forensics, and interpretable time-series classification. All data and code are available at https://github.com/CH-YellowOrange/MaBoost-and-IFD.
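
To make the binary task concrete, the following is a minimal sketch of how a delay label could be derived from the two-business-day rule. The abstract does not spell out the labeling procedure, so the function names, the `limit` parameter, and the choice to ignore market holidays are all assumptions made for illustration; the labels in IFD itself may be computed differently.

```python
from datetime import date, timedelta


def business_days_between(start: date, end: date) -> int:
    """Count weekdays after `start` up to and including `end`.

    Assumption: only Saturdays and Sundays are skipped; holidays are ignored.
    """
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days += 1
    return days


def is_delayed(transaction_date: date, filing_date: date, limit: int = 2) -> bool:
    """Hypothetical label: True if the filing arrived more than `limit`
    business days after the transaction."""
    return business_days_between(transaction_date, filing_date) > limit


# A trade on Friday 2024-03-01 filed the following Tuesday is within the limit;
# the same trade filed on Wednesday is not.
on_time = is_delayed(date(2024, 3, 1), date(2024, 3, 5))
late = is_delayed(date(2024, 3, 1), date(2024, 3, 6))
```

A production labeler would also need a holiday calendar and the filing timestamp cutoff, both of which are left out of this sketch.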

Authors (11)

GitHub

GitHub - CH-YellowOrange/MaBoost-and-IFD: AttnBoost and IFD
