AIMM-GT: Ground Truth for Market Manipulation
- AIMM-GT is a curated benchmark dataset that connects Reddit-driven social activity with market anomaly signals using labeled ticker-day instances.
- It employs a deterministic, rule-based framework that aggregates normalized signals such as social volume, sentiment, bot activity, coordination, and market data.
- Validation via forward-walk testing and prospective logging demonstrates its efficacy for early-warning, regulatory surveillance, and risk stratification.
The AIMM Ground Truth Dataset (AIMM-GT) is a curated and labeled benchmark for evaluating automated detection of social-media-influenced stock market manipulation. It provides an empirical foundation for development and assessment of risk-scoring systems such as the AIMM Manipulation Risk Score (AMRS), explicitly connecting online narrative signals with observable market anomalies. AIMM-GT is integrated within the AIMM multimodal framework, which utilizes Reddit activity, bot detection, coordination scores, and OHLCV market features to generate daily risk evaluations per equity (Neela, 18 Dec 2025).
1. Motivation and Scope
Market manipulation increasingly exploits coordinated activity on social media platforms, particularly on forums such as Reddit. Regulators, brokerages, and retail investors require ground-truthed datasets that concretely link social media signals with market outcomes for research, surveillance tool development, and policy evaluation. AIMM-GT fulfills this need by offering labeled instances that represent both known cases of manipulation and rigorously matched controls (Neela, 18 Dec 2025).
AIMM-GT covers events from January 2021 to December 2024 and focuses on manipulation typified by coordinated campaigns visible in both social signals and market anomaly metrics. The dataset supports retrospective, prospective, and early-warning evaluation of automated frameworks.
2. Dataset Composition and Labeling
AIMM-GT comprises 33 labeled ticker-days across eight equities: AAPL, AMC, AMZN, BB, GME, GOOGL, MSFT, and TSLA. Its construction was guided by the following inclusion criteria:
- Positive manipulation cases: Three well-documented events (one each for BB, GME, AMC), labeled by corroboration with regulatory enforcement releases and community-verified manipulation reports.
- Control cases: Thirty matched normal trading days for the same tickers and window, specifically excluding periods of known manipulation.
Each instance in AIMM-GT is a (ticker, date) pair, labeled as positive or control. The CSV schema is:
`{ticker, date, manipulation_category, confidence, provenance, binary_label}`
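As a minimal sketch of how records with this schema might be read, assuming a plain CSV file with a header row: the sample rows, the category and provenance strings, and the `load_labels` helper are hypothetical and only illustrate the column layout.

```python
import csv
import io

# Column names taken from the schema above.
COLUMNS = ["ticker", "date", "manipulation_category",
           "confidence", "provenance", "binary_label"]

# Hypothetical sample rows; the values are illustrative, not taken from AIMM-GT.
SAMPLE = """ticker,date,manipulation_category,confidence,provenance,binary_label
GME,2021-01-28,example_positive,high,example_source,1
AAPL,2021-03-02,none,high,example_source,0
"""

def load_labels(text):
    """Parse CSV text into a {(ticker, date): binary_label} mapping."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    labels = {}
    for row in reader:
        # Each instance is keyed by its (ticker, date) pair.
        labels[(row["ticker"], row["date"])] = int(row["binary_label"])
    return labels

labels = load_labels(SAMPLE)
print(labels)
```

Keying on the (ticker, date) pair mirrors the ticker-day unit of the dataset and makes it straightforward to join labels against daily scores.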
3. Signal Families and Feature Engineering
AMRS inference over AIMM-GT relies on five normalized component signals, each mapped to daily ticker data:
- Normalized social volume
- Normalized positive sentiment
- Normalized bot-heavy activity ratio
- Normalized coordination score
- Normalized market anomaly signal
Each underlying raw input is scaled to a normalized value before aggregation.
Key signal definitions:
- Social Volume: Count of Reddit posts mentioning a ticker per day.
- Sentiment: Weighted mean of VADER (w=0.4) and FinBERT (w=0.6) sentiment scores, with the FinBERT output collapsed to a single scalar.
- Bot-Heavy Ratio: Proportion of posts by accounts whose bot score exceeds a threshold, where the bot score is computed from account features that include posts per day and subreddit diversity.
- Coordination: Near-duplicate fraction among up to 200 sampled posts.
- Market Anomaly: Combines two anomaly measures derived from the OHLCV market data.
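Two of the definitions above can be sketched in a few lines. This is an illustration under stated assumptions, not the source's implementation: the 0.4/0.6 weights come from the text, but the similarity cutoff of 0.9, the use of `difflib.SequenceMatcher` as the near-duplicate test, taking the first 200 posts in place of sampling, and counting posts (rather than pairs) are all assumptions. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def blended_sentiment(vader, finbert, w_vader=0.4, w_finbert=0.6):
    """Weighted mean of two scalar sentiment scores (weights from the text)."""
    return w_vader * vader + w_finbert * finbert

def near_duplicate_fraction(posts, threshold=0.9, max_posts=200):
    """Fraction of posts with at least one near-duplicate in the sample.

    `threshold` is an assumed similarity cutoff, and taking the first
    `max_posts` posts stands in for the sampling step.
    """
    sample = posts[:max_posts]
    if len(sample) < 2:
        return 0.0
    flagged = set()
    for i, j in combinations(range(len(sample)), 2):
        # Character-level similarity ratio in [0, 1].
        if SequenceMatcher(None, sample[i], sample[j]).ratio() >= threshold:
            flagged.update((i, j))
    return len(flagged) / len(sample)

posts = ["BB to the moon", "BB to the moon!", "Earnings call notes for MSFT"]
sentiment = blended_sentiment(0.5, 1.0)
coordination = near_duplicate_fraction(posts)
print(sentiment, coordination)
```

The pairwise comparison is quadratic in the number of posts, which is one reason a cap such as 200 sampled posts keeps the daily computation cheap.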
4. Score Aggregation and Model Interpretation
The AIMM framework is not a trained machine learning model but a deterministic, rule-based architecture. The AMRS is computed as a weighted sum of the five normalized component signals using a fixed set of canonical weights, and the resulting score is clipped to a bounded range. All components are temporally normalized in forward-walk mode, with statistics computed on an expanding window to minimize look-ahead bias. Calibration and robustness checks (±20% on weights) show <3% change in event ranking and a stable ROC-AUC (Neela, 18 Dec 2025).
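The forward-walk normalization and weighted aggregation described here can be sketched as follows. Several details are assumptions rather than the source's specification: min-max scaling is used as the normalizer, the weights are set equal purely for illustration (the canonical weights are not reproduced), the clipping range is taken to be [0, 1], and the names `expanding_minmax`, `amrs`, and `WEIGHTS` are hypothetical.

```python
def expanding_minmax(values):
    """Scale each value using only the min and max observed up to that day."""
    scaled, lo, hi = [], None, None
    for v in values:
        lo = v if lo is None else min(lo, v)
        hi = v if hi is None else max(hi, v)
        # No future observations enter the statistics, so there is no look-ahead.
        scaled.append(0.0 if hi == lo else (v - lo) / (hi - lo))
    return scaled

def amrs(components, weights):
    """Weighted sum of normalized components, clipped to [0, 1] (assumed range)."""
    score = sum(weights[name] * components[name] for name in weights)
    return min(1.0, max(0.0, score))

# Hypothetical equal weights, NOT the canonical weights from the source.
WEIGHTS = {"volume": 0.2, "sentiment": 0.2, "bot": 0.2,
           "coordination": 0.2, "market": 0.2}

volume_scaled = expanding_minmax([10, 20, 15, 40])
day = {"volume": volume_scaled[-1], "sentiment": 0.5, "bot": 0.5,
       "coordination": 0.5, "market": 0.5}
score = amrs(day, WEIGHTS)
print(volume_scaled, score)
```

Because each day's statistics depend only on earlier and current observations, re-running the pipeline on a longer history leaves previously computed scores unchanged.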
5. Evaluation Protocol and Metrics
AIMM-GT enables quantitative evaluation of risk scoring via two core methodologies:
- Forward-walk (held-out) testing: AMRS is evaluated on the labeled set at the default alert threshold; a ticker-day is flagged as high risk when its AMRS crosses that threshold.
  - At the default threshold:
    - TP = 0, FP = 0, TN = 30, FN = 3 (recall = 0; precision is undefined when no alerts fire and is reported as 0).
    - Continuous ROC-AUC = 0.99, PR-AUC = 0.83.
  - Threshold sensitivity, listed from the lowest to the highest tested threshold:
    - Lowest: TP=3, FP=0, recall=1.0, precision=1.0, F1=1.0.
    - Intermediate: TP=1, FP=0, recall=0.33, precision=1.0, F1=0.5.
    - Highest: No positive alerts.
- Prospective logging: Simulated "live" pipeline run on the same 33 instances.
  - TP=3, FP=0, TN=30, FN=0. ROC-AUC ≈ 1.0, PR-AUC ≈ 1.0.
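The threshold-dependent figures above follow directly from the confusion counts. A minimal sketch, assuming the common convention that a 0/0 ratio is reported as 0.0 (the helper name `prf` is hypothetical):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from confusion counts; 0/0 is reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

no_alerts = prf(tp=0, fp=0, fn=3)   # no alerts fire
all_found = prf(tp=3, fp=0, fn=0)   # every positive flagged
one_found = prf(tp=1, fp=0, fn=2)   # one of three positives flagged
print(no_alerts, all_found, one_found)
```

With only three positives, each missed event moves recall by a third, which is why the reported metrics change so sharply between thresholds.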
Lead-time analysis indicates potential early-warning capability. For the GME event, AMRS first surpassed 0.55 on January 6, 2021, 22 calendar days before the January 28 peak, with additional alerts logged on January 14, 19, and 25. Across all events, the median lead time for pre-event high-risk classification was ≈3 trading days (Neela, 18 Dec 2025).
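The calendar-day lead times for the GME alerts can be checked with simple date arithmetic. The sketch below uses only the dates quoted in the text; it counts calendar days, not trading days, since a trading-day count would need an exchange holiday calendar.

```python
from datetime import date

event_peak = date(2021, 1, 28)
alert_dates = [date(2021, 1, d) for d in (6, 14, 19, 25)]

# Calendar days between each alert and the event peak.
lead_days = [(event_peak - alert).days for alert in alert_dates]
print(lead_days)
```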
6. Risk Levels and Operational Use
AIMM-GT supports risk stratification using AMRS-derived thresholds:
| AMRS Value | Risk Level | Recommended Usage |
|---|---|---|
| | Low | Routine monitoring |
| | Medium | Screening/early triage |
| | High | Regulatory alert/suspicious window detection |
A suspicious window requires both High risk and at least one corroborating anomaly (e.g., an elevated market anomaly measure, high coordination, or a high bot-heavy ratio). Risk thresholds can be tuned for specific operational priorities:
- Regulatory surveillance: a higher threshold to minimize false positives.
- Triage/screening: a lower threshold range ending at 0.3 for high recall, which achieved perfect detection on AIMM-GT.
- Early-warning: a low threshold for maximum advance notice at the expense of more frequent alerts (Neela, 18 Dec 2025).
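The stratification and suspicious-window rule can be sketched as a pair of small functions. The cut points below are hypothetical placeholders chosen only for illustration, the inclusive comparison at each boundary is an assumption, and the function names are not from the source.

```python
def risk_level(amrs, medium_cut, high_cut):
    """Map a score to Low/Medium/High using caller-supplied cut points."""
    if amrs >= high_cut:
        return "High"
    if amrs >= medium_cut:
        return "Medium"
    return "Low"

def is_suspicious_window(amrs, corroborating_flags, medium_cut, high_cut):
    """True when risk is High AND at least one corroborating anomaly is set."""
    is_high = risk_level(amrs, medium_cut, high_cut) == "High"
    return is_high and any(corroborating_flags)

# Hypothetical cut points; tune these to the operational priority.
MEDIUM_CUT, HIGH_CUT = 0.3, 0.55

level = risk_level(0.6, MEDIUM_CUT, HIGH_CUT)
flagged = is_suspicious_window(0.6, [False, True], MEDIUM_CUT, HIGH_CUT)
print(level, flagged)
```

Passing the cut points as arguments keeps the same logic usable for surveillance, triage, and early-warning settings, which differ only in where the thresholds sit.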
7. Limitations and Prospective Applications
AIMM-GT is a small dataset (n=33, 3 positives) with synthetic but event-calibrated social features. As such, its primary value is enabling transparent benchmarking, proof-of-concept evaluation, and methodological iteration for automated manipulation-risk systems. Its schema, fusion logic, and underlying real-market data are intended to accelerate research in market surveillance tasks, especially those requiring joint modeling of social and market anomalies (Neela, 18 Dec 2025).
A plausible implication is that further expansion, with real social trace data and more manipulation events, would improve the statistical power and generality of benchmarks built on AIMM-GT. Its integration with interpretable, modular frameworks facilitates regulatory scrutiny and scientific reproducibility.
Key Reference:
"AIMM: An AI-Driven Multimodal Framework for Detecting Social-Media-Influenced Stock Market Manipulation" (Neela, 18 Dec 2025)