AIMM-GT: Ground Truth for Market Manipulation

Updated 25 December 2025
  • AIMM-GT is a curated benchmark dataset that connects Reddit-driven social activity with market anomaly signals using labeled ticker-day instances.
  • It employs a deterministic, rule-based framework that aggregates normalized signals such as social volume, sentiment, bot activity, coordination, and market data.
  • Validation via forward-walk testing and prospective logging demonstrates its efficacy for early-warning, regulatory surveillance, and risk stratification.

The AIMM Ground Truth Dataset (AIMM-GT) is a curated and labeled benchmark for evaluating automated detection of social-media-influenced stock market manipulation. It provides an empirical foundation for development and assessment of risk-scoring systems such as the AIMM Manipulation Risk Score (AMRS), explicitly connecting online narrative signals with observable market anomalies. AIMM-GT is integrated within the AIMM multimodal framework, which utilizes Reddit activity, bot detection, coordination scores, and OHLCV market features to generate daily risk evaluations per equity (Neela, 18 Dec 2025).

1. Motivation and Scope

Market manipulation increasingly exploits coordinated activity on social media platforms, particularly on forums such as Reddit. Regulators, brokerages, and retail investors require ground-truthed datasets that concretely link social media signals with market outcomes for research, surveillance tool development, and policy evaluation. AIMM-GT fulfills this need by offering labeled instances that represent both known cases of manipulation and rigorously matched controls (Neela, 18 Dec 2025).

AIMM-GT covers events from January 2021 to December 2024 and focuses on manipulation typified by coordinated campaigns visible in both social signals and market anomaly metrics. The dataset supports retrospective and prospective evaluation of automated frameworks, including assessment of their early-warning capability.

2. Dataset Composition and Labeling

AIMM-GT comprises 33 labeled ticker-days across eight equities: AAPL, AMC, AMZN, BB, GME, GOOGL, MSFT, and TSLA. Its construction was guided by the following inclusion criteria:

  • Positive manipulation cases: Three well-documented events (one each for BB, GME, AMC), labeled by corroboration with regulatory enforcement releases and community-verified manipulation reports.
  • Control cases: Thirty matched normal trading days for the same tickers and window, specifically excluding periods of known manipulation.

Each instance in AIMM-GT is a (ticker, date) pair, labeled as positive or control. The CSV schema is:

{ticker, date, manipulation_category, confidence, provenance, binary_label}

Social media features are synthetic, statistically calibrated to mirror historical meme-stock events, while market features (OHLCV) use authentic data from Yahoo Finance (Neela, 18 Dec 2025).
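
As a minimal sketch of how rows with this schema might be read, the Python snippet below parses the six columns into typed records. The column types (ISO-format dates, a numeric confidence, a 0/1 binary_label), the `TickerDay` class, the `load_ticker_days` helper, and the two sample rows are all assumptions made for illustration; they are not taken from the dataset itself.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TickerDay:
    """One (ticker, date) instance; field types are assumed, not documented."""
    ticker: str
    date: date
    manipulation_category: str
    confidence: float
    provenance: str
    binary_label: int  # assumed encoding: 1 = positive, 0 = control


def load_ticker_days(handle):
    """Parse CSV rows carrying the six schema columns into TickerDay records."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append(TickerDay(
            ticker=raw["ticker"],
            date=date.fromisoformat(raw["date"]),
            manipulation_category=raw["manipulation_category"],
            confidence=float(raw["confidence"]),
            provenance=raw["provenance"],
            binary_label=int(raw["binary_label"]),
        ))
    return rows


# Placeholder rows for illustration only; these values are invented.
SAMPLE_CSV = """ticker,date,manipulation_category,confidence,provenance,binary_label
GME,2021-01-28,example_category,0.9,example_source,1
AAPL,2021-01-28,none,0.9,example_source,0
"""

records = load_ticker_days(io.StringIO(SAMPLE_CSV))
positives = [r for r in records if r.binary_label == 1]
```

Keeping the label as an integer makes it straightforward to count positives and controls before any scoring is applied.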

3. Signal Families and Feature Engineering

AMRS inference over AIMM-GT relies on five normalized component signals, each mapped to daily ticker data:

  • $s^{vol}_{i,t}$: normalized social volume
  • $s^{senti}_{i,t}$: normalized positive sentiment
  • $s^{bot}_{i,t}$: normalized bot-heavy activity ratio
  • $s^{coord}_{i,t}$: normalized coordination score
  • $s^{mkt}_{i,t}$: normalized market anomaly signal

Each underlying raw input is scaled using:

$$\tilde{z}_{i,t} = \frac{z_{i,t} - \min_t z_{i,t}}{\max_t z_{i,t} - \min_t z_{i,t} + \epsilon}$$
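
The scaling rule can be sketched in a few lines of Python. This is an illustrative implementation, not the framework's own code: the function name `min_max_normalize` and the default `eps=1e-9` are assumptions, since the source does not state the value of $\epsilon$.

```python
def min_max_normalize(values, eps=1e-9):
    """Scale one ticker's raw series into [0, 1) using min-max scaling.

    eps keeps the denominator non-zero when the series is constant;
    its value here is an assumed placeholder.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]


# Hypothetical daily post counts for one ticker.
scaled = min_max_normalize([10, 20, 40])
flat = min_max_normalize([5, 5, 5])  # constant series maps to all zeros
```

Because $\epsilon$ sits in the denominator, the maximum of a series maps to a value just below 1 rather than exactly 1, and a constant series maps to 0 instead of raising a division error.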

Key signal definitions:

  • Social Volume: Count of Reddit posts mentioning a ticker per day.
  • Sentiment: Weighted mean of VADER ($w=0.4$) and FinBERT ($w=0.6$) sentiment scores, with the FinBERT class probabilities collapsed to a scalar as $P(pos) - P(neg)$.
  • Bot-Heavy Ratio: Proportion of posts by accounts scoring $B(u) > 0.5$, with

$$B(u) = w_f\,1[f(u) > \tau_f] + w_d\,1[d(u) < \tau_d]$$

where $f(u)$ is posts/day, $d(u)$ is subreddit-diversity, $\tau_f=10$, $\tau_d=3$, $w_f=0.7$, $w_d=0.3$.

  • Coordination: Near-duplicate fraction among up to 200 sampled posts, quantified as

$$coordination\_score_{i,t} = \frac{2}{N(N-1)} \sum_{i<j} 1[\cos(v_i, v_j) > \tau_c], \quad \tau_c=0.8$$

  • Market Anomaly: Combines $|\text{return}_{i,t}|$ and $volume\_zscore_{i,t}$, where

$$return_{i,t} = \frac{close_{i,t} - close_{i,t-1}}{close_{i,t-1}}, \quad volume\_zscore_{i,t} = \frac{volume_{i,t} - \mathrm{mean}_{i,t}}{\mathrm{std}_{i,t}}$$
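
The signal definitions above can be sketched in plain Python as follows. Several details are assumptions rather than facts from the source: posts are represented as generic numeric vectors (the embedding method is not specified), the volume z-score uses the population mean and standard deviation of a caller-supplied history window, and all function names are hypothetical. The source also does not say how the absolute return and the volume z-score are combined into the market anomaly signal, so that step is left out.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def combined_sentiment(vader, p_pos, p_neg, w_vader=0.4, w_finbert=0.6):
    """Weighted mean of a VADER score and FinBERT's P(pos) - P(neg)."""
    return w_vader * vader + w_finbert * (p_pos - p_neg)


def bot_score(posts_per_day, subreddit_diversity,
              tau_f=10, tau_d=3, w_f=0.7, w_d=0.3):
    """B(u): weighted indicators for high posting rate and low diversity."""
    return w_f * (posts_per_day > tau_f) + w_d * (subreddit_diversity < tau_d)


def bot_heavy_ratio(post_authors):
    """Share of posts whose author has B(u) > 0.5.

    post_authors holds one (posts_per_day, subreddit_diversity) pair per post.
    """
    if not post_authors:
        return 0.0
    flagged = sum(1 for f, d in post_authors if bot_score(f, d) > 0.5)
    return flagged / len(post_authors)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def coordination_score(vectors, tau_c=0.8):
    """Fraction of post pairs whose cosine similarity exceeds tau_c."""
    n = len(vectors)
    if n < 2:
        return 0.0
    hits = sum(1 for a, b in combinations(vectors, 2) if cosine(a, b) > tau_c)
    return 2.0 * hits / (n * (n - 1))


def daily_return(prev_close, close):
    """Simple close-to-close return."""
    return (close - prev_close) / prev_close


def volume_zscore(volume, history):
    """Z-score of today's volume against an assumed history window."""
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0
    return (volume - mean(history)) / sigma
```

With the stated weights, an account exceeds $B(u) > 0.5$ only when its posting rate is above $\tau_f$: low subreddit diversity alone contributes 0.3, which stays under the cutoff.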

4. Score Aggregation and Model Interpretation

The AIMM framework is not a trained machine learning model but a deterministic, rule-based architecture. The AMRS is computed as a weighted sum:

$$AMRS_{i,t} = w_{vol}\,s^{vol}_{i,t} + w_{sent}\,s^{senti}_{i,t} + w_{bot}\,s^{bot}_{i,t} + w_{coord}\,s^{coord}_{i,t} + w_{mkt}\,s^{mkt}_{i,t}$$

with canonical weights

$$(w_{vol}, w_{sent}, w_{bot}, w_{coord}, w_{mkt}) = (0.25, 0.15, 0.20, 0.20, 0.20)$$

Scores are clipped to $[0,1]$. All components are temporally normalized in forward-walk mode, with statistics computed on the expanding window to minimize look-ahead bias. Calibration and robustness checks (±20% on weights) show <3% change in event ranking and ROC-AUC stability in $[0.71, 0.74]$ (Neela, 18 Dec 2025).
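
A small Python sketch of the aggregation step, using the canonical weights and clipping described above. The expanding-window normalizer is one plausible reading of "forward-walk mode": it assumes that the minimum and maximum at day $t$ are taken over days up to and including $t$, which the source does not spell out. The names `amrs`, `expanding_min_max`, and the example signal values are hypothetical.

```python
WEIGHTS = {"vol": 0.25, "senti": 0.15, "bot": 0.20, "coord": 0.20, "mkt": 0.20}


def amrs(signals, weights=WEIGHTS):
    """Weighted sum of the five normalized signals, clipped to [0, 1]."""
    raw = sum(weights[name] * signals[name] for name in weights)
    return min(1.0, max(0.0, raw))


def expanding_min_max(series, eps=1e-9):
    """Min-max scale each value using only data seen up to that day.

    Assumption: the window for day t includes day t itself.
    """
    scaled = []
    lo = hi = None
    for value in series:
        lo = value if lo is None else min(lo, value)
        hi = value if hi is None else max(hi, value)
        scaled.append((value - lo) / (hi - lo + eps))
    return scaled


# Hypothetical normalized signals for one ticker-day.
example = {"vol": 0.8, "senti": 0.5, "bot": 0.4, "coord": 0.6, "mkt": 0.9}
score = amrs(example)  # 0.2 + 0.075 + 0.08 + 0.12 + 0.18 = 0.655

walk = expanding_min_max([1, 3, 2])
```

Because each day is scaled only against earlier observations, a value that looks unremarkable in hindsight can still score near 1 on the day it first exceeds the running maximum.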

5. Evaluation Protocol and Metrics

AIMM-GT enables quantitative evaluation of risk scoring via two core methodologies:

  • Forward-walk (held-out) testing: AMRS evaluated on the labeled set with default alert threshold $\tau=0.5$ (high risk if $AMRS \geq 0.5$).
    • At $\tau=0.5$: TP = 0, FP = 0, TN = 30, FN = 3 (recall = 0; precision is reported as 0, although it is strictly undefined when no alerts fire).
    • Continuous ROC-AUC = 0.99, PR-AUC = 0.83.
    • Threshold sensitivity:
      • $\tau=0.2$: TP=3, FP=0, recall=1.0, precision=1.0, F1=1.0.
      • $\tau=0.3$: TP=1, FP=0, recall=0.33, precision=1.0, F1=0.5.
      • $\tau \geq 0.5$: No positive alerts.
  • Prospective logging: Simulated "live" pipeline with the same 33 instances, $\tau=0.5$.
    • TP=3, FP=0, TN=30, FN=0. ROC-AUC ≈ 1.0, PR-AUC ≈ 1.0.
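
The threshold-sensitivity figures follow directly from the confusion counts, as the short Python check below illustrates. The helper name `precision_recall_f1` is hypothetical, and the FN values are derived here from the three positives in the dataset (FN = 3 − TP) rather than quoted from the source.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1); a metric is None when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else None
    return precision, recall, f1


at_02 = precision_recall_f1(tp=3, fp=0, fn=0)  # tau = 0.2
at_03 = precision_recall_f1(tp=1, fp=0, fn=2)  # tau = 0.3
at_05 = precision_recall_f1(tp=0, fp=0, fn=3)  # tau = 0.5, no alerts fire
```

With only three positives, each missed event moves recall by a third, so these metrics change sharply with small threshold adjustments.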

Lead-time analysis demonstrates the potential early-warning capability: For the GME event, AMRS first surpassed 0.55 on January 6, 2021, which was 22 calendar days before the January 28 peak, with additional alerts logged on January 14, 19, and 25. Across all events, median lead time for pre-event high-risk classification was ≈3 trading days at $\tau=0.5$ (Neela, 18 Dec 2025).
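
The calendar-day lead time for the GME example can be reproduced with simple date arithmetic. The function name `lead_time_days` is hypothetical, and the sketch measures calendar days only; the trading-day median reported above would need an exchange calendar, which is not modeled here.

```python
from datetime import date


def lead_time_days(alert_dates, event_date):
    """Calendar days between the earliest pre-event alert and the event."""
    prior = [d for d in alert_dates if d <= event_date]
    if not prior:
        return None
    return (event_date - min(prior)).days


gme_alerts = [date(2021, 1, 6), date(2021, 1, 14),
              date(2021, 1, 19), date(2021, 1, 25)]
gme_lead = lead_time_days(gme_alerts, date(2021, 1, 28))  # 22
```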

6. Risk Levels and Operational Use

AIMM-GT supports risk stratification using AMRS-derived thresholds:

| AMRS Value | Risk Level | Recommended Usage |
| --- | --- | --- |
| $< 0.2$ | Low | Routine monitoring |
| $0.2 \leq AMRS < 0.5$ | Medium | Screening/early triage |
| $\geq 0.5$ | High | Regulatory alert/suspicious window detection |

A suspicious window requires both High risk and at least one corroborating anomaly (e.g., $volume\_zscore \geq 2.0$, $|\text{return}| \geq 5\%$, high coordination or bot-heavy ratio). Risk thresholds can be tuned for specific operational priorities:

  • Regulatory surveillance: $\tau \geq 0.5$ to minimize false positives.
  • Triage/screening: $\tau \approx 0.2$ to $0.3$ for high recall; $\tau = 0.2$ achieved perfect detection on AIMM-GT.
  • Early-warning: $\tau \leq 0.3$ for maximum advance notice at the expense of more frequent alerts (Neela, 18 Dec 2025).
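
The risk levels and the suspicious-window rule described in this section could be expressed along the following lines. This is a sketch under stated assumptions: the source does not quantify what counts as a "high" coordination score or bot-heavy ratio, so the `high_signal_cutoff` parameter and its default of 0.5 are hypothetical, as are the function names.

```python
def risk_level(amrs):
    """Map an AMRS value to the Low / Medium / High bands."""
    if amrs >= 0.5:
        return "High"
    if amrs >= 0.2:
        return "Medium"
    return "Low"


def is_suspicious_window(amrs, volume_zscore, daily_return,
                         coordination=0.0, bot_ratio=0.0,
                         high_signal_cutoff=0.5):
    """High risk plus at least one corroborating anomaly.

    high_signal_cutoff is an assumed placeholder for what counts as a
    'high' coordination score or bot-heavy ratio.
    """
    if risk_level(amrs) != "High":
        return False
    return (
        volume_zscore >= 2.0
        or abs(daily_return) >= 0.05
        or coordination >= high_signal_cutoff
        or bot_ratio >= high_signal_cutoff
    )
```

Requiring a corroborating anomaly on top of a High score means a purely social surge with no matching market or coordination evidence would not, by itself, open a suspicious window.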

7. Limitations and Prospective Applications

AIMM-GT is a small dataset (n=33, 3 positives) with synthetic but event-calibrated social features. As such, its primary value is enabling transparent benchmarking, proof-of-concept evaluation, and methodological iteration for automated manipulation-risk systems. Its schema, fusion logic, and underlying real-market data are intended to accelerate research in market surveillance tasks, especially those requiring joint modeling of social and market anomalies (Neela, 18 Dec 2025).

A plausible implication is that further expansion—with real social trace data and more manipulation events—would enhance statistical power and generality of benchmarks using AIMM-GT. Its integration with interpretable, modular frameworks facilitates regulatory scrutiny and scientific reproducibility.


Key Reference:

"AIMM: An AI-Driven Multimodal Framework for Detecting Social-Media-Influenced Stock Market Manipulation" (Neela, 18 Dec 2025)
