
MCCLD: Multi-Task Crowdsourcing Laundering Detection

Updated 9 December 2025
  • The paper introduces MCCLD, a multi-task GNN framework that jointly optimizes laundering transaction and group detection in stablecoin ecosystems.
  • It leverages a shared multi-level encoder with auxiliary group information, achieving significant improvements in F1 and AUC metrics over baseline models.
  • The framework formalizes crowdsourcing laundering as binary classification tasks, demonstrating robust performance across diverse datasets and real-world scenarios.

Multi-Task Collaborative Crowdsourcing Laundering Detection (MCCLD) is a framework designed for the detection of novel, rapidly emerging money laundering patterns, particularly in stablecoin ecosystems such as USDT. Crowdsourcing laundering disperses illicit flows by recruiting large numbers of ordinary individuals, circumventing detection by leveraging the diversity of transaction and organizational patterns. MCCLD jointly optimizes laundering transaction identification and transaction group inference using a multi-task graph neural network, driven by multi-level feature fusion and auxiliary group information, and achieves state-of-the-art performance in both crowdsourcing and general laundering scenarios (Li et al., 2 Dec 2025).

1. Problem Landscape and Formalization

Crowdsourcing cryptocurrency laundering, also known as "running-points" or "black USDT sales," entails launderers recruiting numerous users ("motorcade" members) to conceal fund origins through many KYC-approved accounts. Organizationally, these patterns form polycentric, multi-gang, weakly linked structures, making canonical graph partitioning or simple subgraph-mining insufficient. Two main detection tasks are formalized:

  • Laundering Transaction Detection: Binary classification of each transaction $e=(u,v)$ as laundering (1) or legitimate (0), with ground-truth labels $\mathbf{L}\in\{0,1\}^m$.
  • Transaction Group Detection: Binary identification of whether a transaction occurs within a constructed “transaction group”—i.e., between accounts in the same weakly-connected component of a delegation (staking) subgraph—with labels $\mathbf{I}\in\{0,1\}^m$.

Let the transaction graph be $\mathcal{G}=(\mathcal{A},\mathcal{T},\mathcal{W})$ with node attributes $\mathbf{X}_v$ and edge features $\mathbf{W}_e$. The tasks are to learn encoder/classifier pairs $\mathcal{F}_d, \mathcal{C}_l$ (laundering) and $\mathcal{F}_g, \mathcal{C}_g$ (groups) such that $\mathcal{C}_l(\mathcal{F}_d(\mathcal{G},e_i)) \approx \mathbf{L}_i$ and $\mathcal{C}_g(\mathcal{F}_g(\mathcal{G},e_i)) \approx \mathbf{I}_i$.
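As a concrete toy instance of this formulation (hypothetical data, for illustration only), the two label vectors are simply parallel binary annotations over the same edge set:

```python
# Toy instance of the two edge-level labeling tasks (hypothetical data).
transactions = [("u1", "u2"), ("u2", "u3"), ("u3", "u4")]  # edges e_i = (u, v)
L = [1, 0, 1]  # laundering labels: 1 = laundering, 0 = legitimate
I = [1, 1, 0]  # group labels: 1 = both endpoints share a staking component

# Both tasks are binary classifications over the same m transactions.
m = len(transactions)
assert len(L) == m and len(I) == m
```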

2. MCCLD Model Architecture

MCCLD utilizes an end-to-end graph neural network (GNN) architecture with the following features:

  • Shared Multi-Level Encoder ($\mathcal{F}$): GIN-style aggregation and MLP-based fusion of account- and transaction-level features, yielding edge embeddings $\mathbf{T}_i\in\mathbb{R}^D$.
  • Shared Classifier ($\mathcal{C}$): Output matrix $\mathbf{M}\in\mathbb{R}^{m\times 4}$ providing softmax scores for four classes: legit/laundering vs. inter-group/intra-group.

Feature Encoding and Propagation

  • Account node features: $\mathbf{A}^0_{v_i} = \mathbf{X}_{v_i}$.
  • Transaction edge features: $\mathbf{W}^0_{e_i} = \mathbf{W}_{e_i}$.

Each GNN block (layer) iteratively updates:

  1. Account embeddings (GIN message passing):

$M^k_{v_i} = \frac{1}{|\mathcal{N}(v_i)|} \sum_{v_j\in\mathcal{N}(v_i)} \phi^e_\theta(\mathbf{A}^k_{v_j})$

$\mathbf{A}^{k+1}_{v_i} = \phi^a_\theta(\mathbf{A}^k_{v_i}, M^k_{v_i})$

  2. Transaction features (bottleneck MLP):

$\mathbf{W}^{k+1}_{e_i} = \phi^w_\theta(\mathbf{W}^k_{e_i}, \mathbf{T}^k_{e_i})$

  3. Fused transaction embedding:

$M^k_{e_i} = [\mathbf{A}^k_{v_{r_i}} \Vert \mathbf{A}^k_{v_{s_i}}]$

$\mathbf{T}^{k+1}_{e_i} = \phi^t_\theta(\mathbf{T}^k_{e_i}, [\mathbf{W}^k_{e_i} \Vert M^k_{e_i}])$

Here, $\phi^e_\theta, \phi^a_\theta$ are two-layer MLPs with ReLU activations (as in GIN); $\phi^w_\theta, \phi^t_\theta$ are two-layer MLPs with batch normalization and ReLU. $\Vert$ denotes vector concatenation. After $K$ layers ($K=2$ in practice), the output is $\mathbf{T}_i = \mathbf{T}^K_{e_i} \in \mathbb{R}^D$.
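The three update steps above can be sketched in NumPy. This is illustrative only: the paper's $\phi^e_\theta, \phi^a_\theta, \phi^w_\theta, \phi^t_\theta$ are learned two-layer MLPs, which are stubbed here as fixed random linear maps followed by ReLU.

```python
# Sketch of one MCCLD-style GNN block; each phi is a stand-in for a
# learned two-layer MLP (here: a seeded random linear layer + ReLU).
import numpy as np

rng = np.random.default_rng(0)
D = 8                                   # embedding width
n, edges = 4, [(0, 1), (1, 2), (2, 3)]  # accounts and transactions (r_i, s_i)

A = rng.normal(size=(n, D))             # account embeddings A^k
W = rng.normal(size=(len(edges), D))    # transaction features W^k
T = rng.normal(size=(len(edges), D))    # transaction embeddings T^k

def phi(x, out_dim, seed):
    """Stand-in for a two-layer MLP: one seeded random linear map + ReLU."""
    w = np.random.default_rng(seed).normal(size=(x.shape[-1], out_dim))
    return np.maximum(x @ w, 0.0)

# 1) Account update: mean-aggregate transformed neighbor embeddings, fuse.
nbrs = {i: [] for i in range(n)}
for r, s in edges:
    nbrs[r].append(s); nbrs[s].append(r)
M_v = np.stack([phi(A[nbrs[i]], D, 1).mean(axis=0) if nbrs[i]
                else np.zeros(D) for i in range(n)])
A_next = phi(np.concatenate([A, M_v], axis=1), D, 2)

# 2) Transaction feature update: bottleneck MLP on (W^k, T^k).
W_next = phi(np.concatenate([W, T], axis=1), D, 3)

# 3) Fused transaction embedding: endpoint concat, then fuse with W^k, T^k.
M_e = np.stack([np.concatenate([A[r], A[s]]) for r, s in edges])
T_next = phi(np.concatenate([T, np.concatenate([W, M_e], axis=1)], axis=1), D, 4)

assert A_next.shape == (n, D) and T_next.shape == (len(edges), D)
```

Stacking $K=2$ such blocks and reading off `T_next` yields the edge embeddings $\mathbf{T}_i$ fed to the shared classifier.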

Classification Objective

The classifier outputs, for each transaction $e_i$: $\mathbf{M}_i = (p_{\mathrm{legit}}, p_{\mathrm{laundering}}, p_{\mathrm{inter}}, p_{\mathrm{intra}})$
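A minimal sketch of this shared head, assuming it is a softmax over four per-transaction logits (the logit values below are hypothetical):

```python
# Softmax over four logits per transaction, yielding
# (p_legit, p_laundering, p_inter, p_intra).
import numpy as np

def classify(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, -1.0, 0.5, 0.0]])  # one transaction's logits
p_legit, p_laundering, p_inter, p_intra = classify(logits)[0]
assert abs(p_legit + p_laundering + p_inter + p_intra - 1.0) < 1e-9
```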

3. Multi-Task Training Objective

Separate loss terms are combined to optimize both detection tasks:

  • Laundering detection loss (weighted binary cross-entropy):

$\mathcal{L}_{txn} = -\frac{1}{m} \sum_{i=1}^m \sum_{y\in\{0,1\}} w_y\,\mathbf{1}\{\mathbf{L}_i=y\}\,\ln p_{i,y}$

where $p_{i,1}=p_{\mathrm{laundering}}$, $p_{i,0}=p_{\mathrm{legit}}$, and $w_y$ is a class-balancing weight.

  • Group detection loss (binary cross-entropy):

$\mathcal{L}_{group} = -\frac{1}{m}\sum_{i=1}^m \left[ \mathbf{I}_i\ln p_{\mathrm{intra}}(e_i) + (1-\mathbf{I}_i)\ln p_{\mathrm{inter}}(e_i) \right]$

  • Joint objective:

$\mathcal{L}_{total} = \alpha\,\mathcal{L}_{txn} + \beta\,\mathcal{L}_{group}$

with $\alpha=1$, $\beta=0.5$ yielding favorable F1/AUC trade-offs in the primary evaluations.
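The three loss terms combine as follows, assuming softmax outputs $\mathbf{M}_i = (p_{\mathrm{legit}}, p_{\mathrm{laundering}}, p_{\mathrm{inter}}, p_{\mathrm{intra}})$ per transaction (the probability values below are hypothetical):

```python
# Joint objective L_total = alpha * L_txn + beta * L_group on softmax outputs.
import numpy as np

def joint_loss(M, L, I, w=(1.0, 1.0), alpha=1.0, beta=0.5):
    p_legit, p_laun, p_inter, p_intra = M.T
    # Weighted BCE over laundering labels (p_{i,1} = p_laundering,
    # p_{i,0} = p_legit), with per-class weights w for class balance.
    l_txn = -np.mean(np.where(L == 1, w[1] * np.log(p_laun),
                              w[0] * np.log(p_legit)))
    # BCE over group labels using the intra-/inter-group probabilities.
    l_group = -np.mean(I * np.log(p_intra) + (1 - I) * np.log(p_inter))
    return alpha * l_txn + beta * l_group

M = np.array([[0.7, 0.1, 0.15, 0.05],   # rows: (legit, laundering, inter, intra)
              [0.1, 0.6, 0.05, 0.25]])
L = np.array([0, 1])
I = np.array([0, 1])
loss = joint_loss(M, L, I)
assert loss > 0
```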

4. Incorporation of Auxiliary Information

Crowdsourcing laundering exploits network structures via delegation and staking. MCCLD integrates auxiliary group knowledge as follows:

  • Transaction Group Construction: On blockchains such as TRON, native delegation (staking) subgraphs are extracted. Their weakly connected components define “account groups”; transactions wholly within a component are labeled intra-group.
  • Subgraph Sampling: To fit within hardware constraints and improve generalization, training operates on induced subgraphs (2k–5k edges), built from random seed edges with neighbor expansion to cover 2-hop neighborhoods.
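The group-construction step above can be sketched with a union-find over the staking subgraph: accounts in the same weakly connected component form a group, and a transaction is intra-group iff both endpoints share a component. This is a simplified illustration, not the paper's exact pipeline:

```python
# Intra-group labeling via weakly connected components of the staking graph.
def make_group_labels(staking_edges, transactions):
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:               # path halving
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    for u, v in staking_edges:              # direction ignored (weak connectivity)
        union(u, v)
    # A transaction is intra-group (1) iff both endpoints are staked accounts
    # in the same component; otherwise it is inter-group (0).
    return [1 if (u in parent and v in parent and find(u) == find(v)) else 0
            for u, v in transactions]

staking = [("a", "b"), ("b", "c")]          # one component {a, b, c}
txns = [("a", "c"), ("a", "d"), ("e", "f")]
print(make_group_labels(staking, txns))     # → [1, 0, 0]
```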

Table: Auxiliary Group Construction Methods and Effects (Tron-USDT dataset)

| Method | F1 | AUC |
| --- | --- | --- |
| No group | 0.853 | 0.971 |
| Native staking group | 0.950 | 0.980 |
| GMPA-generated group | 0.956 | 0.986 |
| Louvain group | 0.941 | 0.980 |

The use of high-fidelity group assignments confers the largest performance improvements.

5. Training and Dataset Overview

Key hyperparameters and stats:

  • Model Hyperparameters: 2 GNN layers, 64-dim embeddings, learning rate $6\times10^{-3}$, Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$), batch size 4 subgraphs per step, weight decay $10^{-4}$, early stopping (patience = 20).
  • Datasets Evaluated:
    • Tron-USDT: 1,043k accounts, 2.6M transactions; 252 positive and 10,529 negative laundering labels; 18,997 intra-group edges.
    • Harmony hack: 340k accounts, 1.8M txns.
    • Upbit hack: 577k accounts, 2.3M txns.
    • IBM-LI synthetic: 705k accounts, 7.0M txns.

Preprocessing follows established graph learning conventions, with KYC, temporal, and attribute-based node features included as available.

6. Empirical Performance and Analysis

Evaluation Metrics

  • F1 Score: $2\cdot\frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
  • AUC (Area under ROC Curve)
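The F1 formula above, spelled out on hard predictions (a minimal sketch; AUC additionally requires ranking transactions by score and is omitted here):

```python
# F1 = 2 * precision * recall / (precision + recall) on binary labels.
def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

assert f1_score([1, 1, 0, 0], [1, 0, 1, 0]) == 0.5  # P = R = 0.5
```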

Benchmarking and Baselines

Comparison methods include subgraph-based (GMPA, AntiBenford) and transaction-based GNNs (GCN, GAT, GraphSAGE, PNA).

Table: Results (General Laundering Datasets)

| Method | Upbit (F1/AUC) | Harmony (F1/AUC) | IBM-LI (F1/AUC) |
| --- | --- | --- | --- |
| PNA | 0.836/0.892 | 0.774/0.911 | 0.261/0.700 |
| GCN | 0.636/0.759 | 0.588/0.908 | 0.091/0.528 |
| MCCLD | 0.885/0.923 | 0.806/0.916 | 0.405/0.737 |
  • Crowdsourcing Laundering (Tron-USDT): F1 = 0.950, AUC = 0.980 for MCCLD; average F1 uplift of +53.4% over baselines and AUC uplift of +25.2%.
  • General Laundering: Average F1 gain of +36.9% for MCCLD.

Ablation and Robustness

  • Exclusion of group info substantially impairs F1. Native staking groups and high-fidelity group assignments (GMPA, Louvain) result in strongest performance.
  • Under label scarcity, MCCLD's F1 degrades gracefully: 10% of labels yields F1 ≈ 0.72, and 30% yields F1 > 0.85.

7. Implications, Limitations, and Prospects

MCCLD's efficacy arises from multi-level feature fusion, the enforcement of intra-group embedding consistency by the auxiliary task, and explicitly shared classifier/encoder designs. These elements are particularly suited for addressing the heterogeneity and polycentricity of crowdsourcing laundering groups.

Limitations include dependence on at least partial supervision or availability of high-quality “group” partitions, as well as the potential of subgraph sampling to miss long transaction chains.

Potential future directions:

  • Dynamic GNN extensions to capture temporal laundering patterns.
  • End-to-end learnable group assignment via contrastive objectives, mitigating the reliance on external detectors.
  • Integration of cross-chain flows to generalize to multi-blockchain laundering operations (Li et al., 2 Dec 2025).

A plausible implication is that frameworks based on MCCLD may generalize to other domains exhibiting network-based polycentricity and fine-grained actor heterogeneity. The availability of public pseudocode and detailed methodology enables replication on modern GNN toolkits.
