MCCLD: Multi-Task Crowdsourcing Laundering Detection
- The paper introduces MCCLD, a multi-task GNN framework that jointly optimizes laundering-transaction detection and transaction-group detection in stablecoin ecosystems.
- It leverages a shared multi-level encoder with auxiliary group information, achieving significant improvements in F1 and AUC metrics over baseline models.
- The framework formalizes crowdsourcing laundering detection as two edge-level binary classification tasks, demonstrating robust performance across diverse datasets and real-world scenarios.
Multi-Task Collaborative Crowdsourcing Laundering Detection (MCCLD) is a framework designed for the detection of novel, rapidly emerging money laundering patterns, particularly in stablecoin ecosystems such as USDT. Crowdsourcing laundering disperses illicit flows by recruiting large numbers of ordinary individuals, circumventing detection by leveraging the diversity of transaction and organizational patterns. MCCLD jointly optimizes laundering transaction identification and transaction group inference using a multi-task graph neural network, driven by multi-level feature fusion and auxiliary group information, and achieves state-of-the-art performance in both crowdsourcing and general laundering scenarios (Li et al., 2 Dec 2025).
1. Problem Landscape and Formalization
Crowdsourcing cryptocurrency laundering, also known as "running-points" or "black USDT sales," entails launderers recruiting numerous users ("motorcade" members) to conceal fund origins through many KYC-approved accounts. Organizationally, these patterns form polycentric, multi-gang, weakly linked structures, making canonical graph partitioning or simple subgraph-mining insufficient. Two main detection tasks are formalized:
- Laundering Transaction Detection: Binary classification of each transaction $e \in E$ as laundering ($y_e^{L}=1$) or legitimate ($y_e^{L}=0$), with ground-truth labels $y_e^{L} \in \{0,1\}$.
- Transaction Group Detection: Binary identification of whether a transaction occurs within a constructed "transaction group", i.e., between accounts in the same weakly connected component of the delegation (staking) subgraph, with labels $y_e^{G} \in \{0,1\}$.
Let the transaction graph be $G=(V,E)$ with node attributes $\{x_v\}_{v \in V}$ and edge features $\{x_e\}_{e \in E}$; the tasks are to learn functions $f_L$ (laundering) and $f_G$ (groups) such that $f_L(e)=\hat{y}_e^{L} \in \{0,1\}$ and $f_G(e)=\hat{y}_e^{G} \in \{0,1\}$.
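The formalization above can be sketched concretely as an edge-labeled transaction list; the names used here (`Transaction`, `y_laundering`, `y_intra_group`) are illustrative, not the paper's notation.

```python
# Minimal sketch of the MCCLD problem setup: each transaction (edge) carries
# two binary labels, one per detection task. All data here is toy data.
from dataclasses import dataclass

@dataclass
class Transaction:
    src: str            # sending account (node u)
    dst: str            # receiving account (node v)
    features: tuple     # edge features x_e
    y_laundering: int   # task 1: 1 = laundering, 0 = legitimate
    y_intra_group: int  # task 2: 1 = within a transaction group

# Both tasks are edge-level binary classification:
# f_L : E -> {0, 1} (laundering), f_G : E -> {0, 1} (group membership).
txns = [
    Transaction("a1", "a2", (1.0,), y_laundering=1, y_intra_group=1),
    Transaction("a3", "a4", (0.2,), y_laundering=0, y_intra_group=0),
]
positives = [t for t in txns if t.y_laundering == 1]
```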
2. MCCLD Model Architecture
MCCLD utilizes an end-to-end graph neural network (GNN) architecture with the following features:
- Shared Multi-Level Encoder: GIN-style aggregation and MLP-based fusion of account- and transaction-level features, yielding edge embeddings $z_e$.
- Shared Classifier: An output layer providing softmax scores over four classes: {legit, laundering} × {inter-group, intra-group}.
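Since the shared classifier scores four joint classes, the two binary tasks can be read off as marginals of the 4-way softmax. The sketch below assumes a particular class ordering, which is not specified in the source.

```python
# Sketch: marginalizing a 4-way softmax head into the two binary tasks.
# The class ordering below is an assumption for illustration.
import numpy as np

CLASSES = ["legit/inter", "legit/intra", "laundering/inter", "laundering/intra"]

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([0.1, 0.2, 2.0, 1.5])   # toy logits for one transaction
p = softmax(logits)

p_laundering = p[2] + p[3]   # marginal probability for the laundering task
p_intra_group = p[1] + p[3]  # marginal probability for the group task
```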
Feature Encoding and Propagation
- Account node features: $x_v \in \mathbb{R}^{d_v}$, per-account attributes (KYC, temporal, and activity statistics as available).
- Transaction edge features: $x_e \in \mathbb{R}^{d_e}$, per-transaction attributes.
Each GNN block (layer $k$) iteratively updates:
- Account embeddings (GIN message passing): $h_v^{(k)} = \mathrm{MLP}^{(k)}\big((1+\epsilon^{(k)})\,h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\big)$
- Transaction features (bottleneck MLP): $h_e^{(k)} = \mathrm{MLP}_e^{(k)}\big(h_e^{(k-1)}\big)$
- Fused transaction embedding, for $e=(u,v)$: $z_e^{(k)} = \mathrm{MLP}_f^{(k)}\big([\,h_u^{(k)} \,\|\, h_v^{(k)} \,\|\, h_e^{(k)}\,]\big)$
Here, $\mathrm{MLP}^{(k)}$ are two-layer MLPs with ReLU (as in GIN); $\mathrm{MLP}_e^{(k)}$ and $\mathrm{MLP}_f^{(k)}$ are two-layer MLPs with batch normalization and ReLU. $\|$ denotes vector concatenation. After $L$ layers ($L=2$ in practice), the output is $z_e = z_e^{(L)}$.
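A toy forward pass of one such block can be sketched as follows; dimensions, weight initialization, and the omission of batch normalization and the bottleneck MLP are simplifications, not the paper's exact configuration.

```python
# Toy GIN-style block: account update by neighbor-sum aggregation, then
# edge fusion over [h_u || h_v || h_e]. All weights are random toy values.
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding width (64 in the paper; small here for illustration)

def two_layer_mlp(x, W1, W2):
    """Two-layer MLP with ReLU activations."""
    return np.maximum(W2 @ np.maximum(W1 @ x, 0.0), 0.0)

# Tiny graph: 3 accounts, 2 transactions (directed edges)
edges = [(0, 1), (1, 2)]
h_v = rng.standard_normal((3, d))            # account embeddings
h_e = rng.standard_normal((len(edges), d))   # transaction embeddings

W1, W2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
Wf1, Wf2 = rng.standard_normal((d, 3 * d)), rng.standard_normal((d, d))
eps = 0.1

# GIN account update: (1 + eps) * own state + sum of neighbor states
# (edges are treated as undirected for aggregation, an illustrative choice)
h_v_next = np.zeros_like(h_v)
for v in range(len(h_v)):
    nbrs = [u for (u, w) in edges if w == v] + [w for (u, w) in edges if u == v]
    agg = (1 + eps) * h_v[v] + sum(h_v[u] for u in nbrs)
    h_v_next[v] = two_layer_mlp(agg, W1, W2)

# Fused edge embedding: MLP over the concatenation [h_u || h_v || h_e]
z_e = np.array([
    two_layer_mlp(np.concatenate([h_v_next[u], h_v_next[w], h_e[i]]), Wf1, Wf2)
    for i, (u, w) in enumerate(edges)
])
```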
Classification Objective
The classifier outputs, for each transaction $e$, softmax probabilities over the four joint classes: $\hat{p}_e = \mathrm{softmax}(W z_e + b) \in \mathbb{R}^{4}$, from which the laundering and group predictions are read off as marginals.
3. Multi-Task Training Objective
Separate loss terms are combined to optimize both detection tasks:
- Laundering detection loss (weighted binary cross-entropy): $\mathcal{L}_L = -\sum_{e \in E} \big[w_1\, y_e^{L} \log \hat{y}_e^{L} + w_0\,(1-y_e^{L}) \log(1-\hat{y}_e^{L})\big]$, where $\hat{y}_e^{L}$ is the predicted laundering probability and $w_1 > w_0$ upweights the rare laundering class for class balance.
- Group detection loss (binary cross-entropy): $\mathcal{L}_G = -\sum_{e \in E} \big[y_e^{G} \log \hat{y}_e^{G} + (1-y_e^{G}) \log(1-\hat{y}_e^{G})\big]$
- Joint objective: $\mathcal{L} = \mathcal{L}_L + \lambda\, \mathcal{L}_G$
with the task weight $\lambda$ tuned to yield favorable F1/AUC trade-offs in the primary evaluations.
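The combined objective can be sketched numerically; the class weight and task weight below are placeholders, as the paper's exact values are not given here.

```python
# Sketch of the joint objective: weighted BCE for laundering detection plus
# BCE for group detection, combined with a task weight. Toy data throughout.
import numpy as np

def bce(y, p, w_pos=1.0, w_neg=1.0, eps=1e-12):
    """(Optionally weighted) binary cross-entropy, averaged over edges."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(w_pos * y * np.log(p) + w_neg * (1 - y) * np.log(1 - p))

y_laund = np.array([1.0, 0.0, 0.0, 0.0])   # rare positive class
p_laund = np.array([0.9, 0.1, 0.2, 0.05])
y_group = np.array([1.0, 1.0, 0.0, 0.0])
p_group = np.array([0.8, 0.7, 0.3, 0.1])

lam = 0.5  # placeholder task weight lambda
loss = bce(y_laund, p_laund, w_pos=10.0) + lam * bce(y_group, p_group)
```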
4. Incorporation of Auxiliary Information
Crowdsourcing laundering exploits network structures via delegation and staking. MCCLD integrates auxiliary group knowledge as follows:
- Transaction Group Construction: On blockchains such as TRON, native delegation (staking) subgraphs are extracted. Their weakly connected components define “account groups”; transactions wholly within a component are labeled intra-group.
- Subgraph Sampling: To work within hardware constraints and improve generalization, training operates on induced subgraphs ($2$k–$5$k edges), sampled from random seed edges with neighbor expansion to cover $2$-hop neighborhoods.
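The group-construction step above amounts to computing weakly connected components of the delegation subgraph; a minimal union-find sketch on toy data (account names and edge lists are illustrative):

```python
# Sketch of transaction-group construction: weakly connected components of
# the delegation (staking) subgraph via union-find with path halving.
def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

accounts = ["a", "b", "c", "d", "e"]
staking_edges = [("a", "b"), ("b", "c")]        # delegation subgraph
txn_edges = [("a", "c"), ("a", "d"), ("d", "e")]  # transaction edges

parent = {v: v for v in accounts}
for u, v in staking_edges:
    union(parent, u, v)

# A transaction is labeled intra-group iff both endpoints share a component.
intra = [find(parent, u) == find(parent, v) for u, v in txn_edges]
# intra == [True, False, False]
```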
Table: Auxiliary Group Construction Methods and Effects (Tron-USDT dataset)
| Method | F1 | AUC |
|---|---|---|
| No group | 0.853 | 0.971 |
| Native staking group | 0.950 | 0.980 |
| GMPA-generated group | 0.956 | 0.986 |
| Louvain group | 0.941 | 0.980 |
The use of high-fidelity group assignments confers the largest performance improvements.
5. Training and Dataset Overview
Key hyperparameters and stats:
- Model Hyperparameters: 2 GNN layers, 64-dimensional embeddings, Adam optimizer with tuned learning rate and weight decay, batch size of 4 subgraphs per step, early stopping (patience = 20).
- Datasets Evaluated:
- Tron-USDT: $1,043k$ accounts, $2.6M$ transactions, $252$ positives, $10,529$ negatives for laundering; $18,997$ intra-group edges.
- Harmony hack: $340k$ accounts, $1.8M$ txns.
- Upbit hack: $577k$ accounts, $2.3M$ txns.
- IBM-LI synthetic: $705k$ accounts, $7.0M$ txns.
Preprocessing follows established graph learning conventions, with KYC, temporal, and attribute-based node features included as available.
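The reported patience-based early stopping can be sketched as a generic training harness; `train_step` and `evaluate` below are hypothetical stand-ins for the real pipeline.

```python
# Minimal early-stopping sketch matching the reported patience=20 setup.
def train_with_early_stopping(train_step, evaluate, patience=20, max_epochs=500):
    best_score, best_epoch = float("-inf"), -1
    for epoch in range(max_epochs):
        train_step(epoch)
        score = evaluate(epoch)  # e.g., validation F1
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_score, best_epoch

# Toy run: validation score improves for 3 epochs, then plateaus.
scores = [0.5, 0.6, 0.7] + [0.7] * 100
result = train_with_early_stopping(lambda e: None, lambda e: scores[e])
```

With these toy scores, training halts 20 epochs after the last improvement rather than running to `max_epochs`.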
6. Empirical Performance and Analysis
Evaluation Metrics
- F1 Score: $F1 = \dfrac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
- AUC: area under the ROC curve
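A quick worked instance of the F1 formula on a toy confusion count:

```python
# F1 from precision and recall, as defined in the metrics above. Counts are
# toy values, not results from the paper.
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example: 90 true positives, 10 false positives, 30 false negatives
score = f1_score(tp=90, fp=10, fn=30)  # precision 0.9, recall 0.75
```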
Benchmarking and Baselines
Comparison methods include subgraph-based (GMPA, AntiBenford) and transaction-based GNNs (GCN, GAT, GraphSAGE, PNA).
Table: Results (General Laundering Datasets)
| Method | Upbit (F1/AUC) | Harmony (F1/AUC) | IBM-LI (F1/AUC) |
|---|---|---|---|
| PNA | 0.836/0.892 | 0.774/0.911 | 0.261/0.700 |
| GCN | 0.636/0.759 | 0.588/0.908 | 0.091/0.528 |
| MCCLD | 0.885/0.923 | 0.806/0.916 | 0.405/0.737 |
- Crowdsourcing Laundering (Tron-USDT): MCCLD attains F1 up to $0.956$ and AUC up to $0.986$ (see the group-construction table above), a substantial average F1 uplift over the baselines.
- General Laundering: MCCLD achieves the highest F1 and AUC on each of the Upbit, Harmony, and IBM-LI benchmarks (results table above).
Ablation and Robustness
- Excluding group information substantially impairs F1; native staking groups and high-fidelity group assignments (GMPA, Louvain) yield the strongest performance.
- Under label scarcity, MCCLD's F1 degrades gracefully as the fraction of available training labels is reduced.
7. Implications, Limitations, and Prospects
MCCLD's efficacy arises from multi-level feature fusion, the enforcement of intra-group embedding consistency by the auxiliary task, and explicitly shared classifier/encoder designs. These elements are particularly suited for addressing the heterogeneity and polycentricity of crowdsourcing laundering groups.
Limitations include dependence on at least partial supervision or availability of high-quality “group” partitions, as well as the potential of subgraph sampling to miss long transaction chains.
Potential future directions:
- Dynamic GNN extensions to capture temporal laundering patterns.
- End-to-end learnable group assignment via contrastive objectives, mitigating the reliance on external detectors.
- Integration of cross-chain flows to generalize to multi-blockchain laundering operations (Li et al., 2 Dec 2025).
A plausible implication is that frameworks based on MCCLD may generalize to other domains exhibiting network-based polycentricity and fine-grained actor heterogeneity. The availability of public pseudocode and detailed methodology enables replication on modern GNN toolkits.