ML-Based Cartel Detection Algorithm
- Machine learning-based cartel detection algorithms are computational methods that convert bid data into structured features to identify collusive behavior in markets.
- They combine game-theoretic foundations, network analytics, and deep learning to achieve up to 91% accuracy in detecting cartel activities.
- The techniques are applied across various sectors, including public procurement and electricity, offering early warning signals for regulatory interventions.
Machine learning-based cartel detection algorithms are computational methods that utilize statistical, network, deep learning, and graph-based techniques to identify collusive behavior among firms—most notably in public procurement, commodity, and electricity markets. Central to their design is the transformation of transactional or offer data into structured features (screens), networks, or images, upon which advanced supervised or unsupervised models are built to distinguish collusive (cartel) outcomes from competitive ones. These algorithms address the challenge of hidden or incomplete collusion, leverage high-dimensional market heterogeneity, and extend to dynamic pricing and reinforcement learning contexts. The following sections provide a technical synthesis of the theoretical foundations, algorithmic frameworks, feature engineering, evaluation strategies, domain-specific adaptations, and recent innovations.
1. Theoretical and Game-Theoretic Foundations
Cartel detection algorithms often draw upon models of repeated games, agent-based simulations, and adaptive learning dynamics. A notable example is the modified trust game, in which agents simultaneously act as buyers and sellers with a “value for money” variable $v$ (Peixoto et al., 2012). Sellers’ payoffs depend on both the number of buyers they attract and their own $v$, introducing a payoff gradient: lowering $v$ raises the margin per transaction but risks buyer defection. Two main update rules govern the system: buyers switch to sellers offering higher $v$ (“voting with their feet”), and sellers copy the $v$ of more successful peers. The system undergoes a phase transition controlled by the strategy update rate; beyond a critical rate the competitive fixed point destabilizes, giving rise to a self-organized cartel in which low $v$ proliferates without explicit collusion.
Spectral analysis and linear stability arguments reveal power-law temporal correlations and unpredictable cycle dynamics at criticality. These signatures inform machine learning algorithms by offering candidate early-warning features: spikes in variance, emergent low-$v$ states, 1/f-type noise, and network degree distributions. Such models highlight that cartel-like states may arise endogenously from decentralized agent adaptation, challenging detection systems to capture emergent, non-explicit collusion.
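A minimal agent-based sketch of these two update rules is given below; the parameter values, the symbol $v$, and the payoff form (number of buyers times the margin $1-v$) are illustrative assumptions, not the calibration of Peixoto et al. (2012).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trust_game(n_sellers=50, n_buyers=500, update_rate=0.1,
                        noise=0.01, steps=2000):
    """Sketch of the modified trust game dynamics: buyers move toward sellers
    offering higher value for money, sellers imitate more successful peers."""
    v = rng.uniform(0.5, 1.0, n_sellers)           # value for money per seller
    choice = rng.integers(0, n_sellers, n_buyers)  # buyer -> seller assignment
    history = []
    for _ in range(steps):
        # Seller payoff: number of buyers times margin (higher margin = lower v)
        buyers_per_seller = np.bincount(choice, minlength=n_sellers)
        payoff = buyers_per_seller * (1.0 - v)

        # Buyers "vote with their feet": switch if a random seller offers higher v
        candidates = rng.integers(0, n_sellers, n_buyers)
        switch = v[candidates] > v[choice]
        choice = np.where(switch, candidates, choice)

        # Sellers copy the v of a more successful peer with probability update_rate
        peers = rng.integers(0, n_sellers, n_sellers)
        imitate = (rng.random(n_sellers) < update_rate) & (payoff[peers] > payoff)
        v = np.clip(np.where(imitate, v[peers], v)
                    + noise * rng.normal(size=n_sellers), 0.0, 1.0)

        history.append(v.mean())
    return np.array(history)

mean_v = simulate_trust_game()
print("final mean value for money:", mean_v[-200:].mean())
```

Tracking the mean value for money over time surfaces the kind of early-warning signatures discussed above (variance spikes, drift toward low-$v$ states) as the update rate is varied.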
2. Network and Graph-Based Detection Approaches
Network-based methods construct graphs where nodes represent firms, tenders, or accounts, and edges encode transactional, co-bidding, or relational similarity (Wachs et al., 2019, Liang et al., 2020, Imhof et al., 16 Jul 2025). In auction markets, co-bidding networks are projected from firm–contract bipartite graphs, with edge weights given by the Jaccard similarity of the firms’ tender sets, $w_{ij} = |T_i \cap T_j| / |T_i \cup T_j|$, where $T_i$ denotes the set of tenders firm $i$ bid on. Overlapping community detection algorithms identify groups with high cohesion (geometric-to-arithmetic mean ratio of edge weights) and exclusivity (fraction of internal to total interaction).
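A minimal sketch of the co-bidding projection with Jaccard edge weights, assuming a flat table of (firm, tender) bid records; all identifiers are hypothetical.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical bid records: (firm_id, tender_id) pairs.
bids = [("A", 1), ("B", 1), ("A", 2), ("B", 2), ("C", 2), ("C", 3), ("D", 3)]

# Firm -> set of tenders it bid on (one side of the firm-contract bipartite graph).
tenders_of = defaultdict(set)
for firm, tender in bids:
    tenders_of[firm].add(tender)

# Project onto a co-bidding network with Jaccard edge weights.
edges = {}
for i, j in combinations(sorted(tenders_of), 2):
    inter = tenders_of[i] & tenders_of[j]
    union = tenders_of[i] | tenders_of[j]
    if inter:
        edges[(i, j)] = len(inter) / len(union)

print(edges)  # e.g. {('A', 'B'): 1.0, ('A', 'C'): 0.33..., ('C', 'D'): 0.5}
```

Community detection (cohesion and exclusivity scoring) then operates on this weighted projection.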
Recent advances incorporate graph neural networks (GNNs), including Graph Attention Networks (GATs) (Imhof et al., 16 Jul 2025). These employ fixed domain-specific attention coefficients constructed from bidder set overlap and temporal proximity:
- Bidder similarity: $s_{ij} = |B_i \cap B_j| / |B_i \cup B_j|$, the overlap of the bidder sets $B_i$, $B_j$ of tenders $i$ and $j$.
- Temporal Gaussian kernel: $k_{ij} = \exp\big(-(t_i - t_j)^2 / (2\sigma^2)\big)$, with $t_i$ the tender date and $\sigma$ a bandwidth parameter.
- Composed as $e_{ij} = s_{ij}\, k_{ij}$, normalized with a softmax over neighbors to yield attention weights $\alpha_{ij}$.
Node-level features (statistical screens) are linearly projected, passed through attention-weighted aggregations, and then mapped to output predictions via ReLU and dense layers. These methods excel in markets with overlapping bidders and temporal clustering. Performance benchmarks indicate cross-market accuracy of up to 91%, robust to transfer learning across market types (e.g., Swiss, Japanese, Scandinavian) and outperforming ensembles of classical learners by 15–20 percentage points.
GNN-based models effectively propagate subtle collusive signatures across interconnected tenders, especially when individual screens are noisy or incomplete.
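A minimal sketch of the fixed, domain-derived attention described above, composing bidder-set overlap with a temporal Gaussian kernel and normalizing via a softmax over neighbors; the bandwidth, toy features, and single aggregation step are illustrative assumptions rather than the published architecture.

```python
import numpy as np

def fixed_attention(bidder_sets, times, sigma=30.0):
    """Domain-derived attention: bidder-set Jaccard overlap composed with a
    temporal Gaussian kernel, softmax-normalized over each tender's neighbors."""
    n = len(bidder_sets)
    score = np.full((n, n), -np.inf)  # -inf marks non-neighbors for the softmax
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            inter = bidder_sets[i] & bidder_sets[j]
            if not inter:
                continue  # no shared bidders -> no edge
            s = len(inter) / len(bidder_sets[i] | bidder_sets[j])          # bidder similarity
            k = np.exp(-(times[i] - times[j]) ** 2 / (2.0 * sigma ** 2))   # temporal kernel
            score[i, j] = s * k                                            # composed score
    att = np.zeros((n, n))
    for i in range(n):
        finite = np.isfinite(score[i])
        if finite.any():
            e = np.exp(score[i][finite] - score[i][finite].max())
            att[i, finite] = e / e.sum()
    return att

# Toy example: three tenders with bidder sets and bid-opening days.
bidders = [{"f1", "f2", "f3"}, {"f1", "f2"}, {"f4", "f5"}]
days = np.array([0.0, 10.0, 200.0])
A = fixed_attention(bidders, days)

# One attention-weighted aggregation of node-level screens, followed by ReLU.
X = np.array([[0.05, 0.8], [0.04, 0.9], [0.30, 0.1]])  # e.g. [CV, KS] per tender
print(np.maximum(A @ X, 0.0).round(3))
```

In a full GAT layer this aggregation would be stacked with learned linear projections and dense output layers, as described above.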
3. Statistical Screens and Feature Engineering
Statistical screens are descriptive measures calculated from bid distributions within tenders; their construction is foundational to supervised cartel detection algorithms (Wallimann et al., 2020, Imhof et al., 2021, Wallimann et al., 2023, Wallimann et al., 26 Jan 2024, Proz et al., 13 Aug 2025). Screens fall into several categories:
- Variance screens: Coefficient of Variation (CV), spread.
- Asymmetry screens: Skewness, the percentage difference between the two lowest bids $(b_2 - b_1)/b_1$ (where $b_1 \le b_2$ are the two lowest bids), relative distance.
- Uniformity screens: Kolmogorov–Smirnov statistic (KS) comparing bid distributions against uniformity.
Methodologically, screens are calculated both globally (across all bids in a tender) and for all possible subgroups of three or four bids. Subgroup aggregation (mean, median, min, max) is critical for detecting incomplete cartels, as competitive bids can mask the signals of collusion. For example, the mean-aggregated subgroup screen is
$$\bar{S} = \frac{1}{K} \sum_{k=1}^{K} S_k,$$
where $K$ is the number of subgroups and $S_k$ are the subgroup statistics.
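A minimal sketch of these screens and the subgroup aggregation, with illustrative (not paper-exact) definitions of CV, spread, the percentage difference of the two lowest bids, and the KS statistic; it assumes a tender with at least three bids.

```python
from itertools import combinations
import numpy as np
from scipy import stats

def screens(bids):
    """Tender-level screens on a list of bids (illustrative definitions)."""
    b = np.sort(np.asarray(bids, dtype=float))
    cv = b.std(ddof=0) / b.mean()                     # coefficient of variation
    spread = (b.max() - b.min()) / b.min()            # relative spread
    diffp = (b[1] - b[0]) / b[0]                      # % diff. of the two lowest bids
    # KS statistic of min-max-normalized bids against the uniform distribution
    u = (b - b.min()) / (b.max() - b.min()) if b.max() > b.min() else np.zeros_like(b)
    ks = stats.kstest(u, "uniform").statistic
    return {"cv": cv, "spread": spread, "diffp": diffp, "ks": ks}

def subgroup_screens(bids, sizes=(3, 4), agg=np.mean):
    """Aggregate each screen over all bid subgroups of the given sizes,
    to catch incomplete cartels masked by competitive bids."""
    groups = [g for r in sizes if len(bids) >= r for g in combinations(bids, r)]
    per_group = [screens(g) for g in groups]
    return {f"{k}_sub": agg([d[k] for d in per_group]) for k in per_group[0]}

tender = [100.0, 101.5, 102.0, 118.0, 121.0]   # toy bids for one tender
print(screens(tender))
print(subgroup_screens(tender))
```

The resulting global and subgroup-aggregated values form the feature vector fed to the classifiers described in the next section.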
In electricity markets, novel screens are developed (total offers, total quantity, accepted offers/quantity) to capture capacity withholding—a behavior specific to sectors with strategic supply restrictions (Proz et al., 13 Aug 2025).
4. Machine Learning Models and Ensemble Methods
Cartel detection uses diverse supervised learning algorithms trained on screen-derived features:
- Random Forests: Ensembles of decision trees, bootstrapped on subsamples, reporting predictor importance (e.g., minima/medians of CV, spread, KS).
- Lasso Logistic Regression: $\ell_1$-penalized logit supporting feature selection.
- SVM: Large margin classifiers separating collusive/competitive instances.
- Super Learner Ensembles: Weighted averages over base learners (random forest, bagged trees, lasso, neural nets), optimized for out-of-sample accuracy.
Ensemble strategies dominate in benchmark studies, offering correct classification rates around 90% in procurement and electricity markets (Imhof et al., 2021, Proz et al., 13 Aug 2025). Models incorporating subgroup summaries and expanded screen sets maintain accuracy even under substantial bid or market structure heterogeneity.
In auction-based detection, the choice of decision threshold (e.g., flagging tenders whose predicted collusion probability exceeds 0.5 or 0.7) modulates the trade-off between false positives and false negatives. Model calibration is performed via repeated holdout splits or cross-validation.
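A minimal sketch of screen-based classification: a random forest and an $\ell_1$-penalized logit combined in a simple weighted ensemble (a stand-in for a full super learner), evaluated at the two thresholds mentioned above. The synthetic data and the ensemble weights are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for screen features (CV, spread, DIFFP, KS, subgroup summaries, ...)
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=1000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X_tr, y_tr)

# Simple weighted ensemble of predicted collusion probabilities
p = 0.6 * rf.predict_proba(X_te)[:, 1] + 0.4 * lasso.predict_proba(X_te)[:, 1]

for threshold in (0.5, 0.7):   # a stricter threshold trades recall for fewer false positives
    pred = (p >= threshold).astype(int)
    print(threshold, round(accuracy_score(y_te, pred), 3))

print("RF feature importances:", rf.feature_importances_.round(3))
```

A super learner would additionally fit the ensemble weights by cross-validation rather than fixing them by hand.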
5. Deep Learning and Convolutional Models
Deep learning approaches, notably Convolutional Neural Networks (CNNs), introduce representation learning for image-like inputs (Huber et al., 2021). In bid-rigging detection, normalized bid values are plotted pairwise—the reference firm on one axis, competitors on the other—yielding two-dimensional bid-interaction plots. These plots, reflecting spatial collusive patterns (empty regions, clustered points), serve as CNN inputs: convolutional layers with multiple filters, pooling layers, and fully connected blocks culminate in a binary output for collusion classification.
Reported accuracies reach ≈91% within-country and ≈85–90% transnationally (Japanese/Swiss data). However, cross-domain transferability is constrained by procurement idiosyncrasies and institutional heterogeneity; larger datasets and further architectural refinement are cited as future needs.
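A minimal sketch of this pipeline: rasterizing normalized reference-firm vs. competitor bids into a grid image and passing it through a small CNN with convolutional, pooling, and dense layers. The grid size and layer widths are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class BidImageCNN(nn.Module):
    """Small CNN over bid-interaction images (grid of normalized reference-firm
    vs. competitor bids); layer sizes are illustrative."""
    def __init__(self, grid=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (grid // 4) ** 2, 64), nn.ReLU(),
            nn.Linear(64, 1),   # logit for "collusive" vs. "competitive"
        )

    def forward(self, x):
        return self.classifier(self.features(x))

def bid_image(ref_bids, comp_bids, grid=32):
    """Rasterize pairwise (reference firm, competitor) normalized bids into a grid."""
    img = torch.zeros(1, grid, grid)
    for r in ref_bids:
        for c in comp_bids:
            i = min(int(r * (grid - 1)), grid - 1)
            j = min(int(c * (grid - 1)), grid - 1)
            img[0, i, j] += 1.0
    return img

# Toy usage: one tender's normalized bids, forward pass producing a collusion probability.
x = bid_image([0.10, 0.12], [0.11, 0.13, 0.55]).unsqueeze(0)   # shape (1, 1, 32, 32)
model = BidImageCNN()
print(torch.sigmoid(model(x)).item())
```

Clustered or empty regions in the rasterized plot are exactly the spatial patterns the convolutional filters learn to pick up.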
6. Adaptations for Market-Specific Dynamics and Practical Deployment
Machine learning-based cartel detection algorithms are extensively adapted for market-specific contexts:
- Electricity Markets: Added capacity-withholding screens, with performance differing between complete and incomplete cartels (Proz et al., 13 Aug 2025).
- Railway Procurement: Hybrid centralized–decentralized screening, combining automated classifiers with interpretable manager tools (single-variable decision trees, e.g., CV < 0.053 for suspicion; see the sketch after this list) (Wallimann et al., 2023).
- Classroom Simulations: Teaching frameworks use synthetic data, historical cartel cases, and random forests to simulate the detection–investigation pipeline, enabling practical, hands-on interpretational experience (Wallimann et al., 26 Jan 2024).
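A minimal sketch of such an interpretable single-screen rule, using the CV < 0.053 cut-off quoted above; the function itself is an illustrative assumption, not the deployed tool.

```python
import statistics

def flag_tender(bids, cv_threshold=0.053):
    """Single-screen rule in the spirit of the interpretable manager tool:
    flag a tender when the coefficient of variation of its bids falls below
    the cut-off (0.053 is the threshold quoted in the text)."""
    cv = statistics.pstdev(bids) / statistics.mean(bids)
    return cv < cv_threshold, round(cv, 4)

print(flag_tender([100.0, 101.0, 102.5, 103.0]))  # tightly clustered bids -> flagged
print(flag_tender([100.0, 112.0, 125.0, 140.0]))  # dispersed bids -> not flagged
```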
Some studies advance mechanism design approaches, implementing interventions (off-path top-ups to cheating sellers) within multi-agent reinforcement learning environments; these prevent price wars and destabilize cartels, restoring Nash equilibrium and reducing supra-competitive markups without direct welfare losses (Banerjee, 2023).
7. Evaluation, Limitations, and Directions
Evaluation is performed using out-of-sample accuracy, recall, specificity, F1, and AUC-ROC metrics, with careful attention to class imbalance, market idiosyncrasies, and the masking effects of competitive or incomplete bids. Limitations include sample size (risk of overfitting), model transferability across regulatory regimes, and the challenge of distinguishing tacit collusion from competitive adaptation—especially in dynamic and algorithmic pricing settings (Dorner, 2021).
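A minimal sketch of the evaluation step, computing the named metrics on hypothetical out-of-sample predictions with a class imbalance of the kind typical for cartel data; all values are synthetic.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             roc_auc_score, confusion_matrix)

# Hypothetical out-of-sample labels (1 = collusive) and predicted probabilities.
y_true = np.array([0] * 90 + [1] * 10)
rng = np.random.default_rng(2)
p_hat = np.clip(0.2 + 0.6 * y_true + rng.normal(scale=0.15, size=100), 0, 1)
y_pred = (p_hat >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy   ", accuracy_score(y_true, y_pred))
print("recall     ", recall_score(y_true, y_pred))   # sensitivity to cartels
print("specificity", tn / (tn + fp))                 # competitive tenders correctly cleared
print("F1         ", f1_score(y_true, y_pred))
print("AUC-ROC    ", roc_auc_score(y_true, p_hat))
```

With heavy imbalance, accuracy alone overstates performance, which is why recall, specificity, F1, and AUC-ROC are reported alongside it.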
A further research agenda calls for:
- Enhanced interpretability for legal and regulatory acceptance,
- Integration with time-series analyzers for high-frequency, adaptive markets,
- Graphical and network-based anomaly detection (including centralized hub–spoke critiques),
- Robustness across auction formats, market sectors, and international settings.
By leveraging rich statistical, network, and deep learning representations, machine learning-based cartel detection algorithms provide powerful foundations for proactive market surveillance, regulatory enforcement, and academic investigation into the dynamic architecture of collusion.