Machine Learning-Based Cartel Detection

Updated 5 October 2025

Machine learning-based cartel detection algorithms are statistical and computational tools that combine econometric screens, supervised and unsupervised learning, and network approaches to identify collusive market behavior.
These methods deploy advanced algorithms such as random forests, SVMs, ensemble models, CNNs, and graph neural networks to capture bid anomalies and achieve high classification accuracy.
Key applications span wholesale electricity, railway procurement, and simulation settings, while challenges remain in handling incomplete cartels, model interpretability, and cross-jurisdiction transferability.

Machine learning-based cartel detection algorithms are a class of statistical and computational tools designed to identify forms of collusive behavior in markets, especially procurement auctions and wholesale spot markets. These algorithms leverage bid data, structural network features, and advanced supervised and unsupervised learning techniques—including deep learning and graph neural networks—to uncover patterns and associations consistent with bid rigging, capacity withholding, and other forms of illicit cooperation among firms. Recent literature exhibits rapid methodological expansion: from simple econometric screens to ensemble learning frameworks, convolutional and graph attention neural networks, and network science approaches. The following sections provide a systematic account of the concepts, methodologies, algorithms, empirical results, and applications underpinning modern machine learning-based cartel detection.

1. Statistical Screens and Preprocessing

Detecting cartels using machine learning begins with the construction of predictor variables ("screens") extracted from bid data. These measures capture statistical anomalies characteristic of collusion, such as reduced variance, abnormal bid gaps, and non-uniform bid distributions. Screens commonly used include:

Coefficient of Variation (CV): $CV_t = \frac{sd_t}{\bar{b}_t}$ —low in collusion due to bid coordination.
Spread (SPD): $(b_{max,t} - b_{1,t}) / b_{1,t}$ —often smaller in collusive episodes.
Difference Between Two Lowest Bids (DIFFP): $(b_{2,t} - b_{1,t}) / b_{1,t}$ .
Relative Distance (RD): $(b_{2,t} - b_{1,t}) / (sd_{losing\,bids, t})$ .
Normalized Relative Distance (RDNORM): $(b_{2,t} - b_{1,t}) / \left(\frac{1}{n_t-1}\sum_{i=1}^{n_t-1}(b_{i+1,t} - b_{i,t})\right)$ .
Kurtosis, Skewness, and Kolmogorov–Smirnov (KS) Statistic: Additional distributional statistics for flagging abnormality.
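The screens listed above can be computed directly from the bids of a single tender. The following is a minimal sketch, not the cited authors' implementation: the function name `bid_screens` and the example bids are illustrative, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def bid_screens(bids):
    """Compute basic screens for one tender from a list of bid amounts."""
    b = sorted(bids)                      # b[0] is the lowest bid b_1
    if len(b) < 3:
        raise ValueError("need at least three bids (RD uses the sd of losing bids)")
    losing = b[1:]                        # all bids except the lowest
    gaps = [hi - lo for lo, hi in zip(b, b[1:])]   # differences of adjacent bids
    return {
        "CV": stdev(b) / mean(b),                  # sample sd (assumption)
        "SPD": (b[-1] - b[0]) / b[0],
        "DIFFP": (b[1] - b[0]) / b[0],
        "RD": (b[1] - b[0]) / stdev(losing),       # fails if all losing bids are equal
        "RDNORM": (b[1] - b[0]) / mean(gaps),
    }

# Hypothetical tender with four bids
screens = bid_screens([100.0, 110.0, 112.0, 115.0])
print(screens)
```

In this made-up tender the gap between the two lowest bids (10) is large relative to the gaps among the losing bids (2 and 3), so RD and RDNORM are well above 1.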

A core innovation is the calculation of these screens across all possible subgroups of three or four bids within a tender, followed by summary statistics (mean, median, max, min), which mitigate the dilution effects caused by incomplete cartels mixing collusive and competitive bids (Wallimann et al., 2020, Imhof et al., 2021). This methodology is robust to partial collusive arrangements and has become standard in recent empirical studies, including railway-infrastructure procurement (Wallimann et al., 2023), electricity markets (Proz et al., 13 Aug 2025), and educational data science settings (Wallimann et al., 26 Jan 2024).
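The subgroup idea can be sketched as follows, assuming the coefficient of variation as the screen and subgroups of three bids. The helper names (`cv`, `subgroup_summary`) and the example tender are hypothetical, and the cited papers may differ in detail.

```python
from itertools import combinations
from statistics import mean, median, stdev

def cv(bids):
    """Coefficient of variation with the sample standard deviation (assumption)."""
    return stdev(bids) / mean(bids)

def subgroup_summary(bids, size=3, screen=cv):
    """Evaluate `screen` on every subgroup of `size` bids and summarize the values."""
    values = [screen(list(group)) for group in combinations(bids, size)]
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "n_subgroups": len(values),
    }

# Hypothetical incomplete cartel: three tightly clustered bids plus two outsiders
tender = [100.0, 100.5, 101.0, 120.0, 135.0]
summary = subgroup_summary(tender, size=3)
print(summary["min"], cv(tender))
```

Here the tender-level CV is inflated by the two outside bids, while the minimum over subgroups isolates the tight cluster. This is the dilution problem that the summary statistics are meant to address.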

2. Supervised and Unsupervised Machine Learning Algorithms

Having constructed predictor screens, cartel detection proceeds via application of supervised machine learning methods, notably:

Random Forests: Ensemble trees trained on bid screens, effective in managing feature interactions and non-linearities. Models often use 1,000 trees and cross-validation to optimize their classification accuracy (Wallimann et al., 2020, Wallimann et al., 2023).
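Lasso Regression: Penalized (logistic) regression on the screens, minimizing a loss of the form $L(\beta) = \sum_{i=1}^n[y_i-g(X_i)]^2 + \lambda\sum_j|\beta_j|$ , where $g(X_i)$ is the model's prediction for tender $i$ and $\lambda \ge 0$ sets the penalty strength. The $L_1$ penalty shrinks some coefficients exactly to zero, which enhances model sparsity and interpretability (Imhof et al., 2021).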
Support Vector Machines (SVMs): Maximize margin between collusive and competitive screen vectors, accommodating non-linearity via kernel methods (Imhof et al., 2021).
Super Learner Ensembles: Weighted convex combinations of diverse learners (random forest, lasso, boosting, neural networks), achieving superior generalization (Imhof et al., 2021, Wallimann et al., 2023, Proz et al., 13 Aug 2025).
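The convex weighting behind a super learner can be illustrated with two base learners. This is a simplified sketch under stated assumptions: the predictions are taken to be out-of-fold collusion probabilities, the weight is found by grid search on squared error, and the names `pred_rf`, `pred_lasso` and the data are invented for illustration.

```python
def mse(pred, y):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

def convex_weight(pred_a, pred_b, y, steps=100):
    """Grid-search w in [0, 1] minimizing the MSE of w*pred_a + (1-w)*pred_b."""
    best_w, best_loss = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        blend = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        loss = mse(blend, y)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss

# Hypothetical out-of-fold predictions for four tenders (1 = collusive)
y = [1, 0, 1, 0]
pred_rf = [0.9, 0.4, 0.6, 0.1]
pred_lasso = [0.6, 0.1, 0.9, 0.4]
w_star, loss_star = convex_weight(pred_rf, pred_lasso, y)
print(w_star, loss_star, mse(pred_rf, y), mse(pred_lasso, y))
```

In this example the two learners err on different tenders, so the blended prediction has a lower loss than either learner alone. With more than two learners, the weights would be found by constrained optimization rather than a one-dimensional grid.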

Performance is evaluated by Correct Classification Rate (CCR), F1 score, and ROC-AUC. Reported CCRs typically range from about 80% to 95%, with ensemble methods frequently outperforming single algorithms.
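For concreteness, CCR and F1 can be computed from predicted and true labels as in the short sketch below. The labels are invented, and CCR is taken here to be the share of correctly classified tenders.

```python
def classification_metrics(y_true, y_pred):
    """Return CCR, precision, recall, and F1 for binary labels (1 = collusive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"CCR": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "F1": f1}

# Hypothetical labels for eight tenders
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because collusive tenders are often the minority class, the F1 score can be noticeably lower than the CCR, as it is in this example (2/3 versus 0.75).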

Unsupervised elements (clustering, outlier detection) supplement supervised classifiers by flagging tenders located in high-coherence, high-exclusivity regions of the co-bidding network (Wachs et al., 2019). Algorithms are calibrated via cross-validation and, in settings with sparse ground truth, on synthetic cartel examples generated from null models (Wachs et al., 2019).

3. Network-Based Detection and Graph Approaches

Recent advances extend cartel detection beyond tender-level screens to the structure of bidder interactions. These include:

a. Co-bidding Networks and Topological Features

Firms are represented as nodes, with edges weighted by Jaccard similarity in bidding histories: $w_{A,B} = |c_A \cap c_B| / |c_A \cup c_B|$ (Wachs et al., 2019). Community detection using bottom-up greedy algorithms and group fitness functions ( $f_G = (s_{in}^G / (s_{in}^G + s_{out}^G)^\alpha) \cdot |G|^\beta$ ) yields overlapping, cohesive, and exclusive groups. Two topological features—coherence ( $C_G$ ) and exclusivity ( $E_G$ )—distinguish stable collusive environments where $C_G$ and $E_G$ are high.
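The edge weight and the group fitness function can be sketched as below. The bidding histories are invented, and two details are assumptions rather than part of the source: each internal edge is counted once in $s_{in}^G$, and $s_{out}^G$ sums the weights of edges with exactly one endpoint in the group.

```python
from itertools import combinations

def jaccard_weight(contracts_a, contracts_b):
    """w_AB = |c_A & c_B| / |c_A | c_B| for two sets of tender identifiers."""
    a, b = set(contracts_a), set(contracts_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_fitness(weights, group, alpha=1.0, beta=0.0):
    """f_G = s_in / (s_in + s_out)**alpha * |G|**beta for a set of firms."""
    group = set(group)
    s_in = s_out = 0.0
    for (u, v), w in weights.items():
        members = (u in group) + (v in group)
        if members == 2:        # edge inside the group (counted once: assumption)
            s_in += w
        elif members == 1:      # edge crossing the group boundary
            s_out += w
    total = s_in + s_out
    return (s_in / total ** alpha) * len(group) ** beta if total else 0.0

# Hypothetical bidding histories: firm -> tenders it bid on
bids = {"A": {1, 2, 3, 4}, "B": {2, 3, 4, 5}, "C": {4, 5, 6}}
weights = {(u, v): jaccard_weight(bids[u], bids[v])
           for u, v in combinations(sorted(bids), 2)}
fitness_ab = group_fitness(weights, {"A", "B"})
print(weights, fitness_ab)
```

A greedy community search would repeatedly add or remove the firm that most increases `group_fitness`. That loop is omitted here to keep the sketch short.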

b. Graph Representation Learning and Link Prediction

Graph embedding methods such as node2vec convert complex criminal or bidder networks into continuous vector spaces, preserving local and global topological signals. Edge embeddings (Hadamard, average, $L_1$ , $L_2$ ) can then be used in logistic regression or kNN frameworks for predicting missing links, classifying association types, and even regressing link weights (Lopes et al., 2022). Classification accuracy in static link prediction settings can reach 98%.
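The four edge-embedding operators combine two node vectors elementwise. The sketch below assumes the node embeddings have already been produced (for example by node2vec); the vectors shown are made up.

```python
def edge_embedding(u, v, op="hadamard"):
    """Combine two node embeddings of equal length into one edge embedding."""
    if len(u) != len(v):
        raise ValueError("node embeddings must have the same dimension")
    if op == "hadamard":
        return [a * b for a, b in zip(u, v)]
    if op == "average":
        return [(a + b) / 2 for a, b in zip(u, v)]
    if op == "l1":
        return [abs(a - b) for a, b in zip(u, v)]
    if op == "l2":
        return [(a - b) ** 2 for a, b in zip(u, v)]
    raise ValueError(f"unknown operator: {op}")

# Hypothetical 3-dimensional embeddings of two nodes
node_u = [1.0, -2.0, 0.5]
node_v = [3.0, 1.0, 0.5]
features = {op: edge_embedding(node_u, node_v, op)
            for op in ("hadamard", "average", "l1", "l2")}
print(features)
```

The resulting vectors serve as feature rows for a downstream classifier such as logistic regression or kNN, with the presence or type of a link as the label.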

c. Deep Learning on Graphs: Graph Attention Neural Networks (GATs)

Recent work adopts GATs, which enable attention-based message passing accounting for both spatial (Jaccard) similarity and temporal proximity via Gaussian kernels (Imhof et al., 16 Jul 2025). Node representations ( $h_i$ ) are iteratively updated:

$h_i^{+} = \mathrm{ReLU}\left(\sum_{j \in N_i} a_{ij} h_j\right)$

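where $a_{ij} = \mathrm{softmax}_j( J_{ij} \cdot \delta_{ij} )$ is normalized over the neighbors $j \in N_i$, $J_{ij}$ is the Jaccard similarity, and $\delta_{ij}$ is the Gaussian temporal kernel. These architectures yield accuracy rates of 84–91% across diverse markets and demonstrate improved transferability and discriminative power over traditional ensembles.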
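One message-passing step of the update above can be written out in a few lines. This is a sketch under assumptions: the temporal kernel is taken to be $\exp(-(t_i - t_j)^2 / (2\sigma^2))$, there are no learned weight matrices (as in the formula shown), and all inputs are invented.

```python
import math

def gaussian_kernel(t_i, t_j, sigma=1.0):
    """Temporal proximity delta_ij; the exact kernel form is an assumption."""
    return math.exp(-((t_i - t_j) ** 2) / (2 * sigma ** 2))

def attention_update(i, neighbors, h, jaccard, times, sigma=1.0):
    """Return ReLU(sum_j a_ij * h_j) and the attention weights a_ij."""
    scores = [jaccard[(i, j)] * gaussian_kernel(times[i], times[j], sigma)
              for j in neighbors]
    top = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    a = [e / sum(exps) for e in exps]          # softmax over the neighbors of i
    dim = len(h[neighbors[0]])
    agg = [sum(a_k * h[j][d] for a_k, j in zip(a, neighbors)) for d in range(dim)]
    return [max(0.0, x) for x in agg], a       # ReLU applied elementwise

# Hypothetical node 0 with two neighbors; node 2 bids much later in time
h = {1: [1.0, -1.0], 2: [0.5, 2.0]}
jaccard = {(0, 1): 0.8, (0, 2): 0.8}
times = {0: 0.0, 1: 0.0, 2: 3.0}
h_new, attn = attention_update(0, [1, 2], h, jaccard, times)
print(h_new, attn)
```

With equal Jaccard similarity, the neighbor that is closer in time receives the larger attention weight, which is the effect the temporal kernel is meant to produce.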

4. Deep Learning and Convolutional Neural Networks

Convolutional neural networks (CNNs) offer an alternative, firm-level view by transforming bidding interactions into images via min–max normalization and plotting firm-wise bid coordinates (Huber et al., 2021). Architectural features include up to three convolutional layers (successive $3\times3$ filters, ReLU activation, pooling) and dense hidden layers, culminating in sigmoid binary classification output.
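The image construction can be illustrated roughly as follows. This sketch makes several assumptions that the source does not specify: bids are min-max normalized within each tender, each image covers one pair of firms, and a pixel is set wherever the pair's normalized bids fall on a small grid. The names `pair_image` and `size` and the data are hypothetical.

```python
def min_max(bids):
    """Scale bids to [0, 1]; a tender with identical bids maps to all zeros."""
    lo, hi = min(bids), max(bids)
    if hi == lo:
        return [0.0 for _ in bids]
    return [(b - lo) / (hi - lo) for b in bids]

def pair_image(tenders, firm_a, firm_b, size=8):
    """Rasterize the normalized bids of two firms into a size x size binary grid."""
    image = [[0] * size for _ in range(size)]
    for tender in tenders:
        if firm_a not in tender or firm_b not in tender:
            continue                        # only tenders where both firms bid
        firms = list(tender)
        scaled = dict(zip(firms, min_max([tender[f] for f in firms])))
        col = min(int(scaled[firm_a] * size), size - 1)   # x axis: firm_a
        row = min(int(scaled[firm_b] * size), size - 1)   # y axis: firm_b
        image[row][col] = 1
    return image

# Hypothetical tenders: firm -> bid
tenders = [
    {"A": 100.0, "B": 102.0, "C": 120.0},
    {"A": 95.0, "B": 90.0, "C": 110.0},
    {"A": 80.0, "C": 85.0},                 # B absent, so this tender is skipped
]
image = pair_image(tenders, "A", "B")
for line in image:
    print(line)
```

Grids of this kind would then be passed to a CNN as single-channel images.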

Empirical studies report median accuracies of 91–92% in both Japanese and Swiss markets, indicating robust detection on image-based representations. However, transnational transfer (training on one country, testing on another) reduces accuracy due to institutional bid structure differences—a challenge for cross-jurisdictional application.

5. Reinforcement Learning and Algorithmic Collusion

Investigations into algorithmic collusion leverage multi-agent reinforcement learning (MARL) frameworks to model and dissect strategic evolution among autonomous pricing agents (Schlechtinger et al., 2021, Banerjee, 2023). Deep Q-Networks (DQN) approximate the optimal action-value function $Q^*$, which satisfies the Bellman optimality equation with discount factor $\gamma \in [0, 1)$:

$Q^*(s,a) = \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q^*(s',a') \mid s,a \right]$
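A tabular analogue of this recursion is the Q-learning update, shown here as a sketch. A DQN replaces the table with a neural network, and the learning rate `alpha`, the discount factor `gamma` (standard in Q-learning), and the state and action labels are all illustrative assumptions.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    target = r + gamma * best_next
    old = Q.get((s, a), 0.0)                 # unseen pairs default to 0
    Q[(s, a)] = old + alpha * (target - old)
    return Q[(s, a)]

# Hypothetical pricing agent: states are rival price levels, actions are own prices
actions = ["low", "high"]
Q = {("s1", "low"): 1.0, ("s1", "high"): 3.0}
new_value = q_update(Q, "s0", "high", r=2.0, s_next="s1", actions=actions)
print(new_value)
```

In a multi-agent pricing experiment, each agent would run such updates on its own table or network while the other agents' actions enter through the state and the reward.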

Distinct learning stages emerge: initial random exploration, early specialization, and a collusive equilibrium in which agents coordinate implicitly or explicitly. The mechanism design approach to destabilizing cartel equilibria employs a "two-stage price drop rule" that protects the deviating firm with a per-unit revenue top-up $\hat{p}_{i,t}$ , set so that $(\hat{p}_{i,t} + p_{i,t}) q_{i,t} = p_{i,\tau_1} q_{i,\tau_1}$ , i.e. the firm's revenue in period $t$ matches its revenue in period $\tau_1$ . This renders punishment ineffective and forces reversion to Nash pricing (Banerjee, 2023). MARL experiments confirm substantial reductions in cartel-induced markups without welfare losses on the competitive path.
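Solving the revenue condition for the top-up gives $\hat{p}_{i,t} = p_{i,\tau_1} q_{i,\tau_1} / q_{i,t} - p_{i,t}$. The sketch below evaluates it for made-up numbers; the function and argument names are hypothetical, and $\tau_1$ is treated simply as the reference period of the rule.

```python
def revenue_top_up(p_ref, q_ref, p_now, q_now):
    """Per-unit top-up p_hat with (p_hat + p_now) * q_now == p_ref * q_ref."""
    if q_now <= 0:
        raise ValueError("current quantity must be positive")
    return p_ref * q_ref / q_now - p_now

# Hypothetical firm: price 10 and quantity 100 in the reference period,
# price 8 and quantity 110 in the current period
p_hat = revenue_top_up(p_ref=10.0, q_ref=100.0, p_now=8.0, q_now=110.0)
print(p_hat, (p_hat + 8.0) * 110.0)
```

The second printed value reproduces the reference-period revenue of 1000 up to floating-point rounding, confirming that the top-up restores the firm's revenue.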

6. Extensions to Specific Sectors and Data Modalities

Machine learning cartel detection is increasingly deployed in specialized markets:

Wholesale Electricity: Ensemble models incorporating both classical price screens and novel capacity-withholding screens (number of offers, total quantity) sharply improve accuracy in identifying collusion and capacity withholding, with CCRs frequently exceeding 95% in complete cartel settings (Proz et al., 13 Aug 2025).
Railway-Infrastructure Procurement: A dual approach combines centralized screening via ensemble methods (random forest, super learner) with decentralized tools (simple pruned trees using the CV), enabling effective real-time flagging by category managers (Wallimann et al., 2023).
Classroom and Simulation Settings: Pedagogical implementations use data from documented cartels and machine learning classifiers (random forest) to demonstrate the trade-off between false positives and detection efficiency, reinforcing the link between economic theory and data science (Wallimann et al., 26 Jan 2024).
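A decentralized tool of the kind mentioned for railway procurement can be as simple as a one-split tree on the CV. The sketch below fits such a threshold by maximizing the CCR on labeled tenders; the data are invented, and the rule "flag if CV is at or below the threshold" is an assumption based on low CV indicating coordination.

```python
def fit_cv_stump(cv_values, labels):
    """Find the threshold c maximizing CCR for the rule: flag if CV <= c."""
    xs = sorted(set(cv_values))
    candidates = [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]  # midpoints
    best_c, best_ccr = None, -1.0
    for c in candidates:
        correct = sum((x <= c) == bool(y) for x, y in zip(cv_values, labels))
        ccr = correct / len(labels)
        if ccr > best_ccr:                  # keep the first best threshold
            best_c, best_ccr = c, ccr
    return best_c, best_ccr

# Hypothetical tenders: CV of bids and label (1 = documented collusion)
cv_values = [0.02, 0.03, 0.04, 0.09, 0.11, 0.15]
labels = [1, 1, 1, 0, 0, 0]
threshold, ccr = fit_cv_stump(cv_values, labels)
flag_new_tender = 0.035 <= threshold        # decision for a new tender with CV 0.035
print(threshold, ccr, flag_new_tender)
```

Raising the threshold flags more tenders and so catches more collusion at the cost of more false positives, which is the trade-off highlighted in the classroom setting.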

7. Challenges, Limitations, and Future Directions

Several methodological and practical challenges persist:

Handling Incomplete Cartels: Dilution of statistical signals by competitive bids necessitates subgroup-based screen computation and robust model validation (Wallimann et al., 2020, Imhof et al., 2021).
Class Imbalance and Signal Contamination: GATs and deep learning methods are sensitive to cases where collusion is rare or competitive bids predominate (Imhof et al., 16 Jul 2025).
Model Transparency and Interpretability: The "black box" nature of deep learning and reinforcement learning complicates regulatory use and legal proceedings (Dorner, 2021).
Transferability Across Jurisdictions: Institutional differences limit application of models trained on one region to another without retraining or adaptation (Huber et al., 2021, Imhof et al., 16 Jul 2025).

The literature suggests that integration of network-derived features, message-passed screens, and context-aware models will be critical in advancing cartel detection. Proactive enforcement will benefit from combining statistical, supervised, and unsupervised learning methods with sector-specific adaptations and careful calibration of model thresholds. Ongoing research continues to refine hybrid frameworks, temporal modeling, and feature engineering to improve robustness against strategic bidding behaviors and evolving forms of collusion.