AuctionNet: Ad Auction Simulation Benchmark

Updated 5 December 2025

AuctionNet Simulation Benchmark is a comprehensive and scalable research platform for evaluating ad auction decision-making using realistic simulation and extensive datasets.
It integrates a full ad auction environment featuring latent diffusion for feature synthesis, cross-attention for value prediction, and multiple baseline bidding algorithms.
The platform employs rigorous evaluation protocols with metrics like mean episode reward, RMSE, and Pearson correlation to benchmark both RL and traditional approaches.

AuctionNet Simulation Benchmark is a comprehensive and scalable research platform for evaluating decision-making algorithms in large-scale ad auction environments. AuctionNet provides an open-source implementation of an ad auction simulator, a massive pre-generated dataset reflecting real-world distributions, and multiple baseline algorithms with reproducible evaluation protocols. It is directly anchored to empirical ad auctions but supports general multi-agent decision-making research. Its extensibility and fidelity make it a reference standard for simulation-based performance analysis and off-policy evaluation in both academic and industrial settings (Su et al., 2024, Yeom et al., 3 Dec 2025).

1. Architecture and Components

AuctionNet is structured as follows (Su et al., 2024):

Ad Auction Environment:
- Ad-opportunity generation: Employs a latent diffusion model (LDM) and VAE-style encoding/decoding for feature synthesis. This models realistic user/context distributions while protecting sensitive attributes.
- Value prediction: Stacked cross/self-attention blocks predict opportunity value, conditioned on latent features, category, and temporal embedding.
- Bidding module: Supports 48 competing agents spanning several algorithm families, including PID controllers, offline RL agents (IQL), Behavior Cloning, Decision Transformers, and Online LP solvers.
- Auction module: Implements the Generalized Second-Price (GSP) mechanism with multi-slot extension and plug-in support for alternative rules.
Pre-generated Dataset:
- Contains 21 episodes ("days"), each with ≈500 K ad opportunities and 48 time steps, yielding over 500 million records.
- Each record encodes advertiser index, category, budget, bid, impression flag, cost, conversion, and more.
- Distributional properties closely match real ad logs, as measured by PCA overlap and long-tailed feature distributions, supporting robust statistical modeling.
Baseline Algorithms:
- Includes PID controller algorithms, Online LP (budget-constrained knapsack), Behavior Cloning, IQL (offline RL with expectile regression), and Decision Transformer.
- Performance is normalized against the “Abid” heuristic (uniform multiplier).
API and Integration:
- Python library with Gym-style interface: agent registration, environment stepping, agent observing.
- Flexible agent extension via subclassing BaseAgent.
- Datasets provided via MIT-licensed GitHub repository and downloadable after competition.
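
The register/step/observe cycle and the BaseAgent subclassing pattern described above can be illustrated with a small self-contained sketch. Everything here is a stand-in written for illustration: the `BaseAgent` class, its `bid` and `observe` methods, `FixedMultiplierAgent`, and `ToyAuctionEnv` are assumptions and are not taken from the actual AuctionNet library, whose interface may differ.

```python
import random
from abc import ABC, abstractmethod


class BaseAgent(ABC):
    """Stand-in base class (hypothetical; not the library's real BaseAgent)."""

    def __init__(self, name, budget):
        self.name = name
        self.remaining_budget = budget

    @abstractmethod
    def bid(self, values):
        """Return one bid per ad opportunity in the current time step."""

    def observe(self, cost):
        """Receive the step's total cost once the auctions have cleared."""
        self.remaining_budget -= cost


class FixedMultiplierAgent(BaseAgent):
    """Bids a constant multiple of each predicted value while budget remains."""

    def __init__(self, name, budget, multiplier):
        super().__init__(name, budget)
        self.multiplier = multiplier

    def bid(self, values):
        if self.remaining_budget <= 0:
            return [0.0 for _ in values]
        return [self.multiplier * v for v in values]


class ToyAuctionEnv:
    """Hypothetical single-slot second-price environment with random values."""

    def __init__(self, opportunities_per_step, seed=0):
        self.k = opportunities_per_step
        self.rng = random.Random(seed)
        self.agents = []

    def register(self, agent):
        self.agents.append(agent)

    def step(self):
        """Collect bids, clear each auction, and notify every agent."""
        n = len(self.agents)  # assumes at least one registered agent
        values = [[self.rng.random() for _ in range(self.k)] for _ in range(n)]
        bids = [agent.bid(v) for agent, v in zip(self.agents, values)]
        costs = [0.0] * n
        for j in range(self.k):
            ranked = sorted(range(n), key=lambda i: bids[i][j], reverse=True)
            winner = ranked[0]
            if bids[winner][j] > 0:
                # The winner pays the second-highest bid for this opportunity.
                costs[winner] += bids[ranked[1]][j] if n > 1 else 0.0
        for agent, cost in zip(self.agents, costs):
            agent.observe(cost)
        return costs


env = ToyAuctionEnv(opportunities_per_step=5, seed=1)
agent_a = FixedMultiplierAgent("a", budget=10.0, multiplier=1.0)
agent_b = FixedMultiplierAgent("b", budget=10.0, multiplier=0.5)
env.register(agent_a)
env.register(agent_b)
spend = [0.0, 0.0]
for _ in range(3):
    step_costs = env.step()
    spend = [s + c for s, c in zip(spend, step_costs)]
```

In this toy version the budget is only checked at the start of a step, so an agent can slightly overspend within a step; a real environment would presumably enforce the constraint more strictly.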

2. Mathematical Foundations

AuctionNet formalizes auction allocation, pricing, and optimization as follows (Su et al., 2024, Yeom et al., 3 Dec 2025):

Allocation and Pricing (GSP mechanism):
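- For a set of bids $\{b_i\}$, the slot is allocated to the highest bid $b_{(1)}$, and the winner is charged the second-highest bid $b_{(2)}$.
- Utility: $U_i = v_i x_i - p_i$, where $x_i \in \{0,1\}$ indicates whether bidder $i$ wins, $v_i$ is the bidder's value, and $p_i$ is the payment.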
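
As a worked illustration of the single-slot rule and the utility $U_i = v_i x_i - p_i$, the following minimal sketch allocates to the highest bid and charges the second-highest. The function name `second_price_auction` and the tie-breaking rule (lowest index wins) are assumptions made for this example.

```python
def second_price_auction(bids, values):
    """Allocate one slot to the highest bid; the winner pays the second-highest bid.

    Returns (winner_index, price, utilities), with utilities[i] = v_i * x_i - p_i.
    Ties are broken in favour of the lowest index (an assumption of this sketch).
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner = order[0]
    price = bids[order[1]] if len(bids) > 1 else 0.0
    utilities = [0.0] * len(bids)
    utilities[winner] = values[winner] - price
    return winner, price, utilities


winner, price, utilities = second_price_auction(
    bids=[3.0, 5.0, 4.0], values=[3.0, 6.0, 4.0]
)
# winner == 1, price == 4.0, utilities == [0.0, 2.0, 0.0]
```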
Multi-Slot Extension:
- Bidders are indexed by $i$ and slots by $j$, with exposure rates $e_{ij}\in[0,1]$.
- Objective for agent $i$:
$\max_{\{\alpha_i^t\}}\sum_{t=1}^T\sum_{j=1}^m e_{ij}^t x_{ij}^t v_{ij}^t$

subject to budget constraint

$\sum_{t,j}e_{ij}^t x_{ij}^t c_{ij}^t \leq \omega_i.$
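
To make the objective and budget constraint concrete, the sketch below evaluates both sums for a single agent, given arrays of shape (T, m) for exposure $e$, allocation $x$, value $v$, and cost $c$. The function name `evaluate_allocation` and the toy numbers are assumptions for illustration only.

```python
import numpy as np


def evaluate_allocation(e, x, v, c, budget):
    """Return (total value, total spend, feasible) for one agent.

    Each array has shape (T, m): time steps by slots. The value is
    sum_{t,j} e*x*v, the spend is sum_{t,j} e*x*c, and the allocation
    is feasible when the spend does not exceed the budget.
    """
    e, x, v, c = (np.asarray(a, dtype=float) for a in (e, x, v, c))
    value = float(np.sum(e * x * v))
    spend = float(np.sum(e * x * c))
    return value, spend, spend <= budget


# Two time steps, two slots: the agent wins slot 1 at t=1 and slot 2 at t=2.
value, spend, feasible = evaluate_allocation(
    e=[[1.0, 0.5], [1.0, 0.5]],
    x=[[1, 0], [0, 1]],
    v=[[2.0, 2.0], [4.0, 4.0]],
    c=[[1.0, 1.0], [3.0, 3.0]],
    budget=3.0,
)
# value == 4.0, spend == 2.5, feasible is True
```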
Baseline Optimization and RL Objective:
- Linear Programming/Knapsack: maximize allocated value under budget.
- RL: $J(\theta)=\mathbb{E}_{\pi_\theta}\left[\sum_{t=1}^T r(s_t,a_t)\right]$, updated by policy gradient.
- Generative models (VAE/diffusion) are trained with their corresponding objectives, $L_{\mathrm{VAE}}$ and $L_{\mathrm{LDM}}$.
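
The budget-constrained knapsack view of bidding can be sketched with the classic greedy heuristic that ranks opportunities by value per unit cost. This is a generic textbook heuristic shown for intuition, not the benchmark's Online LP implementation; the function name `greedy_knapsack` and the assumption that all costs are strictly positive are choices made for this example.

```python
def greedy_knapsack(values, costs, budget):
    """Pick opportunities in decreasing value/cost order while the budget allows.

    Assumes every cost is strictly positive. Items that do not fit are skipped,
    so the result is a heuristic rather than a guaranteed optimum.
    Returns (chosen indices, total value, total spend).
    """
    order = sorted(range(len(values)), key=lambda i: values[i] / costs[i], reverse=True)
    chosen, total_value, spend = [], 0.0, 0.0
    for i in order:
        if spend + costs[i] <= budget:
            chosen.append(i)
            spend += costs[i]
            total_value += values[i]
    return chosen, total_value, spend


chosen, total_value, spend = greedy_knapsack(
    values=[6.0, 5.0, 4.0], costs=[3.0, 5.0, 1.0], budget=4.0
)
# chosen == [2, 0], total_value == 10.0, spend == 4.0
```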
Simulation Benchmarks:
- Mean episode reward is normalized to Abid=1.0.

3. Evaluation Protocols and Metrics

AuctionNet provides rigorous protocols for offline evaluation, policy comparison, and model selection (Su et al., 2024, Yeom et al., 3 Dec 2025):

Performance Metrics:
- Mean episode reward (relative to Abid).
- Constraint violations (CPA penalty).
- In OPE settings:
  - Mean Directional Accuracy (MDA)
  - Root Mean Square Error (RMSE)
  - Pearson correlation $\rho$
Experimental Setup:
- 80/20 train-validation split for RL agents.
- PID controller hyperparameters: $\lambda_P=0.1$, $\lambda_I=0.01$, $\lambda_D=0.001$.
- RL/baseline models trained for 100K gradient steps, batch size 256.
- Dataset is used in full for bid landscape modeling in OPE (no held-out split for reward modeling).
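
The PID gains listed in the experimental setup can be read against a minimal controller sketch. The source does not say which signal the PID baselines regulate, so this example assumes a budget-pacing setup in which the relative gap between target and actual spend adjusts a bid multiplier; the class name `PIDBidController` and the additive update are likewise assumptions.

```python
class PIDBidController:
    """Adjusts a bid multiplier from the relative spend error (assumed setup)."""

    def __init__(self, kp=0.1, ki=0.01, kd=0.001, initial_multiplier=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.multiplier = initial_multiplier
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target_spend, actual_spend):
        """Return the new multiplier after observing one step's spend."""
        error = (target_spend - actual_spend) / target_spend
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        adjustment = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Underspending (positive error) raises the multiplier; overspending lowers it.
        self.multiplier = max(0.0, self.multiplier + adjustment)
        return self.multiplier


controller = PIDBidController()
after_underspend = controller.update(target_spend=100.0, actual_spend=50.0)
# error = 0.5, so the adjustment is 0.05 + 0.005 + 0.0005 and the multiplier is 1.0555
```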
Key Results:
- Online LP method achieves normalized episode reward ≈1.3, followed by IQL ≈1.15, BC ≈1.1, DT ≈1.05.
- RL and transformer models have headroom for further tuning.
- In OPE experiments (Yeom et al., 3 Dec 2025), the DPM-based SNIPS estimator yields MDA = 100% and RMSE = 4.87 pp, a substantially lower error than parametric baselines.

Metrics Table

Metric	Definition	Application Context
Mean Reward	$\mathbb{E}[\sum_t r_t]$	Normalized to Abid baseline
MDA	Sign match on $\Delta$	OPE policy selection
RMSE	Root-mean-square error on lift	OPE, policy evaluation
Pearson $\rho$	Linear corr of lifts	OPE, estimator validation
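
The three OPE metrics in the table can be computed from paired arrays of predicted and true lifts, as in the following sketch. The function names and the toy lift values are assumptions for illustration; they are not taken from the benchmark code.

```python
import numpy as np


def mean_directional_accuracy(pred_lift, true_lift):
    """Fraction of comparisons where predicted and true lift share a sign."""
    pred, true = np.asarray(pred_lift, float), np.asarray(true_lift, float)
    return float(np.mean(np.sign(pred) == np.sign(true)))


def rmse(pred_lift, true_lift):
    """Root-mean-square error between predicted and true lift."""
    pred, true = np.asarray(pred_lift, float), np.asarray(true_lift, float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))


def pearson(pred_lift, true_lift):
    """Pearson correlation coefficient between predicted and true lift."""
    return float(np.corrcoef(pred_lift, true_lift)[0, 1])


predicted = [0.02, -0.01, 0.03, -0.02]
observed = [0.01, -0.02, 0.05, 0.01]
mda_value = mean_directional_accuracy(predicted, observed)  # 3 of 4 signs agree
rmse_value = rmse(predicted, observed)
rho_value = pearson(predicted, observed)
```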

4. Off-Policy Evaluation on AuctionNet

AuctionNet has served as a testbed for reliable OPE in deterministic ad auctions (Yeom et al., 3 Dec 2025):

Challenge: Winner-take-all setting yields zero propensity for non-winning actions, causing standard IPS estimators to fail.
Bid Landscape Model: A Discrete Price Model (DPM) estimates unobserved market price $z$ by discretizing scores into quantile bins, computing densities and survival functions, and deriving instantaneous win probabilities $h_\ell$ as approximate propensity scores.
SNIPS Estimator: Self-normalized inverse propensity scoring with APS, stabilized by weight-capping, applied on logged deterministic auction data.
Empirical Findings: DPM-OPE matches online A/B test results with 92.9% MDA in CTR prediction, outperforming parametric OPE and maintaining scale-invariant error.
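
The two building blocks named above, a discretized bid landscape and a weight-capped SNIPS estimate, can be sketched as follows. This is a simplified empirical version under stated assumptions: market prices are taken as fully observed, the hazard is computed as bin density divided by the survival mass at the start of the bin, and the propensities passed to `snips` are supplied directly. How the paper maps $h_\ell$ to propensity scores is not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def discrete_hazards(market_prices, num_bins=4):
    """Quantile-bin the market price and return (edges, density, survival, hazard).

    density[l]  : share of prices falling in bin l
    survival[l] : share of prices at or above the start of bin l
    hazard[l]   : density[l] / survival[l], the conditional probability that
                  the market price falls in bin l given it reached that bin
    """
    z = np.asarray(market_prices, dtype=float)
    edges = np.quantile(z, np.linspace(0.0, 1.0, num_bins + 1))
    counts, _ = np.histogram(z, bins=edges)
    density = counts / counts.sum()
    survival = 1.0 - np.concatenate(([0.0], np.cumsum(density)[:-1]))
    hazard = density / survival
    return edges, density, survival, hazard


def snips(rewards, target_probs, logging_probs, cap=10.0):
    """Self-normalized IPS with importance weights capped at `cap`."""
    r = np.asarray(rewards, dtype=float)
    w = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    w = np.minimum(w, cap)
    return float(np.sum(w * r) / np.sum(w))


edges, density, survival, hazard = discrete_hazards(np.arange(1, 101), num_bins=4)
estimate = snips(
    rewards=[1.0, 0.0, 1.0],
    target_probs=[0.5, 0.5, 0.5],
    logging_probs=[0.5, 0.25, 0.01],
    cap=10.0,
)
# Capped weights are [1, 2, 10], so the estimate is 11 / 13.
```

Capping bounds the influence of logged records with very small propensities, trading a small bias for lower variance.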

A plausible implication is that realistic simulators with empirical bid landscapes are necessary for credible OPE in deterministic mechanism design.

5. Extensibility and Integration

AuctionNet is designed for extensibility at several abstraction layers (Su et al., 2024, Kushnir et al., 2022):

Agent Modeling: Custom agents are defined by subclassing BaseAgent and implementing bidding/observation logic.
Auction Mechanisms: The core Python API allows for plug-in mechanism customization, multi-slot allocation, and exposure rate modeling.
Dataset Usage: Comprehensive schema allows statistical analysis, algorithm benchmarking, and supervised learning.
Benchmark Expansion: Scenarios from multi-dimensional auction simulation (auctionsim) can be systematically integrated by adding parameter configurations:
- $(N,J,\text{distribution},\text{cost},T)$ for each scenario.
- Metrics per scenario: optimal revenue, exclusive revenue, runtime, ICC/Border violations, exclusion region masks.
- Visualization of allocation and exclusion regions.
Reproducibility: Open-source codebase, full dataset access, documented API facilitate extension to new algorithms, auction formats, and statistical models.
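
The scenario parameterization $(N,J,\text{distribution},\text{cost},T)$ mentioned under Benchmark Expansion can be captured in a small configuration sketch. The `Scenario` dataclass and `scenario_grid` helper are hypothetical and are not part of auctionsim or AuctionNet; the field names simply mirror the tuple in the text.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    """One benchmark configuration; fields mirror the (N, J, distribution, cost, T) tuple."""
    N: int
    J: int
    distribution: str
    cost: float
    T: int


def scenario_grid(Ns, Js, distributions, costs, Ts):
    """Enumerate every combination of the supplied parameter values."""
    return [Scenario(*combo) for combo in product(Ns, Js, distributions, costs, Ts)]


grid = scenario_grid(
    Ns=[1, 2], Js=[2], distributions=["Uniform", "Beta"], costs=[0.0], Ts=[10, 20]
)
# 2 * 1 * 2 * 1 * 2 = 8 scenarios
```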

AuctionNet subsumes and integrates simulation benchmarks for optimal multi-dimensional auction mechanisms (Kushnir et al., 2022):

auctionsim Library: Provides simulation experiments on optimal vs. exclusive-buyer mechanisms, supporting Uniform, Beta, TruncatedNormal, and Mixture distributions over types and grade values.
Mathematical Formulation: LP-based mechanism encoding incentive compatibility (ICC), Border constraints, and direct-revelation IR; supports revenue gap analysis, exclusion region characterization.
Benchmark Interface: Empirical scaling of $O(T^J\log T^J)$, instance runtimes of 2 to 15 s, and numerical accuracy confirmed by the LP duality gap and ICC violation minimization.
Conjecture Validation: Supports systematic policy and mechanism comparison, providing empirical evidence for or against literature conjectures.

AuctionNet’s integration of these libraries enables a unified evaluation framework for both industrial bidding algorithms and theoretical mechanism simulations.

6. Research Applications and Directions

AuctionNet is applicable to the following areas:

Auction-based machine learning, RL-based auto-bidding, and bid landscape modeling.
Off-policy evaluation, model selection, and safe deployment in online advertising systems.
General large-scale game decision-making, multi-agent optimization, and POSG analysis.
Mechanism design research, including optimal allocation, revenue maximization, and incentive analysis.
Statistical benchmarking and generative modeling for synthetic data synthesis.
Comparative study of allocation mechanisms, including second-price, GSP, and custom auction rules.

The design and empirical protocols of AuctionNet position it as a central benchmark for empirical, theoretical, and methodological advances in auction environments and algorithmic decision-making.


References (3)

Su et al. (2024). AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games.

Yeom et al. (2025). Breaking Determinism: Stochastic Modeling for Reliable Off-Policy Evaluation in Ad Auctions.

Kushnir et al. (2022). Optimal Multi-Dimensional Auctions: Conjectures and Simulations.
