- The paper introduces a domain-adapted reward model for robust counterfactual evaluation of ads ranking models, addressing limitations of traditional model-free methods such as IPS in settings where they are impractical.
- The method trains the reward model using a weighted loss function that emphasizes differences between ranking policies, allowing it to generalize and estimate lift accurately across various policy domains.
- Experimental results on synthetic and real-world data demonstrate that the proposed domain-adapted model significantly outperforms baseline methods and vanilla IPS in evaluating ranking policies.
This paper introduces a domain-adapted reward model to enhance counterfactual evaluation of ads ranking models, particularly in scenarios where traditional model-free methods like IPS are impractical (2409.19824). The core innovation lies in training a reward model that generalizes across different ranking policies, facilitating accurate lift estimation within an offline A/B testing framework.
Domain-Adapted Reward Model and Offline A/B Testing System
The paper addresses the problem of selection bias inherent in large-scale recommender systems by proposing a domain-adapted reward model, h(x, a), that estimates the reward y given context x and ad a. This reward model is trained to function effectively across multiple domains, where each domain represents a specific ranking policy. The methodology leverages an offline A/B testing system, which simulates ad recommendations for each target domain using historical data. Each target domain, denoted T_k, represents the ads recommended by a specific ranking model.
Lift Estimation
The reward model facilitates the estimation of lift between a target domain T_k and a source domain S. Lift is quantified as the difference in expected reward, as estimated by the reward model, between the target and source domains, i.e., lift(T_k) = E_{T_k}[h(x, a)] − E_S[h(x, a)]. This is a critical step in assessing the impact of transitioning from one ranking policy to another.
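A minimal sketch of this lift computation, assuming a trained reward model exposing a hypothetical `predict(context, ad)` interface and replayed logs for the target domain; the names and data layout are illustrative, not from the paper:

```python
import numpy as np

def estimate_lift(reward_model, source_logs, target_logs):
    """Estimate the lift of a target policy T_k over the source policy S as the
    difference in mean predicted reward (hypothetical data layout)."""
    # Predicted reward for the ads the source policy actually served.
    y_source = reward_model.predict(source_logs["context"], source_logs["ad"])
    # Predicted reward for the ads the target policy would have served,
    # obtained by replaying the target ranker on the same historical traffic.
    y_target = reward_model.predict(target_logs["context"], target_logs["ad"])
    return float(np.mean(y_target) - np.mean(y_source))
```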
Weighted Loss Function
Training the reward model involves minimizing a weighted loss function on labeled data from the source domain, D_S. The weighting scheme is designed to emphasize non-overlapping regions between target and source domains, thereby improving the model's ability to generalize across different policies. The weight w_a^k is defined as the ratio of the probability of observing ad a under context x with target policy T_k to the probability under source policy S. The loss function incorporates two key terms:
- |w_a^k − 1|: This term focuses on the discrepancies between target and source domains, ensuring that the reward model is sensitive to policy changes.
- β·Σ_{k'} |w_a^k − w_a^{k'}|: This term, regulated by the hyperparameter β, reduces the deviation in reward-model performance across different target domains, promoting consistent behavior of the reward model across all of them (a code sketch of this weighted loss follows the list).
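A sketch of the weighted training objective under explicit assumptions: the ratios w_a^k are precomputed for every target domain, the base loss is squared error, and the per-example weight is taken as |w_a^k − 1| + β·Σ_{k'} |w_a^k − w_a^{k'}|; the exact functional form in the paper may differ:

```python
import numpy as np

def example_weights(w, k, beta=0.1):
    """Per-example weight for target domain k (illustrative form):
    |w_a^k - 1| + beta * sum_{k'} |w_a^k - w_a^{k'}|.

    w : array of shape (n, K), where w[i, j] = P_{T_j}(a_i | x_i) / P_S(a_i | x_i)
    """
    divergence = np.abs(w[:, k] - 1.0)                       # emphasize non-overlap with S
    consistency = beta * np.abs(w[:, [k]] - w).sum(axis=1)   # tie target domains together
    return divergence + consistency

def weighted_reward_loss(y_true, y_pred, w, k, beta=0.1):
    """Weighted squared-error loss on source-domain data D_S for target domain k."""
    return float(np.mean(example_weights(w, k, beta) * (y_true - y_pred) ** 2))
```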
Implementation and Evaluation
The implementation integrates the domain-adapted reward model into an offline A/B testing system, allowing for a structured evaluation of different ranking policies. The process involves simulating ad recommendations for each target domain, predicting rewards using the trained reward model, calculating lift between target and source domains, and ranking policies based on their estimated lifts.
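A high-level sketch of that loop; `simulate_recommendations` is a hypothetical helper standing in for the offline A/B testing system's replay component, and the reward-model interface matches the earlier sketch:

```python
def rank_policies_offline(target_rankers, source_logs, reward_model):
    """Rank candidate ranking policies by their estimated lift over the source policy S."""
    # Expected reward under the deployed (source) policy.
    y_source = reward_model.predict(source_logs["context"], source_logs["ad"])
    baseline = float(y_source.mean())
    lifts = {}
    for name, ranker in target_rankers.items():
        # Replay historical traffic through the candidate ranker to build target domain T_k.
        target_logs = simulate_recommendations(ranker, source_logs)  # hypothetical helper
        y_target = reward_model.predict(target_logs["context"], target_logs["ad"])
        lifts[name] = float(y_target.mean()) - baseline
    # Highest estimated lift first.
    return sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)
```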
Experimental Results
The paper substantiates its claims with experimental results derived from both synthetic and real-world data.
- Synthetic Data: In a controlled synthetic environment, the proposed reward model demonstrated superior performance compared to a baseline model (trained solely on source-domain data) and the vanilla IPS method. The performance metric was Rec_cv (coefficient of variation of recovery), which measures how accurately each method recovers the ground-truth lift.
- Online Experiment (CTR Prediction): Using data from a completed A/B test for a CTR prediction model, the proposed reward model achieved a 17.6% improvement on the Rec_cv metric compared to a baseline model. Because the propensity score weight is intractable to compute directly in a complex recommendation system, the weights were estimated by training an impression probability estimator per target domain (a sketch of this estimation follows the list).
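A minimal sketch of that estimation, assuming a logistic-regression impression model per domain is an acceptable stand-in for the paper's estimator; the feature construction and clipping threshold are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_impression_model(features, impressed):
    """Fit P(impression | x, a) for one domain from logged (context, ad) features."""
    model = LogisticRegression(max_iter=1000)
    model.fit(features, impressed)
    return model

def propensity_ratios(source_model, target_model, features):
    """Approximate w_a^k = P_{T_k}(a | x) / P_S(a | x) from the two impression models."""
    p_source = source_model.predict_proba(features)[:, 1]
    p_target = target_model.predict_proba(features)[:, 1]
    return p_target / np.clip(p_source, 1e-6, None)  # clip to avoid division by zero
```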
In summary, the paper introduces a domain-adapted reward model for counterfactual evaluation of ads ranking models, particularly in scenarios where traditional model-free methods like IPS are not feasible. The reward model is trained using a weighted loss function that emphasizes the differences between the current (source) policy and the new (target) policies. Experimental results using both synthetic and real-world data demonstrate that the proposed reward model outperforms both a baseline model and the vanilla IPS method.