Delayed Feedback Modeling with Influence Functions (2502.01669v2)

Published 1 Feb 2025 in cs.LG, cs.AI, and cs.IR

Abstract: In online advertising under the cost-per-conversion (CPA) model, accurate conversion rate (CVR) prediction is crucial. A major challenge is delayed feedback, where conversions may occur long after user interactions, leading to incomplete recent data and biased model training. Existing solutions partially mitigate this issue but often rely on auxiliary models, making them computationally inefficient and less adaptive to user interest shifts. We propose IF-DFM, an Influence Function-empowered framework for Delayed Feedback Modeling, which estimates the impact of newly arrived and delayed conversions on model parameters, enabling efficient updates without full retraining. By reformulating the inverse Hessian-vector product as an optimization problem, IF-DFM achieves a favorable trade-off between scalability and effectiveness. Experiments on benchmark datasets show that IF-DFM outperforms prior methods in both accuracy and adaptability.

Summary

The paper introduces IF-DFM, a framework that uses influence functions to adjust CVR models for delayed conversions.
It reformulates the inverse Hessian-vector product as an optimization problem, enabling scalable and efficient parameter updates.
Experimental results on Criteo and Taobao datasets show improved AUC, PRAUC, and Log Loss compared to traditional methods.

Delayed Feedback Modeling with Influence Functions

The paper addresses delayed feedback in conversion rate (CVR) prediction for online advertising. Under the cost-per-conversion (CPA) model, advertisers are charged only when a user interaction leads to a conversion, so accurate CVR prediction is central to optimizing revenue. Because conversions can occur well after the initial click, recent training data is incomplete: samples that will eventually convert are still labeled negative, which biases model training.

Proposed Framework

The paper introduces the Influence Function-empowered Delayed Feedback Modeling (IF-DFM), a method designed to model the impact of delayed conversions on CVR predictions. The framework leverages influence functions to estimate the effects of new and delayed conversions on model parameters, facilitating efficient updates without the need for full model retraining.

Figure 1: The framework of offline CVR methods, online CVR methods, and IF-DFM.

Figure 1 contrasts offline CVR methods, online CVR methods, and the proposed IF-DFM approach. Offline methods train on static historical data and often adapt poorly to shifts in user interests. Online methods update the model incrementally on newly observed data, but the repeated updates can be redundant and inefficient.

Influence Functions and Delayed Feedback

Influence functions, a tool from robust statistics, estimate how a perturbation of the training data changes the learned model parameters. This gives a principled way to adjust a model for newly arrived feedback without retraining. Computing an influence function requires an inverse Hessian-vector product, which is expensive for large models; IF-DFM reformulates this product as an optimization problem so that the computation scales.
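For reference, the standard influence-function result can be written as below. This is the textbook formulation, not notation taken from the paper: the symbols z_i (training samples), \ell (per-sample loss), \hat{\theta} (fitted parameters), and \epsilon (upweighting factor) are assumptions introduced here for illustration, and the paper's own notation may differ.

```latex
% Fitted parameters on n training samples
\hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \ell(z_i; \theta)

% Parameters after upweighting a sample z by a small \epsilon
\hat{\theta}_{\epsilon, z} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \ell(z_i; \theta) + \epsilon\, \ell(z; \theta)

% First-order effect of the perturbation on the parameters
\left. \frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon} \right|_{\epsilon = 0}
  = - H_{\hat{\theta}}^{-1} \nabla_{\theta} \ell(z; \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} \ell(z_i; \hat{\theta})
```

The term H^{-1} \nabla \ell is the inverse Hessian-vector product: forming and inverting H explicitly is impractical when \theta is high-dimensional, which is why it is the computational bottleneck.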

Figure 2: An illustration of the delayed feedback problem in CVR tasks.

Methodology

IF-DFM addresses two key issues: label reversal and integration of new data. Label reversal occurs when samples initially labeled as negative eventually convert, necessitating corrections. Newly arrived data, indicative of recent user behavior, needs to be integrated efficiently to keep the models adaptive.
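Label reversal can be sketched on a toy model as follows. This is a minimal illustration, not the paper's implementation: it assumes an L2-regularised logistic regression, a small feature dimension so the Hessian can be solved exactly, and synthetic data. All function names (`fit_logreg`, `label_flip_update`, and so on) are hypothetical. Flipping a sample from label 0 to label 1 is treated as removing (x, 0) and adding (x, 1), and the first-order parameter change is compared with an actual retrain.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hessian(X, theta, lam):
    """Hessian of the mean L2-regularised logistic loss at theta."""
    p = sigmoid(X @ theta)
    return (X.T * (p * (1.0 - p))) @ X / len(X) + lam * np.eye(X.shape[1])

def fit_logreg(X, y, lam=1e-2, iters=50):
    """Newton's method on the mean L2-regularised logistic loss."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(X) + lam * theta
        theta = theta - np.linalg.solve(hessian(X, theta, lam), grad)
    return theta

def label_flip_update(X, theta, flipped_idx, lam=1e-2):
    """First-order parameter change when labels at flipped_idx go 0 -> 1."""
    # For logistic loss, grad l(x, 1) - grad l(x, 0) = (p - 1) x - p x = -x.
    g = -X[flipped_idx].sum(axis=0) / len(X)
    # Influence-style update: delta = -H^{-1} g (exact solve, toy scale only).
    return -np.linalg.solve(hessian(X, theta, lam), g)

# Synthetic data (assumption): 2000 samples, 5 features.
rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y_true = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)

# Simulate delayed feedback: 20 true positives are still observed as negatives.
delayed = rng.choice(np.flatnonzero(y_true == 1), size=20, replace=False)
y_obs = y_true.copy()
y_obs[delayed] = 0.0

theta_stale = fit_logreg(X, y_obs)                       # trained on biased labels
theta_if = theta_stale + label_flip_update(X, theta_stale, delayed)
theta_retrain = fit_logreg(X, y_true)                    # full retrain, for reference

err_stale = np.linalg.norm(theta_stale - theta_retrain)
err_if = np.linalg.norm(theta_if - theta_retrain)
print(f"distance to retrained model: stale={err_stale:.4f}, corrected={err_if:.4f}")
```

The exact `np.linalg.solve` call is only viable because the toy model has five parameters; at realistic scale this solve is the step that has to be approximated.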

To compute the parameter change induced by these perturbations, the paper casts the inverse Hessian-vector product as a finite-sum quadratic optimization problem. Because the objective is a sum over training samples, it can be minimized with stochastic optimizers such as SGD and its variants, which yields the parameter update efficiently and without full retraining.
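The idea behind a finite-sum quadratic formulation can be sketched as follows. The vector v = H^{-1} b is the minimizer of (1/n) Σ_i [½ vᵀ H_i v − bᵀ v], where H_i is the Hessian of the i-th sample's loss, so minibatch SGD on this objective approximates the inverse Hessian-vector product without forming H. The code below is an illustrative sketch under assumptions that are not from the paper: a logistic-regression Hessian, a fixed step size, and averaging of the second half of the iterates to damp SGD noise. The function name `ihvp_sgd` and all hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ihvp_sgd(X, theta, b, lam=1e-2, lr=0.5, batch=64, steps=5000, seed=0):
    """Approximate H^{-1} b by SGD on (1/n) sum_i [0.5 v^T H_i v - b^T v].

    H_i = p_i (1 - p_i) x_i x_i^T + lam * I is the per-sample Hessian of the
    L2-regularised logistic loss; it is never materialised as a matrix.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)                      # per-sample curvature weights
    v = np.zeros(d)
    v_avg = np.zeros(d)
    n_avg = 0
    for t in range(steps):
        idx = rng.integers(0, n, size=batch)
        Xb, wb = X[idx], w[idx]
        # Minibatch Hessian-vector product using two matrix-vector products.
        Hv = Xb.T @ (wb * (Xb @ v)) / batch + lam * v
        v = v - lr * (Hv - b)              # stochastic gradient of the quadratic
        if t >= steps // 2:                # running mean over the second half
            n_avg += 1
            v_avg += (v - v_avg) / n_avg
    return v_avg

# Synthetic check (assumption): compare against an exact solve at toy scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 5))
theta = 0.5 * rng.normal(size=5)
b = X[0]                                   # e.g. the gradient change from one flipped label

v_sgd = ihvp_sgd(X, theta, b)

p = sigmoid(X @ theta)
H = (X.T * (p * (1.0 - p))) @ X / len(X) + 1e-2 * np.eye(5)
v_exact = np.linalg.solve(H, b)

rel_err = np.linalg.norm(v_sgd - v_exact) / np.linalg.norm(v_exact)
print(f"relative error vs exact solve: {rel_err:.4f}")
```

Each step costs only a minibatch Hessian-vector product, so the per-iteration cost is independent of the training-set size; for a neural network the same product would typically come from automatic differentiation rather than the closed form used here.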

Experimental Results

Experiments were conducted on the Criteo and Taobao datasets in both offline and online settings. IF-DFM is reported to outperform existing methods consistently on AUC, PRAUC, and Log Loss.

Figure 3: Offline experimental results on Taobao dataset.

Figure 4: Online experimental results on Criteo dataset.

The approach also adapted better to changing user preferences, which the paper attributes to using influence functions to incorporate newly arrived feedback efficiently.

Conclusion

IF-DFM mitigates delayed feedback in CVR prediction by using influence functions to estimate directly how new and corrected data change the model parameters. This allows timely parameter updates without the computational overhead of full retraining, and the reported results show gains in both predictive metrics and adaptability in dynamic advertising environments. Future work may focus on improving the influence estimation itself and on validating the method in real-world deployments through A/B testing.