
RAT: Retrieval-Augmented Transformer for Click-Through Rate Prediction (2404.02249v2)

Published 2 Apr 2024 in cs.IR, cs.AI, cs.LG, and cs.SI

Abstract: Predicting click-through rates (CTR) is a fundamental task for Web applications, where a key issue is to devise effective models for feature interactions. Current methodologies predominantly concentrate on modeling feature interactions within an individual sample, while overlooking the potential cross-sample relationships that can serve as a reference context to enhance the prediction. To make up for such deficiency, this paper develops a Retrieval-Augmented Transformer (RAT), aiming to acquire fine-grained feature interactions within and across samples. By retrieving similar samples, we construct augmented input for each target sample. We then build Transformer layers with cascaded attention to capture both intra- and cross-sample feature interactions, facilitating comprehensive reasoning for improved CTR prediction while retaining efficiency. Extensive experiments on real-world datasets substantiate the effectiveness of RAT and suggest its advantage in long-tail scenarios. The code has been open-sourced at https://github.com/YushenLi807/WWW24-RAT.


Summary

  • The paper introduces a novel approach using retrieval-augmented learning to enhance CTR prediction by integrating both intra- and cross-sample feature interactions.
  • It employs a Transformer-based architecture with cascaded attention mechanisms and BM25-driven sample retrieval to enrich input representations.
  • Experimental results demonstrate RAT's superior performance, particularly in long-tail data scenarios, effectively addressing feature sparsity and cold start challenges.

RAT: Enhancing Click-Through Rate Prediction with Retrieval-Augmented Transformer

Introduction to Click-Through Rate Prediction

In web applications, particularly advertising and recommender systems, accurately predicting the click-through rate (CTR) of ads or recommended items is critical. Traditional CTR prediction models focus primarily on modeling feature interactions within individual samples, overlooking the contextual information that relationships between samples can provide, which is especially valuable when observed feature interactions are sparse. This paper introduces the Retrieval-Augmented Transformer (RAT), which captures both intra-sample and cross-sample feature interactions by augmenting each target sample's input with similar samples retrieved from a reference pool.

Retrieval-Augmented Learning

RAT leverages retrieval-augmented learning, a technique that has shown promise in integrating external contextual information to enhance model predictions in various domains such as natural language processing and computer vision. The methodology involves retrieving similar samples from a historical log and augmenting the target sample's input with these samples to provide additional context. This approach is designed to address the shortcomings of traditional CTR models, particularly their limitations in dealing with feature sparsity and the cold start problem.
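The retrieval step can be illustrated with a minimal, self-contained BM25 sketch. This is not the authors' implementation: the tokenization of samples as "field=value" strings, the toy reference pool, and the parameter defaults are all illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with BM25.

    Samples are tokenized as lists of "field=value" strings, so
    lexical overlap approximates feature-level similarity.
    """
    N = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / N
    # Document frequency of each token across the corpus.
    df = Counter(tok for doc in corpus for tok in set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for tok in query:
            if tok not in tf:
                continue
            idf = math.log(1 + (N - df[tok] + 0.5) / (df[tok] + 0.5))
            score += idf * tf[tok] * (k1 + 1) / (
                tf[tok] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

def retrieve_similar(query, corpus, top_k=3):
    """Return indices of the top_k most similar samples."""
    scores = bm25_scores(query, corpus)
    return sorted(range(len(corpus)), key=lambda i: -scores[i])[:top_k]

# Toy reference pool: each sample is its categorical feature tokens.
pool = [
    ["user=alice", "item=shoes",    "cat=sports",      "device=mobile"],
    ["user=bob",   "item=laptop",   "cat=electronics", "device=desktop"],
    ["user=carol", "item=sneakers", "cat=sports",      "device=mobile"],
]
target = ["user=dave", "item=cleats", "cat=sports", "device=mobile"]
print(retrieve_similar(target, pool, top_k=2))  # → [0, 2]
```

The two sports/mobile samples score highest because they share the most feature tokens with the target, while the electronics/desktop sample shares none and scores zero.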

The RAT Framework

RAT is designed to capture fine-grained feature interactions both within and across samples. It first retrieves similar samples using a sparse retrieval algorithm and then processes the augmented input with a Transformer-based architecture. The process involves three key steps:

  • Similar Samples Retrieval: For each target sample, RAT retrieves a set of similar samples from a predefined pool using the BM25 algorithm. These samples serve as augmented input, providing crucial external context.
  • Augmented Input Construction: RAT transforms discrete features into embedding vectors, including a unique embedding for the sample's label. The embeddings of the retrieved samples are then concatenated with the target sample's embeddings, forming the retrieval-augmented input.
  • Modeling Feature Interactions: To effectively capture both intra- and cross-sample interactions, RAT employs a Transformer architecture with cascaded attention mechanisms. This design choice benefits from the Transformer's capacity for modeling complex dependencies while maintaining computational efficiency.
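The augmented-input and cascaded-attention steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the dimensions, the single-head attention, the attention ordering (intra-sample first, then cross-sample), and the toy scoring head are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # embedding dimension
n_fields = 4   # feature fields per sample
k = 3          # number of retrieved samples

def attention(q, kv):
    """Plain single-head scaled dot-product attention."""
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv

# Target sample: one embedding per feature field.
target = rng.normal(size=(n_fields, d))
# Retrieved samples carry an extra label embedding (n_fields + 1 tokens).
retrieved = rng.normal(size=(k, n_fields + 1, d))

# Cascade step 1 (intra-sample): each sample attends over its own tokens.
target_intra = attention(target, target)
retrieved_intra = np.stack([attention(r, r) for r in retrieved])

# Cascade step 2 (cross-sample): target tokens attend over all
# retrieved tokens, injecting cross-sample context.
context = retrieved_intra.reshape(-1, d)
target_cross = attention(target_intra, context)

# Pool and score with a toy linear head to get a predicted CTR.
logit = float(target_cross.mean(axis=0) @ rng.normal(size=d))
ctr = 1 / (1 + np.exp(-logit))
print(f"predicted CTR: {ctr:.3f}")
```

Cascading the two attention stages keeps cost manageable: each sample first contextualizes its own tokens, and only then does the target query the retrieved pool, rather than running full attention over every token pair from the start.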

Experimental Results

Extensive experiments on real-world datasets demonstrate RAT's superior performance over both traditional CTR models and earlier retrieval-augmented approaches. Notably, RAT shows marked improvement on long-tail data, indicating its effectiveness against feature sparsity and cold-start issues. These results are backed by rigorous comparisons and ablation studies that isolate the contributions of RAT's design choices, including its cascaded attention mechanisms and overall architecture.

Future Directions

The promising outcomes of incorporating retrieval-augmented learning into CTR prediction models like RAT open avenues for further research. Future developments could explore more sophisticated retrieval mechanisms and investigate the integration of RAT with other types of data beyond tabular forms. Additionally, understanding how RAT's framework can be adapted or extended to other tasks in recommender systems and advertising presents an intriguing line of inquiry.

Conclusion

By leveraging retrieval-augmented learning and Transformer architectures, the RAT model represents a significant step forward in CTR prediction. It successfully addresses key challenges in the field by enhancing the model's ability to capture comprehensive feature interactions, both within and across samples. RAT's strong empirical performance, particularly in handling long-tail data, indicates its potential to contribute meaningfully to advancements in advertising technology and recommender systems.
