
Quadratic Interest Network for Multimodal Click-Through Rate Prediction

Published 24 Apr 2025 in cs.IR | (2504.17699v2)

Abstract: Multimodal click-through rate (CTR) prediction is a key technique in industrial recommender systems. It leverages heterogeneous modalities such as text, images, and behavioral logs to capture high-order feature interactions between users and items, thereby enhancing the system's understanding of user interests and its ability to predict click behavior. The primary challenge in this field lies in effectively utilizing the rich semantic information from multiple modalities while satisfying the low-latency requirements of online inference in real-world applications. To foster progress in this area, the Multimodal CTR Prediction Challenge Track of the WWW 2025 EReL@MIR Workshop formulates the problem into two tasks: (1) Task 1 of Multimodal Item Embedding: this task aims to explore multimodal information extraction and item representation learning methods that enhance recommendation tasks; and (2) Task 2 of Multimodal CTR Prediction: this task aims to explore what multimodal recommendation model can effectively leverage multimodal embedding features and achieve better performance. In this paper, we propose a novel model for Task 2, named Quadratic Interest Network (QIN) for Multimodal CTR Prediction. Specifically, QIN employs adaptive sparse target attention to extract multimodal user behavior features, and leverages Quadratic Neural Networks to capture high-order feature interactions. As a result, QIN achieved an AUC of 0.9798 on the leaderboard and ranked second in the competition. The model code, training logs, hyperparameter configurations, and checkpoints are available at https://github.com/salmon1802/QIN.

Summary

  • The paper introduces the Quadratic Interest Network (QIN), utilizing adaptive sparse attention and quadratic networks to effectively model multimodal data interactions for CTR prediction.
  • QIN achieved an AUC of 0.9798 on the competition leaderboard, placing second, and improved validation-set AUC by 0.1046 over the DIN baseline.
  • Highlights QIN's potential to enhance industrial recommender systems by integrating multimodal data and emphasizes modeling complex interactions for better predictions.


The paper "Quadratic Interest Network for Multimodal Click-Through Rate Prediction" presents the Quadratic Interest Network (QIN), a model that improves click-through rate (CTR) prediction in industrial recommender systems by integrating multimodal data. The work was developed for the Multimodal CTR Prediction Challenge Track of the WWW 2025 EReL@MIR Workshop, which focuses on using diverse data modalities to improve recommender systems' predictive capabilities.

Introduction and Background

CTR prediction is a fundamental component of recommender systems: it estimates the likelihood that a user clicks on a suggested item. Traditional models in this domain rely primarily on log-based information, including user profiles and contextual attributes. The paper points to a growing trend towards incorporating multimodal data (such as text, images, and behavioral logs) to capture complex user-item interactions more accurately. The challenge lies in processing and integrating these diverse data types while still meeting the latency requirements of real-time applications.

Proposed Methodology: Quadratic Interest Network

The core contribution of this study is the Quadratic Interest Network (QIN), designed to address the complexities of multimodal CTR prediction. QIN employs two core components:

  1. Adaptive Sparse Target Attention (ASTA): ASTA enhances the extraction of user behavior features by dynamically focusing on the most informative parts of users’ interaction histories. This component reduces computational overhead by replacing the SoftMax normalization mechanism with ReLU, which yields non-normalized, sparse attention weights: any behavior whose score is not positive receives a weight of exactly zero. This adjustment is particularly beneficial in real-time prediction settings where latency and relevance are prime concerns.
  2. Quadratic Neural Networks (QNN): QNN is employed to model high-order interactions among features, which are pivotal in CTR prediction contexts. By leveraging quadratic terms as opposed to traditional linear weighting mechanisms, QNN captures complex interdependencies between various user and item characteristics, thereby facilitating more nuanced and expressive representations.
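
The two components can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the scaled dot-product scoring, and the quadratic form (an element-wise product of two linear maps, one common way to build a quadratic unit) are all assumptions made for illustration. The official code is in the repository linked in the abstract.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def relu_target_attention(target, history):
    """Target attention with ReLU in place of softmax (assumed scoring:
    scaled dot product between the target item and each past behavior).
    Returns the non-normalized weights and the weighted sum of behaviors."""
    d = len(target)
    scale = math.sqrt(d)
    # ReLU zeroes out non-positive scores, so the weights are sparse
    # and are not forced to sum to 1.
    weights = [max(0.0, dot(target, h) / scale) for h in history]
    pooled = [sum(w * h[i] for w, h in zip(weights, history)) for i in range(d)]
    return weights, pooled

def linear(W, b, x):
    """Affine map W x + b, with W given as a list of rows."""
    return [dot(row, x) + bias for row, bias in zip(W, b)]

def quadratic_layer(x, W1, b1, W2, b2):
    """Hypothetical quadratic unit: (W1 x + b1) * (W2 x + b2), element-wise.
    Each output contains second-order terms x_i * x_j."""
    left = linear(W1, b1, x)
    right = linear(W2, b2, x)
    return [l * r for l, r in zip(left, right)]

# Toy inputs (made up for illustration, not taken from the paper).
target = [1.0, 0.0]
history = [[2.0, 0.0], [-1.0, 3.0], [0.0, 1.0]]
weights, pooled = relu_target_attention(target, history)

# With these weights the layer computes exactly the cross term x1 * x2.
cross = quadratic_layer([1.0, 2.0], [[1.0, 0.0]], [0.0], [[0.0, 1.0]], [0.0])
```

In the toy example only the first behavior has a positive score against the target, so the other two receive zero weight and contribute nothing to the pooled vector. The quadratic layer, given these hand-picked weights, outputs the product of the two input features, which a single linear layer cannot express.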

Results and Performance

The Quadratic Interest Network achieved an Area Under the ROC Curve (AUC) of 0.9798 on the competition leaderboard, placing second. Against the Deep Interest Network (DIN), which served as the baseline, QIN improved validation-set AUC by 0.1046.
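
For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen clicked item is scored above a randomly chosen non-clicked item. The sketch below computes it directly from that definition. The function name and the toy labels and scores are hypothetical and are not drawn from the competition data.

```python
def auc(labels, scores):
    """Pairwise AUC: the fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data: 1 = click, 0 = no click (illustrative values only).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = auc(labels, scores)  # 3 of 4 pairs are ranked correctly -> 0.75
```

This quadratic-time version is meant for clarity only; production evaluation would normally use a sort-based implementation from an established library.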

Implications and Future Directions

The implications of this research are multifaceted. Practically, QIN's advancements showcase its potential to significantly elevate the performance of industrial recommender systems by harnessing the richness of multimodal data. Theoretically, the findings underscore the importance of modeling complex interactions in CTR prediction tasks, paving the way for further exploration into adaptive neural architectures capable of handling multimodal inputs efficiently.

For future developments, the study suggests that continued refinement of sparse attention mechanisms and quadratic interaction models could lead to even more powerful predictive tools. Further investigation into optimizing the trade-off between computational efficiency and predictive accuracy in real-time systems remains a critical pathway for evolving CTR prediction technologies.

In conclusion, the Quadratic Interest Network is a noteworthy advance in CTR modeling, offering insight into how multimodal data can be integrated effectively within neural architectures. Frameworks of this kind are likely to shape how recommendation systems on digital platforms are built.
