
AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks

Published 29 Oct 2018 in cs.IR, cs.AI, and cs.LG | arXiv:1810.11921v2

Abstract: Click-through rate (CTR) prediction, which aims to predict the probability of a user clicking on an ad or an item, is critical to many online applications such as online advertising and recommender systems. The problem is very challenging since (1) the input features (e.g., the user id, user age, item id, item category) are usually sparse and high-dimensional, and (2) an effective prediction relies on high-order combinatorial features (a.k.a. cross features), which are very time-consuming to hand-craft by domain experts and are impossible to be enumerated. Therefore, there have been efforts in finding low-dimensional representations of the sparse and high-dimensional raw features and their meaningful combinations. In this paper, we propose an effective and efficient method called AutoInt to automatically learn the high-order feature interactions of input features. Our proposed algorithm is very general, which can be applied to both numerical and categorical input features. Specifically, we map both the numerical and categorical features into the same low-dimensional space. Afterwards, a multi-head self-attentive neural network with residual connections is proposed to explicitly model the feature interactions in the low-dimensional space. With different layers of the multi-head self-attentive neural networks, different orders of feature combinations of input features can be modeled. The whole model can be efficiently fit on large-scale raw data in an end-to-end fashion. Experimental results on four real-world datasets show that our proposed approach not only outperforms existing state-of-the-art approaches for prediction but also offers good explainability. Code is available at: https://github.com/DeepGraphLearning/RecommenderSystems

Citations (750)

Summary

  • The paper presents AutoInt, a novel approach that explicitly learns high-order feature interactions using a self-attention mechanism to improve CTR prediction.
  • It leverages multi-head self-attention and residual connections to efficiently model complex feature combinations from vast, sparse datasets.
  • Empirical evaluations on datasets like Criteo and Avazu show that AutoInt outperforms traditional models with higher AUC scores and lower Logloss.


The paper "AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks" presents a novel method for improving click-through rate (CTR) prediction through the explicit learning of high-order feature interactions. Unlike traditional models, which typically rely on either hand-crafted cross features or implicit learning through deep neural networks, AutoInt leverages the self-attention mechanism to capture meaningful feature combinations in an explicit, end-to-end manner.

Problem Definition

The central challenge in CTR prediction lies in the sparsity and high dimensionality of input features, which encompass user profiles and item attributes. Effective CTR prediction requires the identification of high-order combinatorial features, referred to as cross features, which traditionally require intensive manual feature engineering. The paper proposes AutoInt as a solution to automatically discover and model these interactions efficiently.
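To make the cross-feature problem concrete, the following sketch enumerates combinatorial features from a single sample (field names and values here are hypothetical, chosen to mirror the paper's examples); the count of candidate crosses grows combinatorially with the number of fields and the interaction order, which is why hand-crafting them does not scale:

```python
import itertools

def cross_features(sample, max_order=2):
    """Enumerate cross features up to max_order from a dict of
    categorical field -> value. Illustrates the combinatorial blow-up
    that makes manual feature engineering intractable."""
    fields = sorted(sample)
    crosses = []
    for r in range(2, max_order + 1):
        for combo in itertools.combinations(fields, r):
            crosses.append("&".join(f"{f}={sample[f]}" for f in combo))
    return crosses

# A hypothetical sample with three categorical fields
sample = {"gender": "M", "age": "18-24", "genre": "Comedy"}
pairs = cross_features(sample, max_order=2)
```

With only 3 fields there are 3 second-order crosses, but a dataset with dozens of fields yields thousands of candidate pairs and triples, almost all of them useless, which is the search problem AutoInt sidesteps by learning interactions directly.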

Methodology

AutoInt employs a multi-head self-attentive neural network to explicitly model feature interactions:

  1. Embedding Layer: Both numerical and categorical features are projected into a shared low-dimensional space.
  2. Interacting Layer: This core component uses the multi-head self-attention mechanism to enable each feature to interact with all others. Different attention heads capture distinct types of feature interactions by projecting features into multiple subspaces.
  3. Residual Connections: Residual connections let the model retain lower-order interactions while learning higher-order ones and mitigate the vanishing-gradient problem, allowing multiple interacting layers to be stacked.
  4. Output Layer: The aggregated results from the interacting layers are then used for final CTR prediction through a logistic function.
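The interacting layer at the heart of this pipeline can be sketched in NumPy as follows. This is a minimal single-layer illustration with made-up shapes and randomly initialized weights, not the authors' implementation; it follows the paper's design of unscaled inner-product attention per head, concatenation across heads, and a residual projection of the input followed by ReLU:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interacting_layer(E, W_q, W_k, W_v, W_res, num_heads):
    """One interacting layer over M feature embeddings E of shape (M, d).

    Each head projects the features into a subspace, lets every feature
    attend to all others, and aggregates; head outputs are concatenated,
    and a residual projection of the input is added, followed by ReLU.
    """
    head_outputs = []
    for h in range(num_heads):
        Q, K, V = E @ W_q[h], E @ W_k[h], E @ W_v[h]  # each (M, d_head)
        attn = softmax(Q @ K.T)                        # (M, M) feature-to-feature weights
        head_outputs.append(attn @ V)                  # (M, d_head)
    multi = np.concatenate(head_outputs, axis=-1)      # (M, num_heads * d_head)
    return np.maximum(multi + E @ W_res, 0.0)          # residual connection + ReLU
```

Stacking this layer k times lets each feature's representation mix information from progressively larger feature combinations, which is how different orders of interactions emerge with depth.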

Experimental Setup

The paper evaluates AutoInt on four large-scale real-world datasets: Criteo, Avazu, KDD12, and MovieLens-1M. Metrics used for evaluation are AUC and Logloss, which are standard in CTR prediction tasks. AutoInt is compared against state-of-the-art models including Factorization Machines (FM), Neural FM (NFM), DeepCrossing, and others capable of modeling high-order interactions.
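For reference, the two evaluation metrics can be computed from predicted click probabilities as below (a minimal NumPy sketch; the AUC rank formula shown assumes no tied prediction scores):

```python
import numpy as np

def logloss(y_true, y_pred, eps=1e-15):
    """Binary cross-entropy between labels and predicted probabilities."""
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def auc(y_true, y_pred):
    """Probability that a random positive outranks a random negative
    (Mann-Whitney U formulation; assumes untied scores)."""
    order = np.argsort(y_pred)
    ranks = np.empty(len(y_pred))
    ranks[order] = np.arange(1, len(y_pred) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Higher AUC and lower Logloss are better; in CTR benchmarks even a 0.001 absolute AUC gain is commonly considered significant, which is why the paper reports results to four decimal places.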

Results and Analysis

Experimental results demonstrate that AutoInt outperforms existing methods across multiple datasets. For instance, on the Criteo dataset, AutoInt achieves an AUC of 0.8061 and a Logloss of 0.4455, outperforming all compared models. The analysis shows that explicitly modeling feature interactions with the attention mechanism significantly enhances predictive accuracy.

An ablation study confirms the importance of the residual connections, as removing them leads to a noticeable drop in performance. Additionally, increasing the number of interacting layers improves performance initially, but saturates after three layers, highlighting the model's capacity to capture necessary high-order interactions without excessive depth.

Explainability

AutoInt allows visual interpretations of feature combinations through the attention scores it generates, providing enhanced explainability. For example, in analyzing the MovieLens-1M dataset, the attention mechanism highlighted intuitive and meaningful feature pairs and triples, such as <Gender, Age, Genre> combinations.
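The attention map underlying such visualizations is simply the row-stochastic matrix produced inside each attention head. A minimal sketch of extracting it for inspection (field names and all weights here are hypothetical, standing in for a trained model's parameters):

```python
import numpy as np

def feature_attention(E, W_q, W_k):
    """Row-stochastic attention map for one head: entry (m, k) is the
    weight feature m places on feature k — the quantity AutoInt
    visualizes when explaining learned feature interactions."""
    Q, K = E @ W_q, E @ W_k
    logits = Q @ K.T
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical MovieLens-style fields, in order: Gender, Age, Genre
fields = ["Gender", "Age", "Genre"]
rng = np.random.default_rng(1)
E = rng.normal(size=(3, 8))                 # stand-in for learned embeddings
A = feature_attention(E, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
```

In a trained model, a large entry A[0, 2] would indicate that Gender attends strongly to Genre, flagging <Gender, Genre> as a learned interaction; chains across stacked layers surface higher-order combinations such as <Gender, Age, Genre>.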

Future Work

While AutoInt shows significant improvements in static CTR prediction tasks, future work may explore its integration with contextual information and adaptation for dynamic online recommendation scenarios. Additionally, extending the model to broader machine learning tasks like regression and ranking could expand its applicability.

Conclusion

The AutoInt method presents an effective and scalable solution for CTR prediction by automatically learning high-order feature interactions using self-attentive neural networks. It balances efficacy with efficiency, providing strong numerical results and promising potential for future developments in recommendation systems and beyond. The ability to explain the interactions further enhances its practical utility, addressing both predictive performance and model interpretability.
