Neural Factorization Machines for Sparse Predictive Analytics (1708.05027v1)

Published 16 Aug 2017 in cs.LG

Abstract: Many predictive tasks of web applications need to model categorical variables, such as user IDs and demographics like genders and occupations. To apply standard machine learning techniques, these categorical predictors are always converted to a set of binary features via one-hot encoding, making the resultant feature vector highly sparse. To learn from such sparse data effectively, it is crucial to account for the interactions between features. Factorization Machines (FMs) are a popular solution for efficiently using the second-order feature interactions. However, FM models feature interactions in a linear way, which can be insufficient for capturing the non-linear and complex inherent structure of real-world data. While deep neural networks have recently been applied to learn non-linear feature interactions in industry, such as the Wide&Deep by Google and DeepCross by Microsoft, the deep structure meanwhile makes them difficult to train. In this paper, we propose a novel model Neural Factorization Machine (NFM) for prediction under sparse settings. NFM seamlessly combines the linearity of FM in modelling second-order feature interactions and the non-linearity of neural network in modelling higher-order feature interactions. Conceptually, NFM is more expressive than FM since FM can be seen as a special case of NFM without hidden layers. Empirical results on two regression tasks show that with one hidden layer only, NFM significantly outperforms FM with a 7.3% relative improvement. Compared to the recent deep learning methods Wide&Deep and DeepCross, our NFM uses a shallower structure but offers better performance, being much easier to train and tune in practice.

Citations (1,244)

Summary

  • The paper presents a novel Neural Factorization Machine that leverages a Bi-Interaction pooling operation to effectively capture second-order feature interactions.
  • It bridges linear factorization machines and deep neural networks to model complex higher-order feature interactions with a simplified training process.
  • Empirical results on real-world datasets like MovieLens demonstrate that NFM outperforms state-of-the-art methods while mitigating overfitting via dropout and batch normalization.

Neural Factorization Machines for Sparse Predictive Analytics

This paper, authored by Xiangnan He and Tat-Seng Chua, introduces Neural Factorization Machines (NFM), a novel model that integrates the strengths of Factorization Machines (FMs) with the representational power of deep neural networks. The importance of effectively modeling interactions between highly sparse categorical features in predictive analytics tasks is well-recognized. Common techniques such as one-hot encoding exacerbate data sparsity, challenging the efficacy of conventional machine learning algorithms. The paper posits that existing methods either lack the ability to capture complex non-linear interactions or are prohibitively difficult to train.

Key Contributions

1. Introduction of NFM

The NFM model introduces a new pooling operation termed Bi-Interaction (bilinear interaction) pooling, which captures all pairwise second-order feature interactions while remaining efficient to compute. Traditional FMs model these pairwise interactions only linearly, and therefore fall short in capturing the non-linear dependencies that are often prevalent in real-world data. By feeding the Bi-Interaction output into non-linear hidden layers, NFM overcomes this limitation and enhances the expressive capability of the model, as sketched below.
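
To make the operation concrete, Bi-Interaction pooling compresses a set of (value-weighted) embedding vectors into a single vector of their pairwise element-wise products, and can be computed in linear time via the same sum-of-squares identity used in FMs. The following is a minimal NumPy sketch; function and variable names are my own, not taken from the authors' implementation.

```python
import numpy as np

def bi_interaction_pooling(embeddings):
    """Bi-Interaction pooling: sum of element-wise products over all pairs
    of input embedding vectors, computed in O(n * k) time via the identity
        sum_{i<j} v_i * v_j = 0.5 * ((sum_i v_i)^2 - sum_i v_i^2).

    embeddings: (n, k) array holding x_i * v_i for the n active features.
    returns:    (k,) pooled vector fed to the hidden layers of NFM.
    """
    sum_of_vectors = embeddings.sum(axis=0)          # sum_i v_i, shape (k,)
    square_of_sum = sum_of_vectors ** 2               # (sum_i v_i)^2
    sum_of_squares = (embeddings ** 2).sum(axis=0)    # sum_i v_i^2
    return 0.5 * (square_of_sum - sum_of_squares)

# Example: 3 active features with 4-dimensional embeddings
v = np.random.randn(3, 4)
pooled = bi_interaction_pooling(v)
print(pooled.shape)  # (4,)
```

The result is a single k-dimensional vector summarizing all pairwise interactions, which is what the subsequent hidden layers of NFM operate on.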

2. Bridging Linear FMs and Deep Neural Networks

NFMs serve as a fusion of traditional FMs and neural networks. Conceptually, NFM generalizes FM by stacking non-linear transformations above the Bi-Interaction layer, making it possible to model complex higher-order interactions. Because the Bi-Interaction layer already encodes informative second-order interactions, only a shallow stack of hidden layers is needed, which simplifies training without sacrificing performance; a schematic forward pass is sketched below.
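
Under this view, the scoring function is a global bias plus a linear term plus an MLP applied to the Bi-Interaction vector. The NumPy sketch below is illustrative only: the names are mine, and the dropout and batch normalization used in the paper are omitted for brevity.

```python
import numpy as np

def nfm_predict(x, w0, w, V, hidden_layers, h):
    """Illustrative NFM scoring function:
        y(x) = w0 + sum_i w_i x_i + h^T MLP(BiInteraction({x_i v_i})).

    x: (d,) input vector (mostly zeros); w0: global bias; w: (d,) linear weights;
    V: (d, k) embedding matrix; hidden_layers: list of (W, b) pairs for the MLP;
    h: prediction vector applied to the last hidden layer.
    """
    nz = np.nonzero(x)[0]                                  # active (non-zero) features
    emb = x[nz, None] * V[nz]                              # x_i * v_i for each active i
    z = 0.5 * (emb.sum(axis=0) ** 2 - (emb ** 2).sum(axis=0))  # Bi-Interaction pooling
    for W, b in hidden_layers:
        z = np.maximum(0.0, W @ z + b)                     # ReLU hidden layers
    return w0 + w @ x + h @ z                              # bias + linear + deep part

# Toy usage: 10 features, 4-dim embeddings, one hidden layer of width 8
rng = np.random.default_rng(0)
d, k = 10, 4
x = np.zeros(d); x[[1, 3, 7]] = 1.0                        # one-hot style sparse input
V = rng.standard_normal((d, k))
layers = [(rng.standard_normal((8, k)), np.zeros(8))]
print(nfm_predict(x, 0.1, rng.standard_normal(d), V, layers, rng.standard_normal(8)))
```

With zero hidden layers and the prediction vector h fixed to all ones, this expression reduces exactly to the second-order FM, which is the sense in which NFM strictly generalizes FM.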

3. Empirical Validation

The paper rigorously validates the model on two real-world datasets: a context-aware app discovery dataset (Frappe) and the widely used MovieLens dataset. On both datasets, NFM consistently outperforms not only its FM counterpart but also state-of-the-art methods such as Wide&Deep and DeepCross. Notably, NFM achieves these gains with just a single hidden layer, underscoring its efficiency and expressive capacity.

Experimental Observations

Dropout and Regularization

Comprehensive experiments reveal substantial performance enhancements through the application of dropout, particularly in the Bi-Interaction layer. Compared to traditional L2 regularization, dropout significantly mitigates overfitting by preventing neurons from co-adapting excessively. This highlights dropout as a potent regularization technique in the context of NFMs and latent-factor models more broadly.
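
As a rough illustration of where this regularization enters, dropout can be applied directly to the pooled Bi-Interaction vector before the hidden layers. The snippet below uses standard inverted dropout and is a sketch under my own naming, not the authors' code.

```python
import numpy as np

def dropout(z, rate, rng, training=True):
    """Inverted dropout on the pooled Bi-Interaction vector z.
    During training, each component is zeroed with probability `rate`
    and survivors are rescaled so the expected activation is unchanged.
    """
    if not training or rate == 0.0:
        return z
    keep = rng.random(z.shape) >= rate
    return z * keep / (1.0 - rate)

rng = np.random.default_rng(0)
z = rng.standard_normal(8)            # stand-in for a Bi-Interaction output
print(dropout(z, rate=0.3, rng=rng))
```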

Batch Normalization

Incorporation of batch normalization (BN) dramatically accelerates convergence during training. By normalizing layer inputs, BN addresses the internal covariate shift, resulting in more stable and rapid training phases. Experiments demonstrate that NFMs equipped with both dropout and BN achieve high performance with robust generalization capabilities.
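
For completeness, the training-time batch-normalization step applied to a mini-batch of Bi-Interaction outputs can be sketched as follows (learnable scale gamma and shift beta; the running statistics used at inference time are omitted). This is a generic illustration, not the paper's implementation.

```python
import numpy as np

def batch_norm(Z, gamma, beta, eps=1e-5):
    """Normalize a mini-batch Z of shape (batch, k) to zero mean and unit
    variance per dimension, then apply the learnable scale and shift."""
    mean = Z.mean(axis=0)
    var = Z.var(axis=0)
    Z_hat = (Z - mean) / np.sqrt(var + eps)
    return gamma * Z_hat + beta

rng = np.random.default_rng(0)
Z = rng.standard_normal((32, 4))      # mini-batch of 32 Bi-Interaction vectors
out = batch_norm(Z, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))
```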

Evaluation Against Higher-Order FMs and State-of-the-Art Methods

The empirical results provide compelling evidence that NFMs, even with minimal structural depth, surpass the performance benchmarks set by higher-order FMs and sophisticated deep learning paradigms such as Wide&Deep and DeepCross. The primary factor attributed to this superior performance is the Bi-Interaction layer, which provides a dense and informative foundation for subsequent neural processing.

Practical Implications and Future Directions

The implications of this research extend significantly into practical applications of predictive analytics where feature interactions are complex and data sparsity is a challenge. NFMs exhibit both theoretical robustness and practical efficacy, making them a compelling choice for a wide range of tasks, from recommendation systems to targeted advertising.

Looking forward, there are numerous avenues for extending this work. Enhancing the efficiency of NFMs through advanced techniques like hashing can make them even more viable for large-scale applications. Additionally, future research can explore the integration of NFMs with Recurrent Neural Networks (RNNs) for handling sequential data, potentially broadening their applicability to domains such as natural language processing and time-series prediction.

In conclusion, this paper provides a thorough exploration of the limitations of traditional FMs and a concrete, validated approach to overcoming them through the innovative framework of NFMs. It sets a new precedent in the domain of sparse predictive analytics, combining theoretical advancements with strong empirical support, making a substantial contribution to the field.