
Higher-Order Factorization Machines (1607.07195v2)

Published 25 Jul 2016 in stat.ML and cs.LG

Abstract: Factorization machines (FMs) are a supervised learning approach that can use second-order feature combinations even when the data is very high-dimensional. Unfortunately, despite increasing interest in FMs, there exists to date no efficient training algorithm for higher-order FMs (HOFMs). In this paper, we present the first generic yet efficient algorithms for training arbitrary-order HOFMs. We also present new variants of HOFMs with shared parameters, which greatly reduce model size and prediction times while maintaining similar accuracy. We demonstrate the proposed approaches on four different link prediction tasks.

Citations (191)

Summary

  • The paper introduces efficient linear-time dynamic programming algorithms to train arbitrary-order Factorization Machines, overcoming prior scalability limitations.
  • Novel shared-parameter models, like the inhomogeneous ANOVA and all-subsets kernels, are proposed to reduce model complexity and prediction time without sacrificing accuracy.
  • Empirical results demonstrate the superior performance of Higher-Order Factorization Machines and their shared-parameter variants in various link prediction tasks, highlighting their potential in high-dimensional applications.

Higher-Order Factorization Machines: Efficient Algorithms for Training and Applications

The paper "Higher-Order Factorization Machines" by Blondel et al. addresses a significant gap in the field of supervised learning with Factorization Machines (FMs). While FMs provide an efficient mechanism for second-order feature combination modeling, the extension to higher-order FMs (HOFMs) lacked efficient training algorithms, hindering their application. This work remedies this by introducing algorithms for arbitrary-order HOFMs and further exploring the utility of shared-parameter models to maintain accuracy while reducing model complexity.

Efficient Training Algorithms for Higher-Order FMs

Factorization Machines leverage a low-rank factorization to model feature combinations, approaching the accuracy of polynomial regression while remaining computationally efficient. Extending FMs to higher orders, however, was previously hampered by the combinatorial growth in the number of feature combinations and the high cost of naive prediction. This paper introduces linear-time dynamic programming algorithms for evaluating the ANOVA kernel and computing its gradient. These algorithms allow arbitrary-order HOFMs to scale: evaluating a degree-m ANOVA kernel over d features takes O(dm) time, linear in both the number of features and the order, a significant improvement over previous methods.
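To make the O(dm) claim concrete, here is a minimal NumPy sketch of the dynamic program for the degree-m ANOVA kernel that underlies HOFM predictions. Variable names and the helper `hofm_predict` are illustrative choices of ours, not the authors' implementation; the paper's actual algorithms also compute gradients and exploit sparsity.

```python
import numpy as np

def anova_kernel(p, x, m):
    """Degree-m ANOVA kernel A^m(p, x) = sum_{j1 < ... < jm} prod_t p_{jt} x_{jt},
    evaluated with an O(dm) dynamic program: a[t, j] holds the degree-t kernel
    restricted to the first j features."""
    d = len(x)
    a = np.zeros((m + 1, d + 1))
    a[0, :] = 1.0  # degree-0 kernel is 1 by convention
    for t in range(1, m + 1):
        for j in range(t, d + 1):
            # feature j is either left out (first term) or completes a
            # degree-(t-1) combination drawn from the first j-1 features
            a[t, j] = a[t, j - 1] + p[j - 1] * x[j - 1] * a[t - 1, j - 1]
    return a[m, d]

def hofm_predict(x, w, P_list):
    """Illustrative order-m HOFM prediction: a linear term plus ANOVA kernels of
    degrees 2..m, each degree t having its own factor matrix P_list[t-2]
    of shape (rank, d)."""
    y = w @ x
    for t, P in enumerate(P_list, start=2):
        y += sum(anova_kernel(P[s], x, t) for s in range(P.shape[0]))
    return y
```

Each kernel evaluation fills an (m+1) x (d+1) table with one multiply-add per cell, which is where the linear O(dm) cost comes from.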

Shared Parameters for Reduced Complexity

A noteworthy contribution is the introduction of HOFM variants that share parameters across different polynomial degrees. This is achieved through the inhomogeneous ANOVA kernel and the all-subsets kernel, which reduce model sizes and prediction times without compromising accuracy. The inhomogeneous ANOVA kernel reuses a single parameter vector across degrees and combines the degrees with uniform or learned weights. The all-subsets kernel implicitly uses feature combinations of every order, and its cost on sparse data depends only on the number of non-zero features, so computation is spent where the data carries signal. These shared-parameter models demonstrate competitive performance in link prediction tasks across varied datasets, including gene-disease associations and movie recommendation systems.
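The sketch below, building on `anova_kernel` above, illustrates the two shared-parameter kernels; the explicit per-degree weights `beta` are our reading of the inhomogeneous kernel, not the paper's exact parameterization.

```python
import numpy as np

def all_subsets_kernel(p, x):
    """All-subsets kernel K_S(p, x) = prod_j (1 + p_j x_j): one shared parameter
    vector implicitly weights feature combinations of every order, and features
    with x_j = 0 contribute a factor of 1, so sparse inputs are cheap."""
    return np.prod(1.0 + p * x)

def inhomogeneous_anova(p, x, m, beta):
    """Weighted sum of ANOVA kernels of degrees 1..m that reuses a single
    parameter vector p across all degrees; beta holds the per-degree weights
    (uniform or learned). Recomputing per degree is for clarity only."""
    return sum(beta[t - 1] * anova_kernel(p, x, t) for t in range(1, m + 1))
```

Note that a single run of the dynamic program already yields the kernels of every degree up to m (the last column of its table), which is part of why the shared-parameter variants keep prediction cheap; the sketch recomputes them separately only for readability.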

Empirical Performance and Applications

The empirical analysis provides compelling evidence for the effectiveness of the proposed algorithms. HOFMs, both in their standard form and with shared parameters, outperform traditional low-rank bilinear regression models in link prediction tasks. Testing across datasets such as the NIPS co-authorship graph, enzyme networks, gene-disease associations, and movie recommendations attests to the robustness and adaptability of higher-order models. In particular, the shared-parameter variants achieved similar accuracy with reduced computational overhead.

Implications and Future Directions

The results open avenues for efficient polynomial modeling in high-dimensional feature spaces within machine learning. The efficiency gained with these algorithms suggests integration could be beneficial in large-scale applications, including those in deep learning frameworks where they can complement existing architectures such as convolutional layers. Moreover, these HOFMs are well-suited for distributed environments, potentially transforming how large datasets in recommendation systems and computational biology are approached.

The paper hints at future work extending stochastic gradient-based training algorithms to maximize AUC directly, catering to datasets with skewed positive-negative distributions. Such advancements are crucial to refining the applicability of higher-order models in practical settings. Additionally, the prospect of incorporating these algorithms into contemporary machine learning platforms highlights their relevance and potential impact.

This research into higher-order factorization machines marks significant progress in efficient polynomial feature learning, suggesting promising avenues for both academic pursuits and practical implementations in artificial intelligence.