
Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) (1405.5869v1)

Published 22 May 2014 in stat.ML, cs.DS, cs.IR, and cs.LG

Abstract: We present the first provably sublinear time algorithm for approximate \emph{Maximum Inner Product Search} (MIPS). Our proposal is also the first hashing algorithm for searching with (un-normalized) inner product as the underlying similarity measure. Finding hashing schemes for MIPS was considered hard. We formally show that the existing Locality Sensitive Hashing (LSH) framework is insufficient for solving MIPS, and then we extend the existing LSH framework to allow asymmetric hashing schemes. Our proposal is based on an interesting mathematical phenomenon in which inner products, after independent asymmetric transformations, can be converted into the problem of approximate near neighbor search. This key observation makes efficient sublinear hashing scheme for MIPS possible. In the extended asymmetric LSH (ALSH) framework, we provide an explicit construction of provably fast hashing scheme for MIPS. The proposed construction and the extended LSH framework could be of independent theoretical interest. Our proposed algorithm is simple and easy to implement. We evaluate the method, for retrieving inner products, in the collaborative filtering task of item recommendations on Netflix and Movielens datasets.

Citations (460)

Summary

  • The paper’s main contribution is the novel ALSH framework that applies asymmetric transformations to efficiently handle maximum inner product search.
  • The methodology transforms inner product problems into approximate near neighbor search tasks, yielding provably sublinear query times through rigorous theoretical analysis.
  • Experimental results on Netflix and Movielens datasets demonstrate that ALSH significantly outperforms traditional LSH methods in scalability and effectiveness.

Asymmetric Locality Sensitive Hashing for Maximum Inner Product Search

The paper by Anshumali Shrivastava and Ping Li introduces an algorithmic framework for solving Maximum Inner Product Search (MIPS) in sublinear time. The authors propose Asymmetric Locality Sensitive Hashing (ALSH), an extension of the classical Locality Sensitive Hashing (LSH) paradigm. Whereas traditional LSH applies to distance-based similarity measures, ALSH is tailored to un-normalized inner products. The new approach is necessary because the inner product is not monotonic with respect to Euclidean distance when data points have varying norms, so classical LSH cannot handle it effectively.
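This non-monotonicity is easy to see in a small, hypothetical two-dimensional example (illustrative, not from the paper): the Euclidean near neighbor of a query need not be the point with the largest inner product, because a farther point with a larger norm can dominate.

```python
import numpy as np

q = np.array([1.0, 0.0])

# x1 is the Euclidean near neighbor of q, but x2, despite being
# farther away, has a larger norm and hence a larger inner product.
x1 = np.array([1.0, 0.1])
x2 = np.array([3.0, -1.0])

d1 = np.linalg.norm(q - x1)   # 0.1
d2 = np.linalg.norm(q - x2)   # ~2.24
ip1 = q @ x1                  # 1.0
ip2 = q @ x2                  # 3.0

assert d1 < d2 and ip2 > ip1  # nearest point != MIPS winner
```

Any symmetric hash family whose collision probability decays with Euclidean distance would rank `x1` above `x2` here, which is exactly the failure mode the impossibility result formalizes.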

Core Contributions

The paper presents several key contributions that are central to its claims of efficiency in solving MIPS:

  1. Impossibility under Current LSH Framework: The authors begin by formalizing why the existing LSH framework is inadequate for solving MIPS. They establish through mathematical proof that there cannot exist any LSH family under the traditional definition that can handle un-normalized inner product similarity effectively.
  2. Introduction of ALSH: The core innovation of the paper lies in the development of ALSH. This framework relaxes the symmetry requirement of traditional LSH by allowing distinct vector transformations for queries and preprocessed data points. Specifically, the ALSH framework employs asymmetric transformations Q and P for queries and database points, respectively. This asymmetry is crucial for adapting the hashing to the properties of inner products, circumventing the pitfalls of existing methods.
  3. Transformations for Inner Product Search: The authors specify a transformation strategy that converts MIPS into an approximate near neighbor search problem. Because the data and query transformations differ, the scheme achieves the asymmetry the framework requires while preserving the inner-product ordering: after transformation, smaller Euclidean distances correspond to larger inner products.
  4. Provably Sublinear Query Time: Through theoretical constructs, the paper demonstrates that with the choice of appropriate transformation parameters, ALSH guarantees provably sublinear query times for approximate MIPS. This is a substantial contribution, as it suggests practical implementation potential in large-scale applications where linear scans are computationally prohibitive.
  5. Experimental Validation: The proposed ALSH model is experimentally validated using collaborative filtering datasets, specifically Netflix and Movielens, where it demonstrates superior performance over existing hashing methods, such as L2LSH. These experiments provide tangible evidence of ALSH's efficacy in dealing with real-world data characterized by variable norms.

Implications and Future Directions

The implications of this research are broad and impactful across several domains where MIPS is a common subroutine. ALSH transforms the landscape of approximate similarity search by addressing the limitations of traditional LSH methods in high-dimensional spaces. This is particularly relevant in applications such as recommender systems, large-scale object detection, structural SVMs, and multi-class label prediction.

Moreover, the concept of asymmetric transformations opens up new avenues for computational optimizations in similarity search algorithms. The potential for extending ALSH to three-way or higher-order similarity searches and exploring ALSH in the context of other similarity functions provides a fertile ground for further research.

Conclusion

The development of ALSH represents a significant advancement in approximate similarity search. By handling the complexities introduced by inner product similarities, the framework promises improved efficiency and scalability. The paper's rigorous theoretical foundation, combined with its empirical validation, establishes ALSH as a robust tool for MIPS, paving the way for future innovations and applications in AI and machine learning.