Papers
Topics
Authors
Recent
Search
2000 character limit reached

Sign-Full Random Projections

Published 26 Apr 2018 in stat.ME, cs.IR, and cs.LG | (1805.00533v1)

Abstract: The method of 1-bit ("sign-sign") random projections has been a popular tool for efficient search and machine learning on large datasets. Given two $D$-dim data vectors $u$, $v\in\mathbb{R}D$, one can generate $x = \sum_{i=1}D u_i r_i$, and $y = \sum_{i=1}D v_i r_i$, where $r_i\sim N(0,1)$ iid. The "collision probability" is ${Pr}\left(sgn(x)=sgn(y)\right) = 1-\frac{\cos{-1}\rho}{\pi}$, where $\rho = \rho(u,v)$ is the cosine similarity. We develop "sign-full" random projections by estimating $\rho$ from (e.g.,) the expectation $E(sgn(x)y)=\sqrt{\frac{2}{\pi}} \rho$, which can be further substantially improved by normalizing $y$. For nonnegative data, we recommend an interesting estimator based on $E\left(y_- 1_{x\geq 0} + y_+ 1_{x<0}\right)$ and its normalized version. The recommended estimator almost matches the accuracy of the (computationally expensive) maximum likelihood estimator. At high similarity ($\rho\rightarrow1$), the asymptotic variance of recommended estimator is only $\frac{4}{3\pi} \approx 0.4$ of the estimator for sign-sign projections. At small $k$ and high similarity, the improvement would be even much more substantial.

Citations (18)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.