Frequent Directions: Simple and Deterministic Matrix Sketching (1501.01711v2)

Published 8 Jan 2015 in cs.DS

Abstract: We describe a new algorithm called Frequent Directions for deterministic matrix sketching in the row-updates model. The algorithm is presented an arbitrary input matrix $A \in \mathbb{R}^{n \times d}$ one row at a time. It performs $O(d\ell)$ operations per row and maintains a sketch matrix $B \in \mathbb{R}^{\ell \times d}$ such that for any $k < \ell$, $\|A^TA - B^TB\|_2 \leq \|A - A_k\|_F^2 / (\ell-k)$ and $\|A - \pi_{B_k}(A)\|_F^2 \leq \big(1 + \frac{k}{\ell-k}\big) \|A-A_k\|_F^2$. Here, $A_k$ stands for the minimizer of $\|A - A_k\|_F$ over all rank $k$ matrices (similarly $B_k$) and $\pi_{B_k}(A)$ is the rank $k$ matrix resulting from projecting $A$ on the row span of $B_k$. We show both of these bounds are the best possible for the space allowed. The summary is mergeable, and hence trivially parallelizable. Moreover, Frequent Directions outperforms exemplar implementations of existing streaming algorithms in the space-error tradeoff.

Citations (162)

Summary

The paper introduces Frequent Directions, a simple and deterministic algorithm for matrix sketching in the row-update data stream model, offering strong theoretical guarantees.
Frequent Directions provides analytical bounds on covariance error ($\|A^TA - B^TB\|_2$) and projection error ($\|A - \pi_{B_k}(A)\|_F^2$), operating at $O(d\ell)$ time per row.
Empirical results show Frequent Directions often outperforms randomized alternatives, and its $O(\ell d)$ space usage, which is optimal for the error guarantees, suits low-memory and parallel data processing applications.

Overview of Frequent Directions: Simple and Deterministic Matrix Sketching

This paper introduces Frequent Directions (FD), a deterministic algorithm for matrix sketching in the row-update model of data streams. The algorithm is built by analogy with the item frequency estimation problem on data streams, and it carries strong theoretical guarantees on both covariance error and projection error. FD targets the trade-off between the memory the sketch requires and the accuracy it delivers, and the paper reports that it outperforms existing streaming algorithms on that trade-off.

Technical Summary

Frequent Directions processes an input matrix $A \in \mathbb{R}^{n \times d}$ one row at a time, maintaining a sketch matrix $B \in \mathbb{R}^{\ell \times d}$. It costs $O(d\ell)$ operations per row. For any $k < \ell$, it achieves two main analytical guarantees:

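Covariance Error: The covariance matrices of $A$ and $B$ satisfy $\|A^TA - B^TB\|_2 \leq \|A - A_k\|_F^2 / (\ell-k)$. Because the bound is in spectral norm, $\|Ax\|^2$ and $\|Bx\|^2$ differ by at most this amount for every unit vector $x$, so $B$ preserves the dominant directions of a large data set.
Projection Error: It ensures that $\|A - \pi_{B_k}(A)\|_F^2 \leq \big(1 + k/(\ell-k)\big) \|A - A_k\|_F^2$, where $\pi_{B_k}(A)$ is the projection of $A$ onto the row span of $B_k$. Projecting onto the top $k$ directions of the sketch is therefore within a factor $1 + k/(\ell-k)$ of the best rank-$k$ approximation, a relative-error guarantee.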
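The row-by-row update behind these guarantees can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' reference code: the names `shrink` and `frequent_directions` are hypothetical, and the sketch assumes a working buffer of $2\ell$ rows that is shrunk only when it fills up, which is one common way to reach an amortized $O(d\ell)$ cost per row.

```python
import numpy as np

def shrink(B, ell):
    """Rotate B onto its singular directions and subtract sigma_ell^2
    from every squared singular value, clamping at zero."""
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    # delta is the ell-th largest squared singular value (0 if there are fewer).
    delta = s[ell - 1] ** 2 if len(s) >= ell else 0.0
    s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
    out = np.zeros_like(B)
    out[: len(s)] = s_shrunk[:, None] * Vt  # rows ell-1 and beyond are now zero
    return out

def frequent_directions(A, ell):
    """Return an ell x d sketch of A, reading A one row at a time.
    Assumption: a 2*ell-row buffer, shrunk whenever it is full."""
    _, d = A.shape
    B = np.zeros((2 * ell, d))
    next_free = 0  # index of the next all-zero row in the buffer
    for row in A:
        if next_free == 2 * ell:
            B = shrink(B, ell)
            next_free = ell - 1
        B[next_free] = row
        next_free += 1
    # A final shrink leaves at most ell - 1 nonzero rows, so truncation is lossless.
    return shrink(B, ell)[:ell]

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
B = frequent_directions(A, ell=8)
cov_err = np.linalg.norm(A.T @ A - B.T @ B, 2)  # spectral-norm covariance error
```

Here `cov_err` can be compared directly with $\|A - A_k\|_F^2 / (\ell - k)$, computed from the singular values of `A`, for any $k < \ell$.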

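FD is inspired by the Misra-Gries algorithm for approximating item frequencies. Misra-Gries keeps a fixed number of counters and, when none is free, decrements all of them and drops those that reach zero. FD does the same with directions instead of items: when the sketch is full, it rotates the sketch onto its orthogonal singular directions and shrinks them all by subtracting the same amount from every squared singular value, so that the weakest directions drop to zero and free up rows.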
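For comparison, the item-frequency scheme that FD generalizes can be sketched as follows. This is the textbook Misra-Gries procedure written from its standard description, not code from the paper; the function name `misra_gries` and the parameter `num_counters` are illustrative.

```python
from collections import Counter

def misra_gries(stream, num_counters):
    """Approximate item counts using at most num_counters counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < num_counters:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the empty ones.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = list("aaaabbbccd")
estimates = misra_gries(stream, num_counters=2)
true_counts = Counter(stream)
```

Each estimate never exceeds the true count and undercounts it by at most `len(stream) / (num_counters + 1)`. The "decrement everything" step plays the role that shrinking the squared singular values plays in FD.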

Numerical Results and Empirical Performance

Empirical results in the paper indicate that FD regularly outperforms popular alternatives such as random-projection and hashing methods on both covariance and projection error. The gap is most notable in scenarios where the competing techniques tend to over-correct due to under-sampling.

Space Complexity and Optimality

FD is space-optimal. Setting $\ell = k + k/\epsilon$ turns the projection bound into a $(1+\epsilon)$ relative error, and the sketch then occupies $O(\ell d) = O(kd/\epsilon)$ words. This matches the lower bound: any algorithm achieving similar error bounds must use $\Omega(kd/\epsilon)$ space. The result pins down the minimal sketch size needed to guarantee this precision in a streaming model.
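As a worked instance of the sizing arithmetic, take illustrative values chosen here rather than taken from the paper ($k = 10$, $\epsilon = 0.1$, $d = 1000$), and assume the sketch size is set to $\ell = k + k/\epsilon$:

$$\ell = k + \frac{k}{\epsilon} = 10 + \frac{10}{0.1} = 110, \qquad 1 + \frac{k}{\ell - k} = 1 + \frac{10}{100} = 1.1 = 1 + \epsilon, \qquad \ell d = 110 \times 1000 = 110{,}000 \text{ words}.$$

The sketch size does not depend on the number of rows $n$.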

Practical and Theoretical Implications

In practice, the algorithm suits applications that need low memory consumption and accurate approximations, especially in distributed settings where data is spread across multiple machines. Because FD sketches are mergeable, each machine can sketch its own portion of the rows and the sketches can then be combined while retaining the error guarantees, which makes FD suitable for large-scale distributed data mining tasks.
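One way to picture mergeability is the sketch below, which stacks two $\ell \times d$ sketches and applies a single shrink step to return to $\ell$ rows. The helper name `merge_sketches` is hypothetical, and the single-shrink merge is an assumption made for illustration rather than the paper's exact procedure.

```python
import numpy as np

def merge_sketches(B1, B2):
    """Combine two ell x d sketches into one ell x d sketch (illustrative)."""
    ell = B1.shape[0]
    stacked = np.vstack([B1, B2])  # 2*ell x d
    _, s, Vt = np.linalg.svd(stacked, full_matrices=False)
    # Subtract the ell-th largest squared singular value from all of them.
    delta = s[ell - 1] ** 2 if len(s) >= ell else 0.0
    s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
    merged = np.zeros_like(B1)
    r = min(ell, len(s))
    # Directions from index ell-1 onward have been shrunk to zero,
    # so keeping the first ell rows discards nothing.
    merged[:r] = s_shrunk[:r, None] * Vt[:r]
    return merged

rng = np.random.default_rng(1)
B1 = rng.standard_normal((5, 12))  # stand-ins for sketches built on two machines
B2 = rng.standard_normal((5, 12))
C = merge_sketches(B1, B2)
```

The merged sketch `C` has the same shape as each input, and its covariance differs from that of the stacked inputs by at most the subtracted amount `delta` in spectral norm.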

Future Developments

FD is deterministic and optimal in both space and error, but it would be interesting to explore modified variants, for example ones that incorporate randomization to exploit particular structure in the spectrum of the data. Evaluating its applications and performance on more diverse data sets would also help refine its practical utility.

In summary, FD is a valuable tool for deterministic matrix sketching, combining simplicity with robust error guarantees in the row-update streaming model. Its theoretical bounds and empirical validation position it as a strong choice when high accuracy and low space usage are both required. Its performance also motivates further exploration of deterministic algorithms in similar data-intensive settings.