Covariance estimation for vertically partitioned data in a distributed environment

Published 23 Jun 2016 in cs.DC | (1606.07336v1)

Abstract: Sources of data keep expanding as collection methods spread across applications such as medicine, insurance, science, bioinformatics, and business. These data sets may be geographically distributed, large, and high-dimensional. Finding hidden patterns in them conventionally requires downloading the data to a centralized site, which is challenging given the limited bandwidth available and is also computationally expensive. The covariance matrix is one way to estimate the relationship between any two dimensions. In this paper, we propose a communication-efficient algorithm to estimate the covariance matrix in a distributed manner. The global covariance matrix is computed by merging the local covariance matrices using a distributed approach. The results show that it is exactly the same as the matrix produced by the centralized method, with good speed-up in computation. The speed-up comes from constructing the local covariances in parallel and distributing the cross-covariances among the nodes so that the load is balanced. The results are analyzed on the Mfeat data set under various partitions, which also addresses scalability.
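The merging idea in the abstract can be illustrated with a minimal single-process sketch. It assumes each node holds a disjoint set of columns for the same rows, so the global covariance matrix is a block matrix: diagonal blocks are the local covariances and off-diagonal blocks are the cross-covariances between nodes. The function names (`cross_covariance`, `merge_covariance`), the toy data, and the two-node split are hypothetical; the abstract does not specify how the paper assigns cross-covariance blocks to nodes or what is communicated, so no networking or load balancing is modelled here.

```python
from itertools import combinations


def column_means(block):
    """Mean of each column in a row-major block (a list of rows)."""
    n = len(block)
    return [sum(row[j] for row in block) / n for j in range(len(block[0]))]


def cross_covariance(a, b):
    """Sample cross-covariance between the columns of a and the columns of b.

    Both blocks must describe the same rows (observations) in the same order.
    """
    n = len(a)
    mean_a, mean_b = column_means(a), column_means(b)
    return [
        [
            sum((a[r][i] - mean_a[i]) * (b[r][j] - mean_b[j]) for r in range(n))
            / (n - 1)
            for j in range(len(b[0]))
        ]
        for i in range(len(a[0]))
    ]


def merge_covariance(partitions):
    """Assemble the global covariance matrix from per-node column blocks.

    partitions: one row-major block per (hypothetical) node, all with the
    same number of rows and disjoint columns.
    """
    widths = [len(p[0]) for p in partitions]
    offsets = [sum(widths[:k]) for k in range(len(widths))]
    d = sum(widths)
    cov = [[0.0] * d for _ in range(d)]

    def place(block, row0, col0):
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                cov[row0 + i][col0 + j] = value

    # Diagonal blocks: each node needs only its own columns, so these
    # could be computed in parallel.
    for k, part in enumerate(partitions):
        place(cross_covariance(part, part), offsets[k], offsets[k])

    # Off-diagonal blocks: one cross-covariance per pair of nodes,
    # mirrored by transposition because the matrix is symmetric.
    for k, m in combinations(range(len(partitions)), 2):
        block = cross_covariance(partitions[k], partitions[m])
        place(block, offsets[k], offsets[m])
        place([list(col) for col in zip(*block)], offsets[m], offsets[k])

    return cov


# Toy data: 4 observations, 4 features, split column-wise across two nodes.
rows = [
    [2.0, 1.0, 0.5, 3.0],
    [4.0, 3.0, 1.5, 1.0],
    [6.0, 2.0, 2.5, 4.0],
    [8.0, 5.0, 3.5, 2.0],
]
node_a = [r[:2] for r in rows]
node_b = [r[2:] for r in rows]

global_cov = merge_covariance([node_a, node_b])
central_cov = cross_covariance(rows, rows)  # centralized reference
```

Comparing `global_cov` with `central_cov` entry by entry is one way to check the claim that the merged result matches the centralized computation. In a real deployment, each off-diagonal block would require one node to obtain another node's columns (or a summary of them), which is where communication cost arises.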
