Big Data Intelligence Using Distributed Deep Neural Networks

Published 4 Sep 2019 in cs.DC | (1909.02873v1)

Abstract: Large amounts of data are often required to train and deploy useful machine learning models in industry. Smaller enterprises do not have the luxury of accessing enough data for machine learning. For privacy-sensitive fields such as banking, insurance and healthcare, aggregating data in a data warehouse poses challenges of data security and limited computational resources. These challenges are critical when developing machine learning algorithms in industry. Several attempts have been made to address them by using distributed learning techniques, such as federated learning, over disparate data stores in order to circumvent the need for centralised data aggregation. This paper proposes an improved algorithm to securely train deep neural networks over several data sources in a distributed way, eliminating the need to centrally aggregate or share the data and thus preserving privacy. The proposed method allows deep neural networks to be trained using data from multiple de-linked nodes in a distributed environment and secures the representation shared during training. Only a representation of the trained models (network architecture and weights) is shared. The algorithm was evaluated on existing healthcare patient data, and its performance was compared to that of a regular deep neural network trained on a single centralised architecture. This algorithm will pave the way for distributed training of neural networks in privacy-sensitive applications where raw data may not be shared directly or where centrally aggregating the data in a data warehouse is not feasible.
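
To make the idea of sharing only model weights concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a tiny logistic-regression model in place of a deep neural network, synthetic data in place of patient records, and a simple scheme in which the weights are passed from node to node in turn. The function names (`local_train`, `train_across_nodes`, `make_node_data`) are hypothetical, and the step that secures the shared representation is omitted because the abstract does not describe it.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def local_train(weights, data, lr=0.1, epochs=20):
    """Run SGD on one node's private data and return only the updated weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            error = predict(w, x) - y
            for i, xi in enumerate(x):
                w[i] -= lr * error * xi
    return w


def train_across_nodes(nodes, rounds=5):
    """Pass the weights from node to node; raw data never leaves a node."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(rounds):
        for private_data in nodes:
            weights = local_train(weights, private_data)
    return weights


def accuracy(weights, data):
    correct = sum((predict(weights, x) >= 0.5) == (y == 1) for x, y in data)
    return correct / len(data)


def make_node_data(rng, n):
    """Synthetic stand-in for a node's private records (assumption)."""
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        label = 1 if x1 + x2 > 0 else 0
        data.append(([x1, x2, 1.0], label))  # trailing 1.0 is the bias input
    return data


rng = random.Random(0)
nodes = [make_node_data(rng, 50) for _ in range(3)]  # three de-linked nodes
weights = train_across_nodes(nodes)
test_data = make_node_data(rng, 200)
print("held-out accuracy:", accuracy(weights, test_data))
```

In this sketch the only value that crosses node boundaries is the `weights` list, which mirrors the abstract's statement that only a representation of the trained model is shared.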