An Alternative C++ based HPC system for Hadoop MapReduce (2005.07600v2)
Published 8 May 2020 in cs.DC
Abstract: MapReduce is a technique that can vastly improve distributed data processing and massively speed up computation. Hadoop's MapReduce implementation relies on Java and the JVM, which are expensive in terms of memory. A MapReduce framework based on High Performance Computing (HPC) could be more memory-efficient and faster than standard Hadoop MapReduce. This paper explores an entirely C++-based approach to MapReduce and assesses its feasibility with respect to developer friendliness, deployment interface, efficiency, and scalability. It also introduces Delayed Reduction and deployment techniques that can speed up MapReduce in a compiled environment.