
Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform (1805.04886v1)

Published 13 May 2018 in cs.DC

Abstract: Advances in detectors and computational technologies provide new opportunities for applied research and the fundamental sciences. Concurrently, dramatic increases in the three Vs (Volume, Velocity, and Variety) of experimental data and in the scale of computational tasks have produced demand for new real-time processing systems at experimental facilities. Recently, this demand was addressed by the Spark-MPI approach, which connects the Spark data-intensive platform with the MPI high-performance framework. In contrast with existing data management and analytics systems, Spark introduced a new middleware based on resilient distributed datasets (RDDs), which decoupled various data sources from high-level processing algorithms. The RDD middleware significantly expanded the scope of data-intensive applications, which now range from SQL queries to machine learning to graph processing. Spark-MPI further extended the Spark ecosystem with MPI applications using the Process Management Interface (PMI). The paper explores this integrated platform within the context of online ptychographic and tomographic reconstruction pipelines.

Authors (7)
  1. Nikolay Malitsky
  2. Aashish Chaudhary
  3. Sebastien Jourdain
  4. Matt Cowan
  5. Patrick O'Leary
  6. Marcus Hanwell
  7. Kerstin Kleese van Dam
Citations (11)
