PaSh: Light-touch Data-Parallel Shell Processing (2007.09436v4)
Abstract: This paper presents {\scshape PaSh}, a system for parallelizing POSIX shell scripts. Given a script, {\scshape PaSh} converts it to a dataflow graph, performs a series of semantics-preserving program transformations that expose parallelism, and then converts the dataflow graph back into a script -- one that adds POSIX constructs to explicitly guide parallelism coupled with {\scshape PaSh}-provided {\scshape Unix}-aware runtime primitives for addressing performance- and correctness-related issues. A lightweight annotation language allows command developers to express key parallelizability properties about their commands. An accompanying parallelizability study of POSIX and GNU commands -- two large and commonly used groups -- guides the annotation language and optimized aggregator library that {\scshape PaSh} uses. Finally, {\scshape PaSh}'s {\scshape PaSh}'s extensive evaluation over 44 unmodified {\scshape Unix} scripts shows significant speedups ($0.89$--$61.1\times$, avg: $6.7\times$) stemming from the combination of its program transformations and runtime primitives.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.