Auto-Vectorizing TensorFlow Graphs: Jacobians, Auto-Batching And Beyond
Published 8 Mar 2019 in cs.DC, cs.LG, and cs.MS | (1903.04243v1)
Abstract: We propose a static loop vectorization optimization on top of the high-level dataflow IR used by frameworks like TensorFlow. A new statically vectorized parallel-for abstraction is provided on top of TensorFlow and used for applications ranging from auto-batching and per-example gradients to Jacobian computation, optimized map functions, and input-pipeline optimization. We report large speedups compared to both loop-based implementations and the run-time batching adopted by the DyNet framework.
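The per-example-gradient use case mentioned in the abstract can be illustrated with TensorFlow's public tf.vectorized_map API, which exposes a statically vectorized parallel-for of the kind the paper describes. The sketch below is not taken from the paper; the model, shapes, and variable names are hypothetical, and it only shows the usage pattern: write the computation for a single example, then vectorize it over the batch instead of looping.

```python
# Minimal sketch (assumed example, not from the paper): per-example gradients
# via tf.vectorized_map instead of a Python loop over the batch.
import tensorflow as tf

# Hypothetical linear model weights.
w = tf.Variable(tf.random.normal([4, 1]))

def per_example_grad(xy):
    # Gradient of a scalar squared-error loss w.r.t. w for ONE example.
    x, y = xy
    with tf.GradientTape() as tape:
        pred = tf.matmul(x[tf.newaxis, :], w)      # shape (1, 1)
        loss = tf.reduce_sum((pred - y) ** 2)
    return tape.gradient(loss, w)                  # shape (4, 1)

x_batch = tf.random.normal([8, 4])
y_batch = tf.random.normal([8, 1])

# The loop over the 8 examples is vectorized statically rather than executed
# as 8 sequential gradient computations.
grads = tf.vectorized_map(per_example_grad, (x_batch, y_batch))
print(grads.shape)  # (8, 4, 1): one weight gradient per example
```

The same mechanism underlies batched Jacobian computation in TensorFlow (e.g. tf.GradientTape.jacobian), which is one of the applications the abstract lists.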