Model-Parallel Model Selection for Deep Learning Systems (2107.06469v1)

Published 14 Jul 2021 in cs.DC and cs.LG

Abstract: As deep learning becomes more expensive in both time and compute, inefficiencies in ML training prevent most users from making practical use of state-of-the-art models. The newest model architectures are simply too large to fit on a single processor. To address this, many ML practitioners have turned to model parallelism as a way of distributing the computational requirements across several devices. Unfortunately, the sequential nature of neural networks causes very low efficiency and device utilization in model-parallel training jobs. We propose a new form of "shard parallelism" that combines task and model parallelism, and package it into a framework we name Hydra. Hydra recasts the problem of model parallelism in the multi-model context to produce a fine-grained parallel workload of independent model shards, rather than independent models. This new parallel design promises dramatic speedups relative to the traditional model parallelism paradigm.
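
The utilization argument in the abstract can be illustrated with a toy scheduling sketch. This is not Hydra's scheduler or API. The function names, the unit-cost shards, and the forward-only timeline are all assumptions made for illustration, and the sketch ignores memory placement, backward passes, and inter-device transfer costs.

```python
def sequential_model_parallel(num_models, shards_per_model, num_devices):
    """Toy baseline: train one model at a time.

    A model's shards depend on each other in order, so only one shard
    (and therefore one device) is active in any time step.
    Returns (time_steps, device_utilization).
    """
    steps = num_models * shards_per_model
    busy = steps  # exactly one shard executes per step
    return steps, busy / (steps * num_devices)


def shard_parallel(num_models, shards_per_model, num_devices):
    """Toy shard-level schedule across several independent models.

    In each time step, up to num_devices different models each run
    their next pending shard. Shards of one model still run in order,
    but shards of different models are independent of each other.
    Returns (time_steps, device_utilization).
    """
    next_shard = [0] * num_models  # next shard index to run, per model
    steps = 0
    busy = 0
    while any(s < shards_per_model for s in next_shard):
        ready = [m for m, s in enumerate(next_shard) if s < shards_per_model]
        for m in ready[:num_devices]:  # at most one shard per device per step
            next_shard[m] += 1
            busy += 1
        steps += 1
    return steps, busy / (steps * num_devices)


if __name__ == "__main__":
    # Hypothetical workload: 4 models, 4 shards each, 4 devices.
    print(sequential_model_parallel(4, 4, 4))  # (16, 0.25)
    print(shard_parallel(4, 4, 4))             # (4, 1.0)
```

With a single model, the second function falls back to the same one-device-at-a-time behaviour as the baseline, which is why the multi-model setting matters for this idea.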

Authors (1)

Kabir Nagrecha

Citations (16)
