Efficient Incorporation of Multiple Latency Targets in the Once-For-All Network (2012.06748v1)

Published 12 Dec 2020 in cs.LG

Abstract: Neural Architecture Search has proven an effective method of automating architecture engineering. Recent work in the field searches for architectures subject to multiple objectives, such as accuracy and latency, so that they can be deployed efficiently on different target hardware. Once-for-All (OFA) is one such method: it decouples training from search and can find high-performance networks for different latency constraints. However, its search phase is inefficient at incorporating multiple latency targets. In this paper, we introduce two strategies (Top-down and Bottom-up) that use warm starting and randomized network pruning to incorporate multiple latency targets in the OFA network efficiently. We evaluate these strategies against the current OFA implementation and demonstrate that they offer significant running-time gains without sacrificing the accuracy of the subnetworks found for each latency target. We further demonstrate that these gains generalize to every design space used by the OFA network.

Authors (2)

Vidhur Kumar
Andrew Szidon
