
Efficiency and cost impacts of cluster “slicing” for training

Characterize the efficiency and cost impacts of training AI models using a large number of less powerful chips (cluster "slicing") versus a smaller number of more powerful chips with the same theoretical throughput, including implications for decentralized or disaggregated training configurations.


Background

Governance measures targeting compute need to account for how cluster composition affects training feasibility and cost. If many smaller chips can substitute for fewer powerful chips without major penalties, restrictions on specific chip classes may be less effective.

Quantifying slicing trade-offs informs more precise hardware policies and helps anticipate circumvention via cluster design.
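One way to frame the trade-off is a toy cost model: two clusters with identical theoretical throughput, where per-step synchronization overhead grows with chip count. The sketch below is purely illustrative; the `effective_throughput` function, the linear overhead model, and all parameter values are assumptions for exposition, not empirical estimates.

```python
def effective_throughput(n_chips: int, flops_per_chip: float,
                         comm_overhead_per_chip: float = 0.001) -> float:
    """Peak FLOP/s discounted by a simple communication penalty.

    Assumed model: the fraction of step time lost to synchronization
    grows linearly with chip count, capped below 1. Real overheads
    depend on interconnect topology, parallelism strategy, and more.
    """
    peak = n_chips * flops_per_chip
    overhead = min(comm_overhead_per_chip * n_chips, 0.9)
    return peak * (1.0 - overhead)


# Same theoretical throughput (4e16 FLOP/s):
# 100 strong chips vs. 400 chips each a quarter as powerful.
strong = effective_throughput(n_chips=100, flops_per_chip=4e14)
weak = effective_throughput(n_chips=400, flops_per_chip=1e14)

print(f"strong-chip cluster: {strong:.3e} FLOP/s")
print(f"weak-chip cluster:   {weak:.3e} FLOP/s")
print(f"slicing penalty:     {1 - weak / strong:.1%}")
```

Under these assumed parameters the sliced cluster delivers markedly lower effective throughput despite identical peak throughput; characterizing the true shape and magnitude of that penalty across workloads is the open problem.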

References

Another open problem is the efficiency and cost impact of using a larger number of less powerful chips within a cluster, as opposed to using a smaller number of more powerful chips totaling the same theoretical throughput, sometimes known as slicing.

Open Problems in Technical AI Governance (2407.14981 - Reuel et al., 20 Jul 2024) in Section 3.2.1 Definition of Chip and Cluster Specifications for Model Training