Papers
Topics
Authors
Recent
2000 character limit reached

NNTile: a machine learning framework capable of training extremely large GPT language models on a single node (2504.13236v1)

Published 17 Apr 2025 in cs.LG and cs.MS

Abstract: This study presents an NNTile framework for training large deep neural networks in heterogeneous clusters. The NNTile is based on a StarPU library, which implements task-based parallelism and schedules all provided tasks onto all available processing units (CPUs and GPUs). It means that a particular operation, necessary to train a large neural network, can be performed on any of the CPU cores or GPU devices, depending on automatic scheduling decisions. Such an approach shifts the burden of deciding where to compute and when to communicate from a human being to an automatic decision maker, whether a simple greedy heuristic or a complex AI-based software. The performance of the presented tool for training LLMs is demonstrated in extensive numerical experiments.

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.