Nvidia has announced TensorRT-LLM, an open-source library that improves the performance of large language model inference, effectively doubling the speed on its H100 GPUs. The software, developed in collaboration with leading tech firms, is expected to be released in the coming weeks for Ampere, Lovelace, and Hopper GPUs.
TensorRT-LLM incorporates techniques to maximize utilization of Nvidia's GPUs and has shown impressive gains in benchmark results. It makes popular models easy to deploy, reducing costs and increasing efficiency. The software could give Nvidia's H100 and future Hopper-based systems a significant advantage in the AI field.