  • This repository provides a minimal, hackable, and readable example of loading LLaMA models and running inference.
  • Users can request access to download the tokenizer and model files, then run the provided example.py with torchrun on a single-GPU or multi-GPU node.
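
As a rough illustration of the launch step above, the sketch below assembles a torchrun command line in Python. `--nproc_per_node` is a standard torchrun option; the `--ckpt_dir` and `--tokenizer_path` arguments, the helper name `build_torchrun_command`, and the placeholder paths are assumptions made for this example and should be checked against the repository's own instructions.

```python
import shlex


def build_torchrun_command(nproc_per_node, ckpt_dir, tokenizer_path, script="example.py"):
    """Return the argument list for launching `script` with torchrun.

    The script-level flags (--ckpt_dir, --tokenizer_path) are assumed
    names for illustration, not confirmed by this summary.
    """
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    return [
        "torchrun",
        "--nproc_per_node", str(nproc_per_node),  # worker processes on this node
        script,
        "--ckpt_dir", ckpt_dir,                   # assumed flag: folder with model files
        "--tokenizer_path", tokenizer_path,       # assumed flag: tokenizer file
    ]


# Placeholder paths; substitute the locations of the downloaded files.
cmd = build_torchrun_command(1, "path/to/model", "path/to/tokenizer.model")
print(shlex.join(cmd))
```

On a multi-GPU node, the first argument would typically be raised to match the number of GPUs the checkpoint is split across. The resulting list can be passed to `subprocess.run(cmd, check=True)` once the files are in place.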

Key terms:

  • LLaMA: A family of large language models released by Meta AI
  • Inference: The process of making predictions with a trained model
  • Repository: A storage location for code and related files
  • Tokenizer: A component that breaks text into smaller units (tokens), such as words or subword pieces, and maps them to the integer IDs the model consumes
  • Torchrun: PyTorch's command-line launcher, which starts one or more worker processes (typically one per GPU) to run a script on a single-GPU or multi-GPU node
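
To make the tokenizer entry more concrete, here is a toy sketch of subword splitting using greedy longest-match against a small vocabulary. The function name `tokenize` and the vocabulary are invented for this example; this is not the tokenizer shipped with the model files, only an illustration of how text can be cut into pieces smaller than words.

```python
def tokenize(text, vocab):
    """Split text into the longest pieces found in vocab, left to right.

    Characters that match no vocabulary entry are emitted one at a time.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


toy_vocab = {"token", "izer", "s"}  # hypothetical vocabulary for illustration
pieces = tokenize("tokenizers", toy_vocab)
print(pieces)  # ['token', 'izer', 's']
```

A real tokenizer would also map each piece to an integer ID before the model sees it.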

