Emma
Summary:
- This repository provides a minimal, hackable, and readable example of loading LLaMA models and running inference.
- Users can request access to download the tokenizer and model files, then run the provided example.py with torchrun on a single- or multi-GPU node.
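The launch step in the summary can be sketched as below. This is a minimal illustration, not the repository's documented invocation: `--nproc_per_node` is torchrun's flag for the number of processes per node, while the `--ckpt_dir` and `--tokenizer_path` arguments, the helper name `build_torchrun_command`, and both paths are assumptions about how example.py receives its inputs. Check the repository's README for the exact command.

```python
import shlex


def build_torchrun_command(ckpt_dir, tokenizer_path, nproc_per_node=1, script="example.py"):
    """Return the argument list for launching a script with torchrun.

    The script-level flags (--ckpt_dir, --tokenizer_path) are assumed here,
    not confirmed by the summary above.
    """
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    return [
        "torchrun",
        "--nproc_per_node", str(nproc_per_node),  # processes to start on this node
        script,
        "--ckpt_dir", ckpt_dir,                   # assumed flag: folder with model files
        "--tokenizer_path", tokenizer_path,       # assumed flag: tokenizer file
    ]


# Placeholder paths; substitute the location of the downloaded files.
cmd = build_torchrun_command("/path/to/model", "/path/to/tokenizer.model", nproc_per_node=1)
print(shlex.join(cmd))
```

The sketch only builds and prints the command. To launch it, the list could be passed to `subprocess.run(cmd, check=True)`, with `nproc_per_node` set to the number of processes (typically one per GPU) wanted on the node.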
Key terms:
- LLaMA: A family of large language models released by Meta AI
- Inference: The process of generating predictions (here, text) with a trained model
- Repository: A storage location for code and related files
- Tokenizer: A component that breaks text into smaller units (tokens), such as words or subword pieces
- torchrun: PyTorch's command-line launcher for running scripts on single- or multi-GPU nodes
Tags:
Research
Open Source
LLaMA
GitHub
AI models
Language Model
PyTorch
Inference
multi-GPU
Checkpoints