QLoRA is an efficient finetuning method that reduces memory usage enough to finetune a 65-billion-parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance.
The Guanaco model family, trained with QLoRA, outperforms all previously openly released models on the Vicuna benchmark, reaching 99.3% of ChatGPT's performance level while requiring only 24 hours of finetuning on a single GPU.
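The core idea behind this memory saving can be illustrated with a small sketch: the pretrained weights are stored in 4-bit blockwise-quantized form and kept frozen, while only a low-rank adapter (LoRA) is trained in higher precision. The sketch below is an illustration under simplifying assumptions, not QLoRA's actual implementation: it uses plain linear absmax 4-bit levels rather than the NF4 (NormalFloat) codebook, omits double quantization and paged optimizers, and the dimensions, rank `r`, and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_4bit(w, block=64):
    """Blockwise absmax 4-bit quantization: each block of `block`
    weights shares one float scale; values map to signed levels in
    [-7, 7]. (QLoRA's NF4 uses a non-uniform codebook instead.)"""
    flat = w.reshape(-1, block)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(flat / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    # Dequantize on the fly for the forward pass.
    return (q.astype(np.float32) * scale).reshape(shape)

d_in, d_out, r = 128, 64, 8  # r = LoRA rank (illustrative values)
W = rng.standard_normal((d_out, d_in)).astype(np.float32)  # frozen base weight
q, scale = quantize_4bit(W)  # only the 4-bit codes + scales are stored

# Trainable low-rank adapter; B starts at zero so training begins
# from the base model's behavior.
A = rng.standard_normal((r, d_in)).astype(np.float32) * 0.01
B = np.zeros((d_out, r), dtype=np.float32)

def forward(x):
    W_deq = dequantize(q, scale, W.shape)  # frozen 4-bit base path
    return x @ W_deq.T + x @ A.T @ B.T     # plus trainable adapter path

x = rng.standard_normal((4, d_in)).astype(np.float32)
y = forward(x)
# With B initialized to zero, the adapter contributes nothing yet.
assert np.allclose(y, x @ dequantize(q, scale, W.shape).T)
```

During finetuning only `A` and `B` receive gradients, so optimizer state is kept for a tiny fraction of the parameters; the quantized base weights stay fixed, which is where the memory savings come from.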