Quantify the impact of multilingual training and explain observed benchmark gaps

Investigate the degree to which multilingual training on the OpenAssistant Conversations (OASST1) dataset improves instruction-following performance in languages other than English, and determine whether this multilingual training explains the larger performance gap observed on the Open Assistant (OA) benchmark between Vicuna-13B (trained only on English data) and Guanaco 33B/65B.

Background

Guanaco models are trained on the multilingual OASST1 dataset, and the OA benchmark itself includes multilingual prompts. The authors observe a performance gap between Vicuna-13B and the Guanaco models on this benchmark.

They explicitly defer two questions to future work: how much multilingual training contributes to non-English instruction-following performance, and whether it accounts for the observed gap on the OA benchmark.
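One concrete way to quantify the effect would be to stratify benchmark results by prompt language and compare per-language scores between an English-only model and a multilingually trained one. The sketch below is illustrative rather than taken from the paper: the record schema, model names, and judge scores are all assumptions.

```python
from collections import defaultdict

def per_language_gap(records, baseline, candidate):
    """Report the mean judge score per prompt language for two models,
    plus the candidate-minus-baseline gap in each language.

    `records` is an iterable of dicts with keys (hypothetical schema):
      "lang"  -- language tag of the prompt (e.g. "en", "de")
      "model" -- model identifier
      "score" -- numeric judge score for the model's response
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        key = (r["lang"], r["model"])
        sums[key] += r["score"]
        counts[key] += 1
    means = {k: sums[k] / counts[k] for k in sums}

    for lang in sorted({lang for lang, _ in means}):
        b = means.get((lang, baseline))
        c = means.get((lang, candidate))
        if b is None or c is None:
            continue  # skip languages not covered by both models
        print(f"{lang}: {baseline}={b:.2f} {candidate}={c:.2f} gap={c - b:+.2f}")

# Toy scores (illustrative only, not results from the paper):
records = [
    {"lang": "en", "model": "vicuna-13b", "score": 7.5},
    {"lang": "en", "model": "guanaco-33b", "score": 7.8},
    {"lang": "de", "model": "vicuna-13b", "score": 5.1},
    {"lang": "de", "model": "guanaco-33b", "score": 6.9},
]
per_language_gap(records, baseline="vicuna-13b", candidate="guanaco-33b")
```

Under this framing, a gap that is markedly wider on non-English prompts than on English ones would be evidence that multilingual training, rather than model scale alone, drives the benchmark difference.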

References

We leave it to future work to investigate the degree to which such multilingual training improves performance on instructions in languages other than English and whether this explains the larger gap between Vicuna-13B model (only trained on English data) and Guanaco 33B and 65B on the OA benchmark.

Dettmers et al. (2023), "QLoRA: Efficient Finetuning of Quantized LLMs", arXiv:2305.14314, Section "Considerations" (Data Training paragraph)