Evaluate Guanaco’s behavior on additional types of social bias

Determine whether Guanaco-65B also shows low bias on types of social bias beyond those covered by the paper's CrowS evaluation, by running broader bias evaluations across additional bias categories and datasets.

Background

The authors report that finetuning LLaMA-65B on OASST1 appears to reduce bias relative to the base model on the CrowS (CrowS-Pairs) benchmark. However, they acknowledge that this single evaluation is limited in scope.
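
A CrowS-style evaluation reduces to a per-category preference rate: for each pair of a more-stereotypical and a less-stereotypical sentence, check which one the model scores as more likely, then report the percentage of pairs where the stereotypical sentence wins (50 means no measured preference). The sketch below assumes a likelihood comparison of this kind; the exact scoring used in the paper may differ. `BiasPair`, `stereotype_preference`, and the `log_likelihood` callable are hypothetical names introduced for illustration. In practice `log_likelihood` would sum a causal LM's token log-probabilities over a sentence, and extending the evaluation to new bias categories or datasets amounts to supplying more pairs.

```python
from collections import defaultdict
from typing import Callable, Iterable, NamedTuple


class BiasPair(NamedTuple):
    """Hypothetical CrowS-style item: a more- and a less-stereotypical sentence."""
    category: str
    stereotypical: str
    anti_stereotypical: str


def stereotype_preference(
    pairs: Iterable[BiasPair],
    log_likelihood: Callable[[str], float],
) -> dict[str, float]:
    """Per category, the percentage of pairs where the stereotypical sentence
    scores strictly higher. 50.0 means no measured preference; ties count as
    not preferring the stereotype."""
    wins: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for pair in pairs:
        totals[pair.category] += 1
        if log_likelihood(pair.stereotypical) > log_likelihood(pair.anti_stereotypical):
            wins[pair.category] += 1
    return {cat: 100.0 * wins[cat] / totals[cat] for cat in totals}


# Toy usage: placeholder strings and a stand-in scorer that prefers shorter
# sentences. A real run would score actual benchmark sentences with the model.
toy_pairs = [
    BiasPair("gender", "short", "longer one"),
    BiasPair("gender", "a much longer sentence", "tiny"),
    BiasPair("age", "x", "yy"),
]
result = stereotype_preference(toy_pairs, lambda s: -float(len(s)))
print(result)
```

Comparing these per-category rates for the base model and the finetuned model, on the same pairs, is what would show whether finetuning moved the model toward or away from 50.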

They explicitly state it is unclear how Guanaco performs on other types of biases and leave broader bias analysis for future work.

References

While these results are encouraging, it is unclear if Guanaco does also well when assessed on other types of biases. We leave further evaluation of analyzing biases in Guanaco and similar chatbots to future work.

— QLoRA: Efficient Finetuning of Quantized LLMs (2305.14314 - Dettmers et al., 2023) in Section "Limitations and Discussion"
