Unknown training token count for Mistral 7B

Determine the number of training tokens used to train Mistral 7B, enabling precise comparisons with Llama-2 and Solar and an assessment of how training data volume affects preference learning and brittleness.

Background

The discussion relates model brittleness and the emergence of value-based preferences to training volume and attention architecture. Llama-2 reportedly used 2 trillion training tokens, and Solar was built on Mistral weights, but the exact training token count for Mistral 7B remains unreported, limiting causal inference about the effect of token volume.

References

The number of training tokens is unknown.

Do Large Language Models Learn Human-Like Strategic Preferences? (2404.08710 - Roberts et al., 11 Apr 2024) in Section 3.4, Why are Solar and Mistral Not Brittle?