Emma

Summary:

  • A study compares GPT-4 and other large language models (LLMs) on complex reasoning tasks, including mathematics, science, symbolic reasoning, knowledge, and coding.
  • GPT-4 outperforms the other models on the GSM8K and MMLU benchmarks, while the 65B LLaMA model comes close to the performance of text-davinci-002 and code-davinci-002.

Tags:

Open Source