This paper introduces a curriculum learning strategy to enhance the performance of LLMs such as Mistral-7B and Gemma-7B without scaling model size or dataset volume. The approach structures training data by complexity, using prompt length, attention scores, and loss values as difficulty metrics: models are trained on simpler examples first, with progressively more complex ones introduced over the course of training.
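
As a rough illustration of the data-ordering idea, the sketch below sorts a training set from easy to hard using prompt length as the difficulty proxy (one of the three metrics above). The dataset and function names here are hypothetical, not from the paper, and the attention- and loss-based metrics would additionally require a forward pass through the model.

```python
def difficulty_by_length(example: str) -> int:
    """Proxy difficulty metric: longer prompts are treated as harder.

    The paper's other metrics (attention scores, loss values) would be
    computed from a model forward pass instead of raw text length.
    """
    return len(example.split())


def curriculum_order(dataset: list[str]) -> list[str]:
    """Return training examples sorted from easiest to hardest."""
    return sorted(dataset, key=difficulty_by_length)


if __name__ == "__main__":
    # Hypothetical toy dataset; a real run would use the actual training corpus.
    dataset = [
        "Summarize the following three-paragraph article about climate policy ...",
        "Translate 'bonjour' to English.",
        "What is 2 + 2?",
    ]
    for example in curriculum_order(dataset):
        print(difficulty_by_length(example), example)
```

A full training loop would then draw batches from this ordering (or from difficulty-banded buckets) rather than sampling uniformly at random.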
Here's a more detailed breakdown:
- Introduction: The paper addresses the challenges of scaling LLMs, such as increased computational resource demands and costs, and proposes a data-centric training strategy based on curriculum learning. The approach is inspired by human educational methods, where learners master simpler material before progressing to more complex topics.