DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

(arXiv:2402.03300)
Published Feb 5, 2024 in cs.CL, cs.AI, and cs.LG

Abstract

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B achieves an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits or voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: first, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline; second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO) that enhances mathematical reasoning abilities while reducing the memory usage of PPO.
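The self-consistency result above refers to sampling many solutions per problem and taking a majority vote over their final answers. Below is a minimal sketch of that voting step; `generate_answer` is a hypothetical helper standing in for sampling one chain-of-thought completion and extracting its final answer.

```python
from collections import Counter

def majority_vote(sampled_answers):
    """Self-consistency: return the most common final answer among samples."""
    return Counter(sampled_answers).most_common(1)[0][0]

# Hypothetical usage, assuming generate_answer(model, problem) samples one
# solution at nonzero temperature and extracts its final answer:
# answers = [generate_answer(model, problem) for _ in range(64)]
# prediction = majority_vote(answers)
```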

Overview

  • DeepSeekMath represents an advancement in open language models' ability to perform complex mathematical reasoning using a large-scale curated dataset.

  • The model is initialized from DeepSeek-Coder-Base-v1.5 7B, a predecessor adept at coding tasks, and further improved through training methodologies such as Group Relative Policy Optimization (GRPO).

  • GRPO, a novel training approach, yields significant performance gains on both in-domain and out-of-domain mathematical tasks, highlighting the model's robustness.

  • DeepSeekMath demonstrates broad proficiency across multiple benchmarks and highlights opportunities for further improving data selection strategies for mathematical model training.

Overview of DeepSeekMath 7B

DeepSeekMath signifies a noteworthy step forward in the capability of open-source language models to perform advanced mathematical reasoning. Trained on an extensive corpus known as the DeepSeekMath Corpus, the model demonstrates strong accuracy on complex benchmarks such as MATH, GSM8K, and multilingual mathematical benchmarks, without external aids.

Data Curation and Model Training

At the heart of DeepSeekMath's success lies the DeepSeekMath Corpus, a 120B-token collection of web pages curated for mathematical content that substantially surpasses the datasets used in prior work. This selection pipeline demonstrates the potential of web data for improving mathematical reasoning capabilities. The base model, DeepSeekMath-Base 7B, was built on DeepSeek-Coder-Base-v1.5 7B, chosen because the paper's findings suggest that code pre-training substantially enhances mathematical problem-solving, both with and without tool use.
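Concretely, the curation pipeline iteratively trains a fastText classifier on seed math pages and uses it to recall math-related documents from Common Crawl. The sketch below illustrates that scoring-and-filtering step with the fasttext library; the training-file name, label names, hyperparameters, and threshold are illustrative assumptions, not the paper's exact settings.

```python
import fasttext

# Train a binary page classifier from labeled seed data. Each line of
# train.txt looks like "__label__math <page text>" or "__label__other <page text>".
model = fasttext.train_supervised(input="train.txt", lr=0.1, epoch=3, wordNgrams=2)

def math_score(page_text: str) -> float:
    """Estimated probability that a page is math-related."""
    labels, probs = model.predict(page_text.replace("\n", " "))
    p = float(probs[0])
    return p if labels[0] == "__label__math" else 1.0 - p

pages = [
    "The quadratic formula gives x = (-b +/- sqrt(b^2 - 4ac)) / (2a).",
    "Top ten travel destinations for the summer.",
]
# Keep pages above an illustrative threshold; in the paper, this filtering is
# repeated over Common Crawl, re-seeding the classifier with newly recalled
# math pages on each iteration.
selected = [p for p in pages if math_score(p) > 0.9]
```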

Technology Behind the Achievement: GRPO

A distinct and innovative aspect of DeepSeekMath's training regime is the introduction of Group Relative Policy Optimization (GRPO). Unlike traditional Proximal Policy Optimization (PPO), GRPO dispenses with the need for a separate critic model: for each question it samples a group of outputs and uses the group's average reward as the baseline, thereby greatly reducing memory and compute consumption. Remarkably, GRPO shows marked improvements not just in-domain on tasks like GSM8K and MATH, but also on out-of-domain tasks like CMATH, indicating robust generalization.
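A minimal sketch of that idea follows, assuming sequence-level rewards and log-probabilities: group-relative advantages are plugged into a PPO-style clipped surrogate with no critic. The paper's full objective also adds a KL penalty toward a reference policy and operates at the token level; both are simplified away here.

```python
import torch

def grpo_loss(new_logprobs, old_logprobs, rewards, clip_eps=0.2):
    """PPO-style clipped loss with group-relative advantages (no critic).

    new_logprobs, old_logprobs: (G,) summed token log-probs of G sampled
        outputs for one question under the current and sampling policies.
    rewards: (G,) scalar rewards for the same G outputs.
    """
    # Group-relative advantage: normalize rewards within the group, so the
    # group mean plays the baseline role a critic would otherwise fill.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    ratio = torch.exp(new_logprobs - old_logprobs)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    # Maximize the clipped surrogate, i.e. minimize its negation. The KL
    # penalty toward a reference policy is omitted for brevity.
    return -torch.min(unclipped, clipped).mean()

# Hypothetical usage with G = 4 sampled solutions for one question:
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])        # e.g. answer correctness
old_lp = torch.tensor([-20.0, -23.0, -19.5, -25.0])  # from the sampling policy
new_lp = old_lp + 0.1 * torch.randn(4)               # current policy (illustrative)
loss = grpo_loss(new_lp, old_lp, rewards)
```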

Insights and Evaluation Metrics

DeepSeekMath offers several key insights. For instance, while arXiv papers are commonly used for math-related pre-training, the paper's ablations indicate they may not be as beneficial as previously assumed, illuminating potential areas for improvement in data selection strategies. The model has also undergone rigorous evaluation, within formal mathematics using the Isabelle proof assistant and across a variety of benchmarks assessing language understanding, reasoning, and coding capabilities, showcasing formidable general proficiency.

The advancements embodied by DeepSeekMath are poised to open new avenues in mathematical data processing and offer vital groundwork for future research endeavors in the evolution of language models for mathematical reasoning and beyond.
