Chinese AI startup DeepSeek, known for developing a ChatGPT competitor trained on 2 trillion tokens in English and Chinese, has unveiled DeepSeek Coder V2, an open-source mixture of experts (MoE) model for code generation.
Building on the success of DeepSeek-V2, released last month, DeepSeek Coder V2 excels at coding and math tasks and supports more than 300 programming languages. It outperforms leading closed-source models such as GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro, a first for an open model, and it also surpasses Llama-3 70B and other open models in its category.
Founded in 2022, DeepSeek aims to "unravel the mystery of AGI with curiosity." Within a year, the company has open-sourced several models, including the DeepSeek Coder family. The original DeepSeek Coder, with 33 billion parameters, performed well on project-level code completion and infilling, but it supported only 86 programming languages and had a 16K context window. The new V2 expands language support to 338 languages and increases the context window to 128K, enabling it to tackle more complex coding challenges.
In benchmarks like MBPP+, HumanEval, and Aider, designed to assess code generation, editing, and problem-solving abilities, DeepSeek Coder V2 achieved scores of 76.2, 90.2, and 73.7, respectively, outperforming many closed and open-source models, including GPT-4 Turbo, Claude 3 Opus, and Llama-3 70B. It demonstrated similar strong results in mathematical benchmarks (MATH and GSM8K).
The only model to surpass DeepSeek Coder V2 on multiple benchmarks was GPT-4o, which posted slightly higher scores on HumanEval, LiveCodeBench, MATH, and GSM8K. DeepSeek says it achieved these gains by taking DeepSeek-V2, which is built on its mixture of experts (MoE) framework, and further pre-training it on an additional 6 trillion tokens, largely code and math data sourced from GitHub and CommonCrawl.
The model comes in 16B and 236B parameter versions, but its MoE design activates only 2.4B and 21B expert parameters, respectively, for any given token, keeping compute costs down.
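DeepSeek has not published a simple reference for its routing code, but the idea behind sparse expert activation can be sketched generically: a small router scores the experts for each token and only the top-scoring few are actually run. The PyTorch snippet below is a minimal, illustrative sketch of that pattern; the layer names, sizes, and top-2 routing are assumptions for illustration, not DeepSeek's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router picks k experts per token,
    so only a small slice of the total parameters runs on each forward pass."""
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        gate_logits = self.router(x)               # (tokens, n_experts)
        weights, idx = gate_logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # send each token to its k selected experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

layer = TopKMoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

With 8 experts and top-2 routing, roughly a quarter of the expert parameters are touched per token, which is the same principle that lets the 236B model run with only 21B active parameters.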
In addition to its coding prowess, DeepSeek Coder V2 shows strong general reasoning and language understanding capabilities. For instance, it scored 79.2 on the MMLU benchmark, surpassing other code-specific models and closely matching Llama-3 70B. GPT-4o and Claude 3 Opus lead the MMLU category with scores of 88.7 and 88.6, respectively.
This development indicates that open-source coding models are progressing across a broader scope of applications, increasingly rivaling leading closed-source technologies.
DeepSeek Coder V2 is available under the MIT license, allowing both research and commercial use. Users can download the 16B and 236B models in instruct and base configurations from Hugging Face, or access them through an API on the DeepSeek platform on a pay-as-you-go basis.
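For readers who want to try the open weights locally, a typical Hugging Face Transformers workflow looks like the sketch below. The repository name shown is the commonly listed one for the smaller instruct checkpoint but should be treated as an assumption, and the chat-template usage is a generic Transformers pattern rather than DeepSeek-documented code.

```python
# Illustrative sketch: loading an instruct checkpoint with Hugging Face Transformers.
# The model_id below is assumed; check the deepseek-ai org on Hugging Face for the exact name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # assumed repo for the 16B variant
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that checks whether a number is prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The hosted API on the DeepSeek platform offers the same models without local hardware, billed per token.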
To explore its capabilities, users can interact with DeepSeek Coder V2 through a chatbot on the company’s platform.