Introducing DeepSeek Chat: China’s Newest ChatGPT Competitor Featuring an Impressive 67B Model

As ChatGPT celebrates its first anniversary this week, Chinese startup DeepSeek AI is entering the competitive conversational AI landscape with its new offering: DeepSeek Chat.

Currently in an alpha testing phase, DeepSeek Chat is powered by DeepSeek's 7B- and 67B-parameter LLMs, trained on a dataset of 2 trillion English and Chinese tokens. Benchmarks indicate that these models perform strongly across a range of evaluations, including coding and mathematics, often matching or even surpassing Meta's Llama 2 70B.

The introduction of DeepSeek Chat adds to the growing array of Chinese players in the AI market, following notable releases from Qwen, 01.AI, and Baidu. DeepSeek has made both base and instruction-tuned versions of its models open-source to encourage further research within academic and commercial sectors.

Founded recently with a stated mission to unravel the mystery of AGI, DeepSeek also permits commercial use of its models under certain conditions.

Key Features of DeepSeek Chat and LLMs

DeepSeek Chat is available via a web interface similar to ChatGPT, allowing users to sign in and interact with the model for various tasks. Currently, only the 67B version is accessible through this platform.

Both of DeepSeek's models are built on an auto-regressive transformer decoder architecture similar to Llama, but they differ in their attention mechanisms, which affects inference cost. The smaller 7B model employs multi-head attention (MHA), while the larger 67B model uses grouped-query attention (GQA).
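To make the distinction concrete, here is a minimal PyTorch sketch contrasting MHA and GQA. The head counts, dimensions, and helper function are illustrative assumptions, not DeepSeek's actual configuration; the point is that GQA lets groups of query heads share a smaller set of key/value heads, shrinking the KV cache that must be read at inference time.

```python
# Illustrative sketch only: MHA vs. GQA with made-up dimensions.
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

batch, seq, n_q_heads, head_dim = 1, 16, 8, 64
q = torch.randn(batch, n_q_heads, seq, head_dim)

# MHA: every query head has its own key/value head (8 KV heads here).
k_mha = torch.randn(batch, n_q_heads, seq, head_dim)
v_mha = torch.randn(batch, n_q_heads, seq, head_dim)
out_mha = attention(q, k_mha, v_mha)

# GQA: query heads are split into groups that share a smaller set of
# KV heads (2 here), so the KV cache is 4x smaller in this toy setup.
n_kv_heads = 2
k_gqa = torch.randn(batch, n_kv_heads, seq, head_dim)
v_gqa = torch.randn(batch, n_kv_heads, seq, head_dim)
group = n_q_heads // n_kv_heads
k_exp = k_gqa.repeat_interleave(group, dim=1)  # broadcast each KV head to its group
v_exp = v_gqa.repeat_interleave(group, dim=1)
out_gqa = attention(q, k_exp, v_exp)

print(out_mha.shape, out_gqa.shape)  # both: (1, 8, 16, 64)
```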

According to the models' GitHub page, the 7B model was trained with a batch size of 2,304 and a learning rate of 4.2e-4, while the 67B model used a batch size of 4,608 and a learning rate of 3.2e-4. Training follows a multi-step learning rate schedule: the rate ramps up over 2,000 warm-up steps and is then stepped down at fixed points as the token count grows.
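As a rough illustration, a schedule of this kind can be sketched as a small Python function. The peak rate below matches the reported 7B setting, but the step-down points (80% and 90% of the run) and decay factors are assumptions for illustration, not confirmed DeepSeek hyperparameters.

```python
# Minimal sketch of a warm-up + multi-step learning rate schedule (illustrative values).
def multi_step_lr(step, total_steps, peak_lr=4.2e-4, warmup_steps=2000):
    if step < warmup_steps:
        return peak_lr * step / warmup_steps   # linear warm-up
    progress = step / total_steps              # proxy for fraction of tokens seen
    if progress < 0.8:
        return peak_lr                         # hold at peak
    elif progress < 0.9:
        return peak_lr * 0.316                 # first step-down (assumed factor)
    else:
        return peak_lr * 0.1                   # second step-down (assumed factor)

# Example: learning rate at a few points of a hypothetical 100,000-step run.
for s in (0, 1000, 2000, 50_000, 85_000, 95_000):
    print(s, round(multi_step_lr(s, 100_000), 6))
```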

In testing, the DeepSeek LLM 67B Base showed strong general capabilities, outperforming Llama 2 70B Base in reasoning, coding, mathematics, and Chinese comprehension. The only area where Llama 2 scored slightly higher was 5-shot TriviaQA (79.5 vs. 78.9).

The fine-tuned chat version also performed well on tests it had not previously encountered. For example, it scored 73.78 on the HumanEval pass@1 coding benchmark and 84.1 on zero-shot GSM8K mathematics, placing it just behind GPT-4 and Anthropic's Claude 2.

However, despite these strong benchmarks, there are indications that the DeepSeek model may have censorship mechanisms. A user on X noted that responses were redacted when the topic concerned China, replaced by a message stating that the content was “withdrawn” for security reasons. It remains unclear whether the base model also has similar filters.

Diverse LLM Offerings

The release of the DeepSeek LLMs marks a notable advancement for China in the AI domain, expanding the range of model sizes available to meet diverse user needs. Other recent Chinese AI offerings include Baidu's Ernie 4.0, 01.AI's Yi 34B, and Qwen's models ranging from 1.8B to 72B.

Interestingly, some smaller models have outperformed much larger counterparts: Yi 34B, for example, has shown capabilities matching those of Llama 2 70B and Falcon 180B. This trend suggests that businesses can opt for smaller models without compromising effectiveness, conserving computational resources while still covering a variety of use cases.

Just last week, Microsoft entered this competitive space with its Orca 2 models, which have demonstrated performance superior to models five to ten times their size, including Llama 2 Chat 70B.
