DeepSeek R1: How an RL-based open source model challenged OpenAI on a minimal budget

The DeepSeek R1 model caused a sensation in the AI community on Monday, shattering stereotypes about the resources required to achieve cutting-edge results in the field. It demonstrated performance on par with OpenAI's o1 at an estimated 3-5% of the cost. Open access and impressive efficiency make the model a challenge both for developers and for large companies rethinking their AI strategies.

Record downloads and a new market leader

DeepSeek R1 has already become the most downloaded model on the HuggingFace platform, reaching 109,000 downloads at the time of writing. Developers are eager to explore its capabilities and see how it will impact their projects.
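
For developers who want a quick first look, the smaller distilled checkpoints that DeepSeek published alongside R1 can be loaded with the standard Hugging Face transformers API. Below is a minimal sketch, assuming the 1.5B distilled variant; the prompt and generation settings are illustrative, and the full 671B R1 would instead need a dedicated multi-GPU serving stack:

```python
# Minimal sketch: query a distilled R1 checkpoint via Hugging Face transformers.
# The model ID points to one of the small distilled variants published
# alongside R1; prompt and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# R1-style checkpoints are chat models, so format the prompt with the chat template.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```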

Users note that DeepSeek's built-in search function outperforms the solutions from OpenAI and Perplexity; only Google's Gemini Deep Research comes close.

Open technology vs. closed solutions

For the enterprise sector, the arrival of DeepSeek R1 breaks new ground: instead of paying for expensive proprietary models like OpenAI's, companies get access to a powerful open-source tool. This could democratize AI, allowing small organizations to compete with industry giants.

Breakthrough: betting on pure reinforcement learning

DeepSeek first made headlines in November, when it announced that its model had surpassed the performance of OpenAI's o1. At the time, however, only a limited R1-lite-preview version was available. The full R1 release, along with a paper published on Monday, revealed the company's radical approach: a near-complete abandonment of supervised fine-tuning (SFT), the industry's standard training method.

How DeepSeek bypassed traditional approaches

Supervised fine-tuning (SFT) trains models on pre-selected, labeled datasets. It is the standard way to improve the reasoning ability of language models (the so-called chain of thought). DeepSeek, by contrast, set SFT aside and focused almost entirely on reinforcement learning (RL).

This bold move allowed the model to develop reasoning skills on its own, avoiding the drawbacks of templated training data. Although a small amount of SFT was ultimately needed to address certain shortcomings, the results confirmed the viability of the approach: RL on its own delivered significant performance improvements.
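
The paper describes the RL signal as simple rule-based rewards rather than a learned reward model: an accuracy reward for verifiably correct answers and a format reward for keeping the reasoning inside designated tags. The sketch below illustrates that kind of scoring rule; the tag layout follows the paper's template, but the weights and the exact-match check are our assumptions:

```python
import re

# Expect reasoning in <think>...</think> followed by the answer in
# <answer>...</answer>, the output template the DeepSeek-R1 paper describes.
TEMPLATE = re.compile(r"^<think>.*</think>\s*<answer>(.*)</answer>\s*$", re.DOTALL)

def reward(completion: str, reference_answer: str) -> float:
    """Rule-based reward in the spirit of DeepSeek-R1-Zero: a format reward for
    well-delimited reasoning plus an accuracy reward when the final answer
    matches a verifiable reference. The weights are illustrative assumptions."""
    score = 0.0
    match = TEMPLATE.match(completion.strip())
    if match:
        score += 0.5  # format reward: reasoning and answer are properly delimited
        if match.group(1).strip() == reference_answer.strip():
            score += 1.0  # accuracy reward: the answer checks out
    return score

# A correctly formatted completion for a verifiable math prompt.
sample = "<think>17 * 24 = 17 * 25 - 17 = 425 - 17 = 408</think><answer>408</answer>"
print(reward(sample, "408"))  # 1.5
```

Because the reward is computed by simple rules over verifiable answers, no expensive human preference data or neural reward model is needed, which is part of the cost story.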

Limited resources, maximized efficiency

DeepSeek, founded in 2023 as a spin-off of the Chinese hedge fund High-Flyer Quant, started by developing AI for internal use and later released its models to the public.

The company secured its training capacity by purchasing more than 10,000 Nvidia GPUs before export restrictions took effect, then expanded the fleet to 50,000 GPUs through alternative channels. That is still far fewer than OpenAI, Google, or Anthropic, each of which operates more than 500,000 GPUs.

DeepSeek has managed to prove that innovation and ingenuity can rival the huge budgets of market leaders.

Costs remain a mystery

According to Nvidia engineer Jim Fan, the base model, V3, was trained for $5.58 million over two months. The total cost of training DeepSeek R1, however, remains unknown: operating 50,000 GPUs presumably cost hundreds of millions of dollars, but no exact figure has been published.

How the R1 came to its “moment of epiphany”

The initial version of the model, DeepSeek-R1-Zero, was trained exclusively with RL. This approach incentivized the model not only to produce correct answers, but also to develop the logic leading to them.

Key discovery: prioritizing complex tasks

This approach had an unexpected effect: the model began allocating more time to complex tasks, prioritizing them on its own. The researchers called this the “aha moment”: the point at which the model found an unconventional solution and even described it in human terms.

“We just provided the right incentives, and the model itself developed advanced problem-solving strategies,” the study said.

Summary: RL combined with limited SFT

Despite RL's successes, DeepSeek-R1-Zero struggled with low readability and language mixing. To fix this, the team applied minimal SFT on “cold start” data, a small curated seed set, and then reapplied RL. The model was then refined through standard tuning steps.
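
In other words, the “cold start” stage is largely a data-curation step: only seed examples with clean, well-delimited reasoning go into the small SFT set. The filter below is a purely illustrative sketch of that idea; the tag check and the language-mixing heuristic are our assumptions, not DeepSeek's published criteria:

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def is_readable(completion: str, max_cjk_ratio: float = 0.05) -> bool:
    """Illustrative cold-start filter: require a well-delimited reasoning block
    and reject English examples with heavy language mixing (approximated here
    by the share of CJK characters). The threshold is an assumption."""
    match = THINK_BLOCK.search(completion)
    if not match:
        return False  # no delimited reasoning -> low readability
    reasoning = match.group(1)
    if not reasoning.strip():
        return False  # empty reasoning block
    cjk = sum(1 for ch in reasoning if "\u4e00" <= ch <= "\u9fff")
    return cjk / len(reasoning) <= max_cjk_ratio

print(is_readable("<think>Add 2 and 2 to get 4.</think><answer>4</answer>"))  # True
print(is_readable("<think>先算 2+2 then we get 4</think><answer>4</answer>"))  # False
```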

A graph from the DeepSeek-R1 paper. Don't let it intimidate you: the key finding is the red line, where the model literally used the phrase “aha moment.” The researchers seized on this as a prime example of the model's ability to reframe problems in an anthropomorphic tone. For them, it was their own “aha moment.”

Implications for the AI market

Challenge for OpenAI and new competitors

DeepSeek-R1 sets new standards: the model not only outperforms competitors on openness and cost, but also provides full transparency of its reasoning chain. Unlike OpenAI, which hides the details of its models' reasoning, DeepSeek lets developers inspect the chain to find and fix bugs, which is especially valuable for enterprise applications.
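
This transparency is practical, not just philosophical: the open weights emit the chain of thought in the clear, delimited by <think> tags, so it can be logged and audited separately from the final answer. A minimal sketch, assuming that tag format:

```python
def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Separate the visible chain of thought from the final answer.
    Assumes the <think>...</think> format used by the open R1 weights."""
    if "</think>" in raw_output:
        reasoning, _, answer = raw_output.partition("</think>")
        return reasoning.removeprefix("<think>").strip(), answer.strip()
    return "", raw_output.strip()  # no reasoning block found

reasoning, answer = split_reasoning(
    "<think>The user asks for 17 * 24. 17 * 24 = 408.</think>The answer is 408."
)
print("REASONING:", reasoning)  # log or inspect this trace when debugging
print("ANSWER:", answer)
```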

Is open-source AI the future?

As the examples of Llama and DeepSeek show, open models are rapidly gaining popularity thanks to their flexibility. DeepSeek's edge, however, is likely to be short-lived: competitors are already studying its approaches.

The researchers believe that end users and startups are the primary beneficiaries of such developments, as the cost of working with AI is rapidly approaching zero.

Global Context: China vs. the U.S.

DeepSeek’s work underscores the difference in approach. While OpenAI is investing billions of dollars in infrastructure, Chinese companies are demonstrating how to achieve comparable results at a lower cost.

According to analysts, this could change investment strategies in the industry: the need for high infrastructure spending is becoming less and less obvious.
