LiveScience
Ben Turner

Chinese researchers just built an open-source rival to ChatGPT in 2 months. Silicon Valley is freaked out.

The DeepSeek logo displayed on a smartphone screen.

China has released a cheap, open-source rival to OpenAI's ChatGPT, and it has some scientists excited and Silicon Valley worried.

DeepSeek, the Chinese artificial intelligence (AI) lab behind the innovation, unveiled its free large language model (LLM) DeepSeek-V3 in late December 2024 and claims it was built in two months for just $5.58 million — a fraction of the time and cost required by its Silicon Valley competitors.

Following hot on its heels is an even newer model called DeepSeek-R1, released Monday (Jan. 20). In third-party benchmark tests, DeepSeek-V3 matched the capabilities of OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet while outperforming others, such as Meta's Llama 3.1 and Alibaba's Qwen2.5, in tasks that included problem-solving, coding and math.

Now, R1 has also surpassed OpenAI's latest o1 model in many of the same tests. This impressive performance at a fraction of the cost of other models, its semi-open-source nature, and its training on significantly fewer graphics processing units (GPUs) have wowed AI experts and raised the specter of China's AI models surpassing their U.S. counterparts.

"We should take the developments out of China very, very seriously," Satya Nadella, the CEO of Microsoft, a strategic partner of OpenAI, said at the World Economic Forum in Davos, Switzerland, on Jan. 22..


AI systems learn from training data created by humans, which enables them to generate output based on the probabilities of different patterns cropping up in that training dataset.

For large language models, these data are text. For instance, OpenAI's GPT-3.5, which was released in 2022, was trained on roughly 570 GB of text data from the repository Common Crawl (roughly 300 billion words) taken from books, online articles, Wikipedia and other webpages.
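That pattern-matching idea can be illustrated with a toy example. The sketch below is illustrative only: the tiny corpus, and the simple word-pair (bigram) approach, are simplifications not taken from the article. It counts which word follows which in a short text and then samples the next word in proportion to those counts; real LLMs learn the same kind of next-word probabilities with neural networks trained on billions of documents.

```python
import random
from collections import Counter, defaultdict

# Toy corpus standing in for web-scale training text (hypothetical data).
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (bigram counts).
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to how often it followed `prev`."""
    counts = follow_counts[prev]
    if not counts:  # no observed continuation (e.g. the corpus's last word)
        return random.choice(corpus)
    words = list(counts)
    return random.choices(words, weights=[counts[w] for w in words])[0]

# Generate a short continuation, one probable word at a time.
words = ["the"]
for _ in range(5):
    words.append(next_word(words[-1]))
print(" ".join(words))
```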

Reasoning models, such as R1 and o1, are an upgraded version of standard LLMs that use a method called "chain of thought" to backtrack and reevaluate their logic, which enables them to tackle more complex tasks with greater accuracy.

This has made reasoning models popular among scientists and engineers who are looking to integrate AI into their work.
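A reasoning model's internal chain of thought isn't something outside code can reproduce directly, but the idea can be roughly approximated from the outside by prompting an ordinary chat model to lay out and check its intermediate steps. Here is a minimal sketch using OpenAI's Python SDK; the model name, question and prompt wording are illustrative assumptions, not details from the article, and the prompt-based trick is a simplified stand-in for the built-in reasoning R1 and o1 perform.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = ("A store sells pens in packs of 12 for $3. "
            "How much do 60 pens cost?")

# Direct prompt: the model may jump straight to an answer.
direct = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought-style prompt: ask for the steps to be written out and
# checked, a rough external stand-in for what reasoning models do internally.
stepwise = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": question + " Work through it step by step and "
                              "double-check each step before the final answer.",
    }],
)

print("Direct answer:\n", direct.choices[0].message.content)
print("\nStep-by-step answer:\n", stepwise.choices[0].message.content)
```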

But unlike OpenAI's o1, DeepSeek is an "open-weight" model that (although its training data remains proprietary) enables users to peer inside and modify its algorithm. Just as important is its far lower price for users: roughly one-27th of o1's.
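In practice, "open-weight" means anyone can download the model's parameters and run, inspect or fine-tune them locally rather than only querying them through an API. Here is a minimal sketch using the Hugging Face transformers library; the repository ID is an assumption (DeepSeek publishes several distilled R1 variants on the Hugging Face hub), and even the small variants need capable hardware to run comfortably.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository ID for a small distilled R1 variant; check the
# Hugging Face hub for the exact name before running.
repo = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# Because the weights are downloaded locally, they can be modified or
# fine-tuned, not just queried.
inputs = tokenizer("How many prime numbers are there below 30?",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```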

Besides its performance, the hype around DeepSeek comes from its cost efficiency; the model's shoestring budget is minuscule compared with the tens to hundreds of millions of dollars that rival companies have spent training comparable models.

In addition, U.S. export controls, which limit Chinese companies' access to the best AI computing chips, forced R1's developers to build smarter, more energy-efficient algorithms to compensate for their lack of computing power. ChatGPT reportedly needed 10,000 Nvidia GPUs to process its training data; DeepSeek's engineers say they achieved similar results with just 2,000.

How much this will translate into useful scientific and technical applications, or whether DeepSeek has simply trained its model to ace benchmark tests, remains to be seen. Scientists and AI investors are watching closely.
