Tom’s Hardware
Anton Shilov

Microsoft and OpenAI investigate whether DeepSeek illicitly obtained data from ChatGPT

The OpenAI logo is being displayed on a smartphone, with the Microsoft logo visible on the screen in the background, in this photo illustration taken in Brussels, Belgium.

Microsoft and OpenAI are probing whether a group linked to the Chinese AI startup DeepSeek used OpenAI's application programming interface (API) to obtain the company's data without authorization, Bloomberg reports, citing sources familiar with the matter. A Financial Times source at OpenAI said that the company had evidence of data theft by the group. Meanwhile, U.S. officials suspect DeepSeek trained its model using OpenAI's outputs, a method known as distillation.

Microsoft's security team observed a group believed to have ties to DeepSeek extracting a large volume of data through OpenAI's API. The API lets developers integrate OpenAI's proprietary models into their applications for a fee and retrieve model outputs. However, the excessive data retrieval noticed by Microsoft researchers violates OpenAI's terms and conditions and signals an attempt to bypass OpenAI's restrictions.

The probe comes after DeepSeek launched its R1 AI model. The company claims R1 matches or exceeds leading models in areas like reasoning, math, and general knowledge while consuming considerably fewer resources. Following DeepSeek’s announcement, Alphabet, Microsoft, Nvidia, and Oracle collectively lost nearly $1 trillion in market value as investors reacted to concerns that DeepSeek's advancements could threaten the dominance of U.S. firms in the AI sector. However, if it turns out that DeepSeek used data illicitly obtained from others, that would help explain how the company managed to achieve its results without investing billions of dollars.

David Sacks, the U.S. government's AI advisor, stated there was strong evidence that DeepSeek used OpenAI-generated content to train its model through a process called distillation. This method allows one AI system to learn from another by training on its outputs. However, Sacks did not provide specific details on the evidence.

Neither OpenAI nor Microsoft provided an official statement on the investigation. DeepSeek and High-Flyer, the hedge fund that helped launch the company, did not respond to Bloomberg's requests for comment. However, in a statement published by Bloomberg and the Financial Times, OpenAI acknowledged that China-based companies tend to distill models from American companies and said that it does its best to protect its models.

"We know PRC based companies — and others — are constantly trying to distill the models of leading US AI companies," a statement by OpenAI reads. "As the leading builder of AI, we engage in countermeasures to protect our IP, including a careful process for which frontier capabilities to include in released models, and believe as we go forward that it is critically important that we are working closely with the U.S. government to best protect the most capable models from efforts by adversaries and competitors to take U.S. technology."
