Europe’s privacy regulators have issued new guidelines for judging whether AI companies are breaking the EU’s General Data Protection Regulation, which threatens fines of up to 4% of global revenues.
The guidelines may provide enough clarity for companies such as Meta—under fire from privacy campaigners and regulators over its training of AI models on people’s personal data—to roll out new AI services in Europe.
But they also make clear that these companies have high hurdles to clear if they want to stay on the right side of European privacy law. In particular, AI companies may struggle to comply if they trained their models on personal data that was illegally collected, or if they don’t take steps to ensure that this information cannot be extracted by others when people use the models.
The European Data Protection Board (EDPB), the umbrella organization for the EU’s privacy watchdogs, issued the formal opinion on AI on Wednesday, in response to a request from Ireland’s data protection commissioner.
The Irish watchdog requested the opinion in September, shortly after it had convinced Elon Musk’s X to stop training its Grok AI on some public posts from European users. X hadn’t asked for the users’ consent and had no legal basis under the GDPR for using their data in this way. Commissioner Dale Sunderland said at the time that the EU’s data protection authorities should agree on key questions about AI and the GDPR, to “enable proactive, effective, and consistent Europe-wide regulation of this area.”
The GDPR requires anyone processing someone’s personal data (meaning any data that can be linked to them as an identifiable person) to have a legal basis for doing so. The law provides a list of possible legal bases, such as user consent or the performance of a contract, and the big AI companies have generally opted to claim that they have a “legitimate interest” in processing people’s personal data for training their models.
Legitimate interest is the vaguest of the allowable legal bases, and relying on it reflects the fact that none of the other options applies to this kind of processing. But it also highlights the need for further clarity about exactly what is permissible.
The ‘legitimate interests’ test
On the one hand, the EDPB’s Wednesday opinion confirms that legitimate interest is a potentially valid legal basis for training AI models.
“We welcome the EDPB’s recognition that legitimate interests is an appropriate legal basis for training generative AI models,” said a Meta spokesperson. “But it’s frustrating that it has taken many months of unnecessary delay for regulators to approve the same legal mechanisms that the industry proposed at the start of this year … We urge European regulators to quickly apply these principles in a pragmatic and transparent way so the EU can deliver the growth that leading European political figures are calling for.”
The Computer and Communications Industry Association (CCIA)—a group whose membership includes big AI players like Meta, X, and Google—said the confirmation of legitimate interest as a lawful basis marked “an important step towards more legal certainty.”
However, the opinion stresses that successfully claiming “legitimate interest” will mean passing a three-step test. An AI company would need a “clear and precisely articulated” reason for processing someone’s data—it can’t rely on hypothetical future reasons—and the processing would have to be “really necessary” for achieving that aim. Even then, the legitimate interest can be overridden by fundamental rights such as privacy and freedom of expression, depending on the case.
Privacy advocates have argued that AI models such as OpenAI’s GPT series are too general to pass this test, as they were developed without a specific use case in mind, and people can come up with entirely new applications long after the models were trained and released.
In their opinion, the regulators gave two examples of what could pass the test: training a “conversational agent to assist users,” and deploying an AI-based cybersecurity tool in a network. They stressed that legitimate interest had to be judged on a case-by-case basis.
“A large part of the opinion is telling AI companies what they should have been doing anyway, but it also sends a message that being innovative is not an excuse for not complying with the law,” said Tom West, a legal officer at the rights group Privacy International.
The EDPB noted that it wasn’t able to cover every possible AI scenario in its opinion, not least because the models are evolving so quickly. But that vagueness has left some dissatisfied. Claudia Canelles Quaroni, the CCIA’s European senior policy manager, suggested that AI companies still wouldn’t have the confidence they need to fully roll out their services in Europe.
“Greater legal clarity and a practical framework are needed to reconcile EU privacy principles with technological progress,” she said in an emailed statement. “This is essential for Europe to remain competitive and unlock AI-driven innovation. Otherwise European consumers and businesses risk missing out on more cutting-edge technologies powered by AI and data.”
Where the data comes from
Most of today’s cutting-edge foundation models were likely trained on data that was scraped from public sources on the web—the companies are very opaque on this point—so this was a major focus in the EDPB opinion.
The watchdogs suggested that it might be possible for someone to legally scrape people’s data from the web if they were very careful about minimizing the data they collected, and if they made people aware that their data was being taken. But without those mitigations—which are unlikely to have been in place—the collection may very well have infringed people’s privacy rights, and possibly even their freedom of expression, if a “sense of surveillance” leads them to self-censor.
Crucially, using data with dodgy provenance could undermine an AI company’s claim to be exercising its legitimate interests when training its models on that data, potentially leaving it open to a big GDPR fine. And if another company then deploys those models, it could also be in trouble—the onus is on that company to check whether the model it’s deploying was trained on lawfully processed data.
That picture could change if the model is made “anonymous,” meaning there’s no significant likelihood that someone could extract people’s personal data from it. But the opinion makes clear that companies are not to be taken at their word on this. To determine whether a model is “anonymous,” regulators can go so far as to conduct “analysis of reports of code reviews, as well as a theoretical analysis documenting the appropriateness of the measures chosen to reduce the likelihood of reidentification of the concerned model.”
In any case, the big AI companies don’t tend to go so far as to claim that they have made their models anonymous, though they do sometimes claim to protect people’s privacy.
OpenAI, for example, insists that it doesn’t “actively seek out personal information” to train its models, and doesn’t use personal information in training data to build profiles of people. It also says it filters its training datasets to exclude data from “websites that aggregate large volumes of personal information,” and trains its models to “reject requests for private or sensitive information about people.” The company lets Europeans opt out of having their personal data used in model training as well.
Privacy International’s West pointed out that it is virtually impossible to remove someone’s personal information from a model that has been trained on it. He also noted that the EDPB opinion says watchdogs can go so far as to order a model’s deletion if it can’t be brought in line with the GDPR. “We support keeping that option on the table,” he said.
OpenAI had not responded to a request for comment on the EDPB opinion at the time of writing; nor had Microsoft or Google.