TechRadar

Eric Hal Schwartz

Mom's website ready to put OpenAI in a time-out after learning the AI firm may have scraped its data

British parenting hub Mumsnet has filed a lawsuit against OpenAI, claiming it violated copyright law by using its data to train its AI models, including those powering ChatGPT. It's the first such legal action against OpenAI in the United Kingdom, but one of a growing number of similar cases worldwide accusing OpenAI of illicitly scraping information for its models without permission. Mumsnet claims its forums host more than six billion words and that OpenAI used those words to teach its AI models about parenting and related topics.

“Such scraping without permission is an explicit breach of our terms of use, which clearly state that no part of the site may be distributed, scraped or copied for any purpose without our express approval,” Mumsnet co-founder Justine Roberts explained in a post on the website. “The LLMs are building models like ChatGPT to provide the answers to any and all prospective questions that will mean we’ll no longer need to go elsewhere for solutions. And they’re building those models with scraped content from the websites they are poised to replace.”

The legal complaint also flags the timing of the data collection as a point of contention, since most of it happened before websites were paying close attention to whether AI companies were scraping their data. Mumsnet alleges that third-party research institutions initially performed the majority of this scraping.

Roberts wrote that Mumsnet reached out to OpenAI about licensing its content, pointing out that the platform hosts a concentrated collection of writing by women that is unlike most internet content. But OpenAI turned it down, citing interest in "datasets that are not easily accessible online," according to Roberts.

Scrape Scraps

Mumsnet is hardly alone in complaining about OpenAI's data scraping and is now part of an expanding cohort of companies taking OpenAI to court on the matter. For instance, the Authors Guild has sued OpenAI, alleging that copyrighted books were used to train its AI models, as has a group of academics claiming their articles were similarly lifted. Reuters and The New York Times have both sued OpenAI, not only over data scraping but also over claims that ChatGPT generates responses far too close to their copyrighted articles. Even Creative Commons has filed suit against the AI developer, claiming that the company used Creative Commons-licensed content to train its AI models in ways that violated the terms of the licenses.

OpenAI has defended its practices as falling under the fair use doctrine. In the UK, the company responded to a House of Lords inquiry by acknowledging that it needs copyrighted material to train its AI models and that it should do more to support content creators, while maintaining that what it does is legal. Although this is OpenAI's first UK case on the matter, Getty Images has a similar case underway in the country's courts against Stability AI over its image-generating AI.

The outcome of Mumsnet’s lawsuit and other cases may set precedents for how AI companies handle copyrighted content and might influence future regulations and licensing practices. The effort to balance AI innovation and intellectual property rights is far from settled and probably won’t be for a long while.

To be fair, Mumsnet isn't against LLMs and AI as a concept. In fact, Mumsnet used OpenAI's models to build an AI chatbot called MumsGPT last year. The idea was to offer it as a research tool and even as something policymakers could use in developing parenting-related regulations. However, MumsGPT was only available to Mumsnet executives when it was announced and hasn't been mentioned since, so it may no longer be around. Roberts didn't mention MumsGPT, but she made a point of noting the positive potential uses for AI in her explanation of the lawsuit.

“But if the LLMs are allowed to simply steal content from publishers and communities like Mumsnet they risk destroying them,” Roberts wrote. “We know that taking on a multinational giant like OpenAI, with its $3bn of revenues, is not an easy task in the face of the huge resources they’ll throw at us but this is too important an issue to simply roll over. Not just for Mumsnet but for every website you’ve ever landed on for news, advice or simply to ask if you’re being unreasonable.”