TechRadar
Ellen Jennings-Trace

Meta admits it scraped all Australian Facebook posts since 2007 to train its AI

Meta has admitted it used public Facebook and Instagram posts from Australian users to train its artificial intelligence models, and that it has scraped information dating as far back as 2007.

An Australian Parliamentary committee has heard that whilst European users can opt out thanks to GDPR laws, Australian customers are not given that choice.

Meta has denied using the information of anyone under 18, but did confirm it had used over a decade’s worth of data. The firm could not answer whether it has scraped the photos of children who are now adults (i.e. those who created their accounts as a child, but have since turned 18).

A turning tide

The process of ‘scraping’ is essential for the development of AI. It is essentially data harvesting: information is extracted from websites and fed to a large language model (LLM), which learns from the data. This means that GDPR regulations are becoming troublesome for more and more LLMs such as ChatGPT, which collect data from all over the internet without consent from the original source.

Meta’s global privacy director Melinda Claybaugh sat before the inquiry and admitted that the company was forced to pause the launch of AI products in Europe due to a lack of certainty, and that it has had to give European users an opt-out because of more robust privacy laws there. Senator Shoebridge grilled the Meta representative:

“The truth of the matter is that, unless you consciously had set those posts to private, since 2007, Meta has just decided you will scrape all of the photos and all of the text from every public post on Instagram or Facebook that Australians have shared since 2007, unless there was a conscious decision to set them on private. But that’s actually the reality, isn’t it?”

Claybaugh replied, “Correct”. She added that users can set their posts to private now to prevent future scraping, but this would have no effect on the data already taken.

The realization seems to be creeping in for the public and for tech companies that training AI models requires such vast amounts of data that it is ‘impossible’ to do so without using copyrighted materials. Considering that millions of users' posts have been used without their consent, it looks like tech giants might face much stricter regulations in future.

Via The Guardian
