Mark Zuckerberg's Meta and Sam Altman-led OpenAI have a new headache on their hands this week—lawsuits.
A handful of writers, including comedian Sarah Silverman, have brought class-action complaints against the two companies for "remixing the copyrighted works of thousands of book authors, and many others, without consent, compensation, or credit."
The plaintiffs, who include authors Paul Tremblay, Mona Awad, Christopher Golden and more, are represented by Joseph Saveri and Matthew Butterick, who say they are standing up on behalf of authors to continue a "vital conversation about how A.I. will coexist with human culture and creativity."
In the suit seen by Fortune, exhibits show a prompt given to a large language model (like OpenAI's ChatGPT) asking it to explain the plot of Tremblay's book The Cabin at the End of the World.
The model gives an initial 420-word answer covering the "early" parts of the novel. Three further prompts asking for the next part of the book and its ending produce answers totaling more than 1,100 words, including the twist at the end of the novel.
Tests of Bunny and 13 Ways of Looking at a Fat Girl by Awad had similar results, the evidence alleges.
Given how extensive the answers are, the plaintiffs allege that their work has been fed to the large language models as training data for reproduction "without consent, without credit, and without compensation."
Meta and OpenAI did not immediately respond when contacted by Fortune for comment.
Why books?
Silverman's suit, filed on Friday alongside authors Christopher Golden and Richard Kadrey, comes after an initial filing on June 28 from a raft of other authors.
Lawyers for the plaintiffs claim that authors have been contacting them continually since March 2023, when OpenAI's ChatGPT was first made available to the public.
And books, the case alleges, are a particularly valuable training set for LLMs.
The lawyers cite a study from MIT, Cornell University and Google Research, published in May, which found that the best domains used training data from "high-quality" sources like books as well as data from the web.
Although books were found to contain large quantities of toxic material, the study also found that, as a dataset, they provided the "longest, most readable" high-quality text.
The suit alleges that this valuable data has been obtained from a "flagrantly illegal" source called a "shadow library." These are online databases which aggregate books and articles, thus bypassing obstacles like paywalls and payments to download copies.
First lawsuit of many?
Experts predict the latest lawsuit will be one of many. Daniel Gervais, a law professor at Vanderbilt University, told Insider last week that a deluge of lawsuits from creators is imminent.
"This one [the author lawsuit] is really about the input," Gervais said, speaking on the lawsuit's allegations around A.I. data-scraping and training. "The output wave is coming as well."
Indignation in the literature industry follows tumult in the entertainment sector, with Hollywood writers currently on strike over fears A.I. will undermine their profession by mimicking existing scripts.
It's an issue that Sam Altman, at least, is aware needs to be addressed.
Speaking to a Senate subcommittee hearing on May 16, Altman said: “We think that creators deserve control over how their creations are used, and what happens sort of beyond the point of them releasing it into the world.”
“We need to figure out new ways with this new technology that creators can win, succeed and have a vibrant life, and I’m optimistic that this will present it.”
The authors' lawsuit seeks unspecified monetary damages, one of the avenues Altman says he has been discussing with visual artists and musicians in order “to figure out what people want.” “There’s a lot of different opinions, unfortunately,” he added.