A filing in a lawsuit against two artificial intelligence (AI) image generation companies, Stability AI, which develops the Stable Diffusion tool, and Midjourney, which hosts a Discord-based image generation tool, includes a list of around 4,700 names of artists and art styles that were included in the training corpus for Midjourney.
The list includes illustrators such as Sarah Andersen, the creator of the popular webcomic Sarah's Scribbles; Karla Ortiz, a concept artist and illustrator; and Gerald Brom, a novelist and illustrator, all of whom signed onto an amended complaint against the defendants at the end of November.
The lawsuit, first filed at the beginning of 2023, is a class action that claims AI image generators “are 21st century collage tools that violate the rights of millions of artists.”
The plaintiffs allege that Stability AI and other companies downloaded billions of copyrighted images, including those of the plaintiffs, to train their models, and that the companies did so “without the consent of the artists and without compensating any of those artists.”
According to the amended complaint, Midjourney’s CEO David Holz posted a link on Midjourney’s Discord server to a Google Sheets spreadsheet that included a tab called “Artists.” That sheet is now private.
“i think you’re all gonna get [your] mind blown by this style feature … we were very liberal in building out the dictionary … it has cores and punks and artist names … as much as we could dump in there … i should be clear it’s not just genres its also artist names … it’s mostly artist names … 4000 artist names,” Holz wrote in the server, according to the amended complaint.
“here is our style list,” Holz said in another message, according to the lawsuit.
The complaint includes detailed technical explanations of how the image generation tools work, as well as a description of how Stability AI obtained the huge corpus of images on which it trained its models.
One key dataset was LAION-5B, a huge collection of 5.85 billion image-text pairs developed by LAION (Large-Scale Artificial Intelligence Open Network), a German nonprofit that compiles open-source training datasets for machine learning.
According to a Stanford study first reported by 404 Media in mid-December, LAION-5B includes thousands of instances of “suspected child sexual abuse materials,” a finding that led LAION to take its datasets down.
Included in exhibit B to the complaint was a list of some of the training images from the artists who say their images were obtained without their consent.
The list of artists went viral on X in recent days, with artists and their supporters discussing the filing.
“Midjourney developers caught discussing laundering, and creating a database of Artists (who have been dehumanized to styles) to train Midjourney off of,” wrote the artist Jon Lam. “This has been submitted into evidence for the lawsuit. Prompt engineers, your ‘skills’ are not yours.”
“Midjourney just dropped a list in court of all of the artists it admits to having scraped to train its AI engine. Are you on it?” posted @IanColdwater.
“no but a few dozen of my magic the gathering artist friends are on there,” replied @CubicleApril.
“Do they have good lawyers?” said @IanColdwater. “Maybe point them to this and gently suggest they get them if they don’t?”