Meta sued by major publishers over copyright
According to The New York Times, five publishers—Macmillan, McGraw Hill, Elsevier, Hachette, and Cengage—along with author Scott Turow have jointly sued Meta, accusing it of committing "one of the most egregious copyright infringements in history." The complaint alleges that Meta repeatedly copied books and journals from these institutions without authorization and deliberately sourced content from piracy sites such as LibGen, Anna's Archive, and Sci-Hub to train its Llama models. It also claims that the Common Crawl dataset was allegedly saturated with numerous unauthorized copyrighted works. As an example, the complaint states that after inputting two short passages from Cengage's textbook *Calculus: Early Transcendentals* into Llama, the model verbatim reproduced subsequent portions of the original text. Previously, authors had already filed lawsuits against Meta, revealing internal discussions at the company regarding how to address media inquiries about using known pirated datasets. Although a federal judge ruled in favor of Meta last year, he explicitly clarified that this ruling does not mean AI training on copyrighted materials is legal. Earlier, Anthropic agreed to pay $1.5 billion to settle similar infringement allegations. A spokesperson for Meta, Dave Arnold, responded by stating, "The court has determined that AI training can constitute fair use, and we will vigorously defend ourselves."
