Adobe Faces Class-Action Lawsuit Over Alleged Use of Pirated Books to Train AI Model
Like many tech companies, Adobe has aggressively pursued AI innovation since 2023, launching tools such as Firefly, its AI-powered media generation suite. Now, the company faces a proposed class-action lawsuit alleging it used unauthorized and pirated books to train one of its AI models.

The lawsuit, filed on behalf of Elizabeth Lyon, an Oregon-based author of non-fiction guidebooks, claims Adobe used copyrighted works—including Lyon's own books—without permission to train its SlimLM language model. SlimLM is described by Adobe as a lightweight language model optimized for document assistance tasks on mobile devices. According to the company, SlimLM was pre-trained on SlimPajama-627B, an open-source dataset released by Cerebras in June 2023 that combines multiple data sources and claims to be deduplicated.

Lyon's legal complaint argues that SlimPajama was derived from the RedPajama dataset, which itself includes a subset known as Books3—a collection of 191,000 books used to train various generative AI systems. The lawsuit asserts that because SlimPajama is a manipulated version of RedPajama, it inherently contains copyrighted works from the Books3 dataset, including Lyon's writings. The complaint states that Adobe's use of this dataset constitutes copyright infringement, as it incorporated protected material without consent, credit, or compensation.

This case adds to a growing wave of legal challenges targeting tech firms for training AI models on datasets that allegedly include pirated or improperly sourced content. The Books3 dataset has already been central to multiple lawsuits. In September, a lawsuit against Apple accused the company of using copyrighted material from RedPajama to train Apple Intelligence. Similarly, a separate suit filed in October targeted Salesforce, alleging it used the same dataset for AI training.
These cases reflect a broader legal trend: as AI systems become more prevalent, courts are increasingly scrutinizing how training data is sourced. The situation reached a notable milestone in September when Anthropic agreed to a $1.5 billion settlement with a group of authors who accused the company of using pirated books to train its AI assistant, Claude. That settlement was seen as a potential turning point in the ongoing legal battles over intellectual property in AI development.

With Adobe now facing similar allegations, the case underscores the mounting pressure on tech companies to ensure transparency and legality in how they gather and use data for AI training. As the legal landscape evolves, the outcomes of these lawsuits could reshape industry practices and set new standards for ethical AI development.
