Adobe’s growing investment in artificial intelligence has landed the company in legal trouble, as a new lawsuit alleges it used pirated books to train one of its AI models.
A proposed class-action lawsuit, filed on behalf of Elizabeth Lyon, an author from Oregon, claims Adobe trained its SlimLM language model using unauthorized copies of copyrighted books, including Lyon’s own work. The lawsuit was first reported by Reuters.
Adobe describes SlimLM as a small language model designed for document assistance tasks on mobile devices. According to the company, the model was pre-trained using SlimPajama-627B, a deduplicated, multi-source open dataset released by Cerebras in June 2023. Lyon alleges that her books were included, without her consent, in the dataset used to train SlimLM.
The lawsuit claims SlimPajama is a derivative of the RedPajama dataset, which itself allegedly contains Books3, a controversial collection of approximately 191,000 pirated books that has been widely used to train generative AI systems. Because SlimPajama is based on RedPajama, the suit argues, it includes copyrighted works belonging to Lyon and other authors.
Books3 has become a recurring focus in lawsuits against major tech companies. In September, authors sued Apple, alleging it used copyrighted material to train its Apple Intelligence models without permission or compensation. A similar lawsuit filed in October accused Salesforce of relying on RedPajama for AI training.
Legal challenges over AI training data have become increasingly common as companies rely on massive datasets to develop generative models. In one of the most significant cases to date, Anthropic agreed in September to pay $1.5 billion to authors who accused the company of using pirated books to train its Claude chatbot. The settlement was seen as a potential turning point in the ongoing debate over copyright protections in AI development.





