Reddit has filed a lawsuit against AI startup Perplexity and three data scraping companies, accusing them of bypassing protections to collect copyrighted content from the platform.

In the complaint, Reddit claims that these companies are illegally gathering valuable user-generated data on an industrial scale. The scraping companies named in the lawsuit are SerpApi, Oxylabs, and AWMProxy. Reddit compares their actions to those of criminals who cannot break into a bank vault and instead target the armored truck carrying the money.

The lawsuit alleges that Perplexity relies on at least one of these scraping services to fuel its AI answer engine. Reddit argues that Perplexity chooses to gather data through questionable methods instead of negotiating a licensing agreement, as other AI companies have already done. In May 2024, Reddit sent Perplexity a cease-and-desist letter instructing it to stop scraping data from the platform. Perplexity responded by claiming that it did not use Reddit content to train its AI models and would respect robots.txt rules, but Reddit says the problem only got worse afterward: the number of Reddit-based citations on Perplexity increased following the warning.

Reddit also says it tested Perplexity’s behavior by creating a post that was visible only to Google’s search engine. According to the lawsuit, within hours, Perplexity began producing answers that contained information from this hidden post. Reddit claims this was possible only if Google search results containing Reddit content were being scraped and fed directly into Perplexity’s AI engine.

Reddit has made it clear that its data, consisting of millions of human discussions on countless topics, is extremely valuable for training artificial intelligence systems. The company has already signed agreements with OpenAI and Google, and it wants to ensure that it is paid fairly for access to its data. Reddit has taken similar action before, including a lawsuit against Anthropic that accuses the company of accessing Reddit content after promising not to.

Ben Lee, Reddit’s chief legal officer, says that AI companies are racing to collect high-quality human information, and this competition has helped create a growing black market for scraped data. He claims that scrapers are bypassing protections, hiding their identities, and stealing content to resell it to companies desperate for training material. He describes Oxylabs, AWMProxy, and SerpApi as examples of this illegal activity, accusing them of disguising their tools to steal Reddit content through Google search results. Lee also argues that Perplexity willingly purchases this scraped data instead of working with Reddit legally.

Perplexity responded by saying it has not yet received the lawsuit but intends to defend the rights of users to fairly access public information. Jesse Dwyer, the company’s head of communication, stated that Perplexity is committed to responsible AI development and will push back against threats to openness and the public interest.

