Digital Event Horizon
Cloudflare has made a startling allegation against Perplexity, an AI site accused of using "stealth tactics" to evade websites' no-crawl directives. According to the allegations, Perplexity continued to access sites even after their owners, who are Cloudflare customers, had disallowed its scraping bots. If true, this evasion would flout Internet norms that have been in place for more than three decades.
Cloudflare alleges that Perplexity uses "stealth tactics" to evade websites' no-crawl directives. According to Cloudflare, Perplexity continued to access sites even after their owners had signaled no-crawl preferences through robots.txt files and Web application firewall rules. The alleged technique involves using multiple IP addresses and rotating through them in response to restrictive policies. If true, the evasion would flout Internet norms that have been widely observed since 1994. Perplexity has also been accused of plagiarizing content and of manipulating its crawling bots' ID strings to bypass website blocks.
Cloudflare, a leading network security and optimization service, has made a startling allegation against Perplexity, an AI site that has recently been at the center of controversy. In a blog post, Cloudflare researchers claimed that Perplexity uses "stealth tactics" to evade websites' no-crawl directives, which, if true, would violate Internet norms that have been in place for more than three decades.
According to Cloudflare, customers reported that Perplexity continued to access their sites' content even after they had disallowed its scraping bots in their robots.txt files and blocked its declared crawlers with Web application firewall rules. The researchers then tested Perplexity themselves and found that when known Perplexity crawlers encountered blocks from robots.txt files or firewall rules, Perplexity accessed the sites using a stealth bot that employed a range of tactics to mask its activity.
Cloudflare says it observed this technique across tens of thousands of domains and millions of requests per day. The researchers provided a diagram illustrating the flow they allege Perplexity used: requests from multiple IP addresses not listed in Perplexity's official IP range, rotating through those addresses in response to restrictive robots.txt policies and blocks from Cloudflare.
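To see why rotating through undeclared IP addresses can defeat a block keyed to a crawler's official range, here is a minimal Python sketch. It assumes a firewall rule that only knows the declared range. The range, the addresses, and the function names `firewall_blocks` and `fetch_with_rotation` are hypothetical, and the addresses come from the RFC 5737 documentation blocks rather than from any real Perplexity or Cloudflare data.

```python
import ipaddress

# Hypothetical "declared" crawler range: an RFC 5737 documentation block,
# NOT Perplexity's real published range.
DECLARED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def firewall_blocks(ip: str) -> bool:
    """Block a request only if its source IP falls inside a declared range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DECLARED_RANGES)


def fetch_with_rotation(ip_pool: list[str]) -> list[tuple[str, bool]]:
    """Simulate a client that switches to its next source IP each time it is blocked.

    Returns a log of (ip, was_blocked) pairs, stopping at the first IP that gets through.
    """
    log = []
    for ip in ip_pool:
        blocked = firewall_blocks(ip)
        log.append((ip, blocked))
        if not blocked:
            break
    return log


# The first attempt comes from the declared range; the others come from undeclared addresses.
attempts = fetch_with_rotation(["192.0.2.10", "198.51.100.7", "203.0.113.42"])
print(attempts)  # [('192.0.2.10', True), ('198.51.100.7', False)]
```

The second request succeeds only because its address was never on the list, which suggests why a defense built solely on published IP lists struggles against this kind of behavior.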
If true, this evasion would flout Internet norms that have been widely observed and endorsed since 1994, when engineer Martijn Koster proposed the Robots Exclusion Protocol, a machine-readable format for telling crawlers they weren't permitted on a given site. Sites signal their preferences by placing a simple robots.txt file at the root of their domain. The protocol was formally standardized by the Internet Engineering Task Force in 2022 as RFC 9309.
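As a minimal illustration of how the protocol works, Python's standard-library `urllib.robotparser` can evaluate a robots.txt file. The file below is a hypothetical example that disallows a crawler identifying itself as `PerplexityBot` (assumed here as the declared user-agent token) while allowing everyone else; `example.com` is a placeholder URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow all other agents.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/some-article"

# A crawler that honestly names itself matches the first group and is refused.
declared_ok = parser.can_fetch("PerplexityBot", url)
# A client presenting a generic browser identity falls through to the "*" group.
generic_ok = parser.can_fetch("Mozilla/5.0 (compatible; generic browser)", url)

print(declared_ok)  # False
print(generic_ok)   # True
```

The second check shows the protocol's limit: compliance depends on the crawler naming itself honestly and choosing to consult the file. Nothing in robots.txt itself stops a client that presents a different identity.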
Perplexity has also drawn skepticism and outrage from several publishers, who have accused the AI site of plagiarizing their content. Forbes, for instance, accused Perplexity of "cynical theft" after the AI site published a post that was "extremely similar to Forbes' proprietary article" posted a day earlier. Wired, a sister publication of Ars Technica, has leveled similar claims, citing suspicious traffic patterns from IP addresses likely linked to Perplexity.
Perplexity has also been accused of manipulating its crawling bots' ID strings to bypass website blocks. Perplexity representatives did not respond to an email asking whether the allegations were true.
The controversy has broader implications. Cloudflare's researchers stressed that crawlers are expected to be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences. The researchers said that because Perplexity's observed behavior is incompatible with these preferences, they have de-listed it as a verified bot and added heuristics to their managed rules that block this stealth crawling.
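One common way a site can decide whether to trust a self-declared crawler is to check that the user agent's claimed identity matches the operator's published IP ranges. The Python sketch below illustrates that general idea only; it is not Cloudflare's verified-bot process, and its heuristics are not reproduced here. The `PUBLISHED_RANGES` table, the `classify_request` function, and all addresses (RFC 5737 documentation blocks) are hypothetical.

```python
import ipaddress

# Hypothetical published ranges for a declared crawler; NOT real Perplexity data.
PUBLISHED_RANGES: dict[str, list[ipaddress.IPv4Network]] = {
    "PerplexityBot": [ipaddress.ip_network("192.0.2.0/24")],
}


def classify_request(user_agent: str, ip: str) -> str:
    """Label a request by comparing its claimed identity with its source IP.

    - "verified": the user agent names a known bot and the IP is in its published range
    - "claimed-but-unverified": the user agent names a known bot, but the IP is elsewhere
    - "undeclared": the user agent names no known bot, so this check cannot attribute it
    """
    addr = ipaddress.ip_address(ip)
    for bot_name, ranges in PUBLISHED_RANGES.items():
        if bot_name.lower() in user_agent.lower():
            if any(addr in net for net in ranges):
                return "verified"
            return "claimed-but-unverified"
    return "undeclared"


print(classify_request("Mozilla/5.0 (compatible; PerplexityBot/1.0)", "192.0.2.15"))   # verified
print(classify_request("Mozilla/5.0 (compatible; PerplexityBot/1.0)", "203.0.113.9"))  # claimed-but-unverified
print(classify_request("Mozilla/5.0 (Macintosh)", "203.0.113.9"))                      # undeclared
```

The "undeclared" case is the hard one: a crawler that hides its identity and uses unlisted addresses gives this kind of check nothing to match, which is presumably why additional behavioral heuristics are needed.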
This incident serves as a stark reminder of the importance of adhering to Internet norms and respecting website directives. It also highlights the need for greater accountability from AI sites and the developers who create them. As the use of AI continues to grow, it is essential that we prioritize transparency, ethics, and respect for intellectual property rights.
Related Information:
https://www.digitaleventhorizon.com/articles/The-Stealth-Tactics-of-Perplexity-How-AI-Sites-Evasion-Flouts-Internet-Norms-deh.shtml
https://arstechnica.com/information-technology/2025/08/ai-site-perplexity-uses-stealth-tactics-to-flout-no-crawl-edicts-cloudflare-says/
Published: Mon Aug 4 15:11:55 2025 by llama3.2 3B Q4_K_M