The robots.txt protocol is widely complied with by bot operators. In one early case, eBay attempted to block a bot that did not comply with robots.txt, and in May 2000 a court granted an injunction on the basis of trespass, ordering the company operating the bot to stop crawling eBay's servers by any automatic means. In 2007,
in Healthcare Advocates v. Harding, a company was sued for accessing protected web pages archived by the Wayback Machine, despite robots.txt rules that should have excluded those pages from the archive. A Pennsylvania court ruled that "in this situation, the robots.txt file qualifies as a technological measure" under the
DMCA. Because of a malfunction at the Internet Archive, however, Harding was temporarily able to access the pages, so the court found that "the Harding firm did not circumvent the protective measure". In the 2013 case Associated Press v. Meltwater U.S. Holdings, Inc., the Associated Press sued Meltwater for copyright infringement and misappropriation over its copying of AP news items. Meltwater claimed that it did not need a license and that its copying was
fair use, because the content was freely available and not protected by robots.txt. The court decided in March 2013 that "Meltwater’s copying is not protected by the fair use doctrine", mentioning among several factors that "failure […] to employ the robots.txt protocol did not give Meltwater […] license to copy and publish AP content".
==Search engines==
Some major search engines that follow this standard include Ask, AOL, Baidu, Bing, DuckDuckGo, Kagi, Google, Yahoo!, and Yandex.
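As a sketch of how a compliant crawler consults these rules, Python's standard-library `urllib.robotparser` can evaluate a robots.txt file before fetching. The file contents and bot names below are illustrative examples, not taken from any particular site:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block GPTBot everywhere, and keep all
# other crawlers out of /private/ only.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler checks can_fetch() before requesting a URL.
print(parser.can_fetch("GPTBot", "https://example.com/news"))          # False
print(parser.can_fetch("Googlebot", "https://example.com/news"))       # True
print(parser.can_fetch("Googlebot", "https://example.com/private/x"))  # False
```

Compliance remains voluntary: nothing in the protocol prevents a crawler from ignoring these answers and fetching the URL anyway.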
==Archival sites==
Some web archiving projects ignore robots.txt.
Archive Team uses the file to discover more links, such as
sitemaps. Co-founder
Jason Scott said that "unchecked, and left alone, the robots.txt file ensures no mirroring or reference for items that may have general use and meaning beyond the website's context." In 2017, the
Internet Archive announced that it would stop complying with robots.txt directives.
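The link discovery described above works because, alongside crawl rules, robots.txt files may carry a Sitemap directive pointing to a machine-readable index of a site's pages. A minimal illustrative file (the URL is a placeholder):

```text
User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
```

An archiving crawler that disregards the Disallow lines can still read the Sitemap line to enumerate pages it would not otherwise find by following links.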
==Artificial intelligence==
Starting in the 2020s, web operators began using robots.txt to deny access to bots collecting training data for
generative AI. In 2023, Originality.AI found that 306 of the thousand most-visited websites blocked
OpenAI's GPTBot in their robots.txt files and 85 blocked
Google's Google-Extended. Many robots.txt files named GPTBot as the only bot explicitly disallowed on all pages. Denying access to GPTBot was common among news websites such as the
BBC and
The New York Times. In 2023, blog host
Medium announced it would deny access to all artificial intelligence web crawlers as "AI companies have leached value from writers in order to spam Internet readers". In 2025, the nonprofit
RSL Collective announced the launch of the
Really Simple Licensing (RSL) open content licensing standard, allowing web publishers to set terms for AI bots in their robots.txt files. Participating companies at launch included Medium,
Reddit, and
Yahoo.

==Security==