== Archiving ==
archive.today can capture individual pages in response to explicit user requests. Since its beginning, it has supported
crawling pages whose URLs contain the now-deprecated hash-bang fragment (#!). The website records only text and images, excluding XML, RTF, spreadsheets (xls or ods) and other non-static content. Videos are, however, saved for certain sites such as Twitter. It keeps track of the history of snapshots saved and requests confirmation before adding a new snapshot of an already saved page. Once a web page is archived, it cannot be deleted directly by any Internet user. Users can download archived pages as a ZIP file, except pages archived when archive.today changed its browser engine from PhantomJS to Chromium (non-headless). archive.today does not obey robots.txt because it acts "as a direct agent of the human user."
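For comparison, a polite automated crawler consults robots.txt before fetching a URL; this is the step archive.today deliberately skips. A minimal Python sketch using the standard library (the rules shown are hypothetical, not taken from any real site):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules a site might publish.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A crawler honouring the protocol would refuse this URL...
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
# ...whereas a service acting as a direct agent of a human user,
# like archive.today, fetches whatever the user explicitly requests.
print(rp.can_fetch("*", "https://example.com/article"))  # True
```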
HTML class names are preserved inside the old-class attribute. When text is selected, a JavaScript routine generates a URL fragment, visible in the browser's address bar, that automatically highlights the same portion of text when the snapshot is visited again. Web pages can be duplicated from archive.today to web.archive.org as a second-level backup, but archive.today does not save its snapshots in WARC format. The reverse, from web.archive.org to archive.today, is also possible, although the copy usually takes more time than a direct capture. While a page is being saved, a list of the URLs of its individual elements is shown, together with their content sizes, HTTP statuses and MIME types; this list can be viewed only during the crawling process. Advertisements, popups and expanding links can be removed from archived pages by asking the owner to do so on his blog. According to the site's FAQ, archive.today's storage layer runs on
Apache Hadoop and Apache Accumulo, with all data stored on the Hadoop Distributed File System (HDFS). Textual content is replicated three times across servers in two data centers, both located in Europe, with at least one hosted by the French provider OVH; images are replicated twice. When a dynamic list is saved, the archive.today search box shows only a result linking the previous and the following section of the list (e.g. 20 links per page). The other saved web pages are filtered out, and can sometimes be found through one of their occurrences.
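The three-way replication of textual content described above corresponds to HDFS's per-file replication factor, which can be set cluster-wide or per path. An illustrative hdfs-site.xml fragment (values chosen to match the description; archive.today's actual configuration is not public):

```xml
<!-- Illustrative only: sets the default HDFS replication factor to 3,
     matching the three-way replication described for textual content.
     Files holding images could be set to a factor of 2 individually,
     e.g. with: hdfs dfs -setrep 2 /path/to/images -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```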
== Bypassing paywalls ==
archive.today is frequently used to bypass paywalls on news websites, similarly to the defunct service 12ft.
=== Legal and ethical debate ===
The practice of sharing archive.today links to circumvent paywalls has sparked legal and ethical debate in Europe. In the Netherlands, journalist Peter Aanzee publicly challenged a physician who shared an archive.ph link to one of his paywalled articles in De Volkskrant, arguing that distributing archived copies constituted copyright infringement. The discussion drew on
European Court of Justice jurisprudence on hyperlinking, particularly the 2016 GS Media v Sanoma ruling, which established that linking to illegally published content can constitute a copyright violation if the linker knew or ought to have known of the illegality, a presumption that applies automatically to parties acting for profit. Steiger argued that this reveals which paywalled content in Western media is most popular among archive.today users, and that even readers who pay for subscriptions to Swiss news sites such as
Blick, 20 Minuten, and Neue Zürcher Zeitung have their data transmitted to Russian services like Yandex and Rutarget (owned by Sberbank, which is on the US sanctions list). In one documented case, ChatGPT retrieved a full article from The Economist via archive.today and then generated a five-point economic analysis in the publication's characteristic style and terminology. Van Ess identified six distinct methods of paywall circumvention by AI systems, of which "archive exploitation", finding archived copies on services such as archive.today and the Internet Archive, was the most direct. Unlike documented concerns about AI systems training on paywalled content, this behaviour involves real-time retrieval of archived copies during individual queries, effectively extending paywall circumvention beyond human users to automated agents.

== Worldwide availability ==
The Economist via archive.today and then generated a five-point economic analysis in the publication's characteristic style and terminology. Van Ess identified six distinct methods of paywall circumvention by AI systems, of which "archive exploitation" — finding archived copies on services such as archive.today and the Internet Archive — was the most direct. Unlike documented concerns about AI training unpaywalled content, this behaviour involves real-time retrieval through archived copies during individual queries, effectively extending paywall circumvention beyond human users to automated agents. == Worldwide availability ==