Company Profile

Archive.today

archive.today is a web archiving website that saves snapshots on demand. It has support for JavaScript-heavy sites such as Google Maps and X. archive.today records two snapshots: one replicates the original webpage including any functional live links; the other is a screenshot of the page.

History
archive.today was founded in 2012 as a web archive. It allegedly registered its trademark in the Czech Republic in 2013. The site originally branded itself as archive.today, but changed the primary mirror to archive.is in May 2015. It began to deprecate the archive.is domain in favor of other mirrors in January 2019. According to the archive.today blog, the website had saved about 500 million pages by 2021, totalling 700 terabytes.

In July 2013, archive.today began supporting the API of the Memento Project at Los Alamos National Laboratory (LANL). Due to budget constraints at LANL, the Memento Project was disestablished in September 2025. archive.today was one of the last major active users of the Memento protocol following the project's downsizing.

Individual profile pages on 200.zona.media link to snapshots of social media posts by relatives, obituaries in local media, and other open-source evidence used to verify each death. In early 2023, a team of researchers at the University of Amsterdam identified archive.today as the most-used open-access archiving service among fact-checking organisations, based on a dataset on the Russo-Ukrainian war.

In August 2023, the Wikitravel Press co-founder and Google Cloud executive Jani Patokallio (the eldest son of the Finnish diplomat and renowned writer Pasi Patokallio) published an investigation on his blog Gyrovague into archive.today's funding sources and the founder's identity.

On 30 October 2025, the US Federal Bureau of Investigation (FBI) subpoenaed archive.today's domain registrar, Tucows. The subpoena stated that its purpose was to identify the owner or owners of the archive.today domain name, and that it was part of a criminal investigation conducted by the FBI, the nature of which was not disclosed.
The Catalan daily Ara interpreted the subpoena as part of a campaign to selectively criminalize anonymous digital archives reliant on micro-donations (such as Anna's Archive, which Google removed from its search results), even though industrial datasets used for training large language models (such as Common Crawl, financed by OpenAI and Anthropic) also fail to compensate content creators and owners.

On 8 January 2026, Patokallio's hosting provider Automattic notified him that it had received a GDPR complaint from a person identifying herself as "Nora". The complaint alleged that the 2023 Gyrovague investigation "contains extensive personal data… presented in a narrative that is defamatory in tone and context." After Patokallio submitted a rebuttal, Automattic sided with him and left the post up. Subsequent investigation suggested that "Nora" was likely an appropriated identity: the name belonged to either a real person or a trademark of a clothing brand, whose only connection to archive.today had been a prior content takedown request.

On 20 February 2026, the English Wikipedia banned links to archive.today, citing a DDoS attack and evidence that archived content had been tampered with to insert Patokallio's name. The decision was made despite concerns over maintaining content verifiability. The Wikimedia Foundation had stated its readiness to take action regardless of the community verdict. The alterations were subsequently reverted. The discovery was cited as a key factor in the blacklisting decision, as it undermined the premise that archived snapshots were faithful reproductions of the original pages.

This was not the first time Wikipedia had restricted links to archive.today. In 2013, the community blacklisted archive.is, citing concerns about botnets, linkspamming, and the opaque manner in which the site was operated. The decision was overturned in 2016 following a new request for comment, and archive.today was removed from the spam blacklist.
At the time of the 2026 ban, the site was the second-largest archiving service used across all Wikimedia Foundation projects, with over 695,000 links spread across approximately 400,000 pages.
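
The Memento support mentioned in this section refers to the protocol specified in RFC 7089, in which a client asks a TimeGate for the snapshot closest to a given date by sending an Accept-Datetime header, and the server describes related resources in a Link header. The following is a minimal Python sketch of those two pieces, not a description of archive.today's own implementation. The TimeGate URL pattern, the helper names and the sample Link header are all assumptions made for illustration, and no network request is made.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical TimeGate endpoint, used only for illustration.
TIMEGATE = "https://archive.example/timegate/"

# Matches one '<target>; name="value"; ...' entry of a Link header.
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*\w+="[^"]*")*)')
PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


def timegate_request(original_url, when):
    """Build the URL and headers for an RFC 7089 datetime negotiation."""
    # Accept-Datetime carries an RFC 1123 date expressed in GMT.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return TIMEGATE + original_url, {"Accept-Datetime": stamp}


def parse_link_header(value):
    """Split a Link header into dicts holding the target URL and its parameters."""
    links = []
    for target, params in LINK_RE.findall(value):
        attrs = dict(PARAM_RE.findall(params))
        attrs["url"] = target
        links.append(attrs)
    return links


def mementos(links):
    """Keep only the entries whose rel value includes 'memento'."""
    return [link for link in links if "memento" in link.get("rel", "").split()]


# Made-up response header in the shape RFC 7089 describes.
sample_link = (
    '<http://example.com/>; rel="original", '
    '<https://archive.example/20130701000000/http://example.com/>; '
    'rel="first memento"; datetime="Mon, 01 Jul 2013 00:00:00 GMT"'
)

url, headers = timegate_request(
    "http://example.com/", datetime(2013, 7, 1, tzinfo=timezone.utc)
)
print(url, headers)
print(mementos(parse_link_header(sample_link)))
```

The regular expression keeps quoted parameter values intact, which matters because Memento datetimes contain commas that a naive split on "," would break.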
Funding
The site's funding model has been a persistent source of uncertainty. According to the creator, as of 2021 advertising and donations together covered less than 20% of operating expenses, with donations amounting to approximately €6,000, though this claim has not been independently verified by any secondary source.
Features
Archiving
archive.today can capture individual pages in response to explicit user requests. Since its beginning, it has supported crawling pages with URLs containing the now-deprecated hash-bang fragment (#!). The website records only text and images, excluding XML, RTF, spreadsheet (xls or ods) and other non-static content. However, videos for certain sites, like Twitter, are saved. It keeps track of the history of snapshots saved, requesting confirmation before adding a new snapshot of an already saved page. Once a web page is archived, it cannot be deleted directly by any Internet user. Users can download archived pages as a ZIP file, except pages archived when archive.today changed its browser engine from PhantomJS to Chromium (non-headless).

archive.today does not obey robots.txt because it acts "as a direct agent of the human user." HTML class names are preserved inside the old-class attribute. When text is selected, a JavaScript applet generates a URL fragment, shown in the browser's address bar, that automatically highlights that portion of the text when the URL is visited again.

Web pages can be duplicated from archive.today to web.archive.org as a second-level backup, but archive.today does not save its snapshots in WARC format. The reverse, from web.archive.org to archive.today, is also possible, but the copy usually takes more time than a direct capture.

While a page is being saved, a list of URLs for individual page elements is shown, along with their content sizes, HTTP statuses and MIME types. This list can only be viewed during the crawling process. Users can ask the owner, via his blog, to remove advertisements and popups from archived pages or to expand links in them.

According to the site's FAQ, archive.today's storage layer runs on Apache Hadoop and Apache Accumulo, with all data stored on the Hadoop Distributed File System (HDFS).
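
As a rough illustration of the old-class attribute mentioned above, the following Python sketch collects the original class names from a fragment of archived markup using the standard-library HTML parser. The sample markup, including the inline styles, and the helper names are assumptions made for this example. The only detail taken from the description above is that class names are kept in an attribute called old-class.

```python
from html.parser import HTMLParser


class OldClassCollector(HTMLParser):
    """Collect (tag, original class names) pairs from old-class attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "old-class" and value:
                self.found.append((tag, value.split()))


def original_classes(html):
    """Return every element's preserved class names, in document order."""
    parser = OldClassCollector()
    parser.feed(html)
    parser.close()
    return parser.found


# Hypothetical snapshot fragment, invented for this example.
snippet = (
    '<div old-class="article-body wide" style="margin:0">'
    '<p old-class="lead" style="font-weight:bold">Hello</p>'
    "<span>no classes here</span>"
    "</div>"
)

print(original_classes(snippet))
```

Elements without an old-class attribute, such as the span in the sample, are simply skipped.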
Textual content is replicated three times across servers in two data centers, both located in Europe, with at least one hosted by the French provider OVH; images are replicated twice.

When a dynamic list is saved, the archive.today search box shows only a result that links the previous and the following sections of the list (e.g. 20 links per page). The other saved web pages are filtered out, and can sometimes be found through one of their occurrences.

Bypassing paywalls
archive.today is frequently used to bypass paywalls on news websites, similarly to the defunct service 12ft.

Legal and ethical debate
The practice of sharing archive.today links to circumvent paywalls has sparked legal and ethical debate in Europe. In the Netherlands, journalist Peter Aanzee publicly challenged a physician who shared an archive.ph link to one of his paywalled articles in De Volkskrant, arguing that distributing archived copies constituted copyright infringement. The discussion drew on European Court of Justice jurisprudence on hyperlinking, particularly the 2016 GS Media v Sanoma ruling, which established that linking to illegally published content can constitute a copyright violation if the linker knew or ought to have known of the illegality, with such knowledge presumed for parties acting for profit.

Steiger argued that this reveals which paywalled content in Western media is most popular among archive.today users, and that even readers who pay for subscriptions to Swiss news sites such as Blick, 20 Minuten, and Neue Zürcher Zeitung have their data transmitted to Russian services like Yandex and Rutarget (owned by Sberbank, which is on the US sanctions list).

In one documented case, ChatGPT retrieved a full article from The Economist via archive.today and then generated a five-point economic analysis in the publication's characteristic style and terminology.
Van Ess identified six distinct methods of paywall circumvention by AI systems, of which "archive exploitation" (finding archived copies on services such as archive.today and the Internet Archive) was the most direct. Unlike documented concerns about AI training on unpaywalled content, this behaviour involves real-time retrieval of archived copies during individual queries, effectively extending paywall circumvention beyond human users to automated agents.
Worldwide availability
Australia and New Zealand
In March 2019, the site was blocked for six months by several internet providers in Australia and New Zealand in the aftermath of the Christchurch mosque shootings, in an attempt to limit distribution of the footage of the attack.

China
According to GreatFire.org, archive.today has been blocked in mainland China, as have archive.li, archive.fo and archive.ph.

Finland
On 21 July 2015, the operators blocked access to the service from all Finnish IP addresses, stating on Twitter that they did this in order to avoid escalating a dispute they allegedly had with the Finnish government.

Russia
In 2016, the Russian communications agency Roskomnadzor began blocking access to archive.is from Russia.