Methods that prevent web pages from being indexed by traditional search engines may be categorized as one or more of the following:
• Contextual web: pages whose content varies with the access context (e.g., ranges of client IP addresses or previous navigation sequence).
• Dynamic content: dynamic pages that are returned in response to a submitted query or accessed only through a form, especially if open-domain input elements (such as text fields) are used; such fields are hard to navigate without domain knowledge.
• Limited access content: sites that limit access to their pages in a technical manner (e.g., using the Robots Exclusion Standard, CAPTCHAs, or the no-store directive, which prohibit search engines from browsing them and creating cached copies). Sites may feature an internal search engine for exploring such pages.
• Non-HTML/text content: textual content encoded in multimedia (image or video) files or specific file formats not recognised by search engines.
• Private web: sites that require registration and login (password-protected resources).
• Scripted content: pages that are accessible only through links produced by JavaScript, as well as content dynamically downloaded from Web servers via Flash or Ajax solutions.
• Software: certain content is intentionally hidden from the regular Internet and accessible only with special software, such as Tor, I2P, or other darknet software. For example, Tor allows users to access websites using the .onion server address anonymously, hiding their IP address.
• Unlinked content: pages that are not linked to by other pages, which may prevent web crawling programs from accessing the content. Such pages are said to lack backlinks (also known as inlinks). In addition, search engines do not always detect all backlinks from the pages they search.
• Web archives: web archival services such as the Wayback Machine enable users to see archived versions of web pages across time, including websites that have become inaccessible and are not indexed by search engines such as Google.

==Content types==
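As an illustration of the limited access category in the list of indexing obstacles above, the following minimal sketch shows how a well-behaved crawler might consult Robots Exclusion Standard rules and a no-store cache directive before indexing a page. It uses Python's standard `urllib.robotparser` module. The robots.txt body, the `ExampleBot` user agent, the `example.org` URLs, and the helper names `build_parser`, `may_fetch`, and `may_cache` are all assumptions made for this example, not part of any real crawler.

```python
from typing import Mapping
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body for an illustrative site (not from a real host).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, agent: str, url: str) -> bool:
    """Return True if the parsed rules allow `agent` to request `url`."""
    return parser.can_fetch(agent, url)


def may_cache(headers: Mapping[str, str]) -> bool:
    """Return False if a Cache-Control header carries the no-store directive."""
    value = ""
    for name, header_value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "cache-control":
            value = header_value
    directives = {d.strip().lower() for d in value.split(",")}
    return "no-store" not in directives


parser = build_parser(ROBOTS_TXT)
print(may_fetch(parser, "ExampleBot", "https://example.org/private/report.html"))  # False
print(may_fetch(parser, "ExampleBot", "https://example.org/index.html"))           # True
print(may_cache({"Cache-Control": "no-store, max-age=0"}))                         # False
```

Both checks are advisory: the rules only keep pages out of an index when the crawler chooses to honor them, which is why such pages can still be reachable through a site's own internal search.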