URLs and URNs
A
Uniform Resource Name (URN) is a URI that identifies a resource by name in a particular namespace. A URN may be used to talk about a resource without implying its location or how to access it. For example, in the
International Standard Book Number (ISBN) system,
ISBN 0-486-27557-4 identifies a specific edition of the
William Shakespeare play
Romeo and Juliet. The URN for that edition would be
urn:isbn:0-486-27557-4. However, it gives no information as to where to find a copy of that book. A
Uniform Resource Locator (URL) is a URI that specifies the means of acting upon or obtaining the representation of a resource, i.e. specifying both its primary access mechanism and network location. For example, the URL http://example.org/wiki/Main_Page refers to a resource identified as /wiki/Main_Page, whose representation is obtainable via the
Hypertext Transfer Protocol (
http:) from a network host whose
domain name is example.org. (In this case, HTTP usually implies that the representation is in the form of
HTML and related code. In practice, that is not necessarily the case, as HTTP allows arbitrary formats to be specified in its headers.) A URN is analogous to a person's name, while a URL is analogous to their street address. In other words, a URN identifies an item and a URL provides a method for finding it. Technical publications, especially standards produced by the IETF and by the W3C, normally reflect a view outlined in a
W3C Recommendation of 30 July 2001, which acknowledges the precedence of the term URI rather than endorsing any formal subdivision into URL and URN. As such, a URL is simply a URI that happens to point to a resource over a network. However, in non-technical contexts and in software for the World Wide Web, the term "URL" remains widely used. Additionally, the term "web address" (which has no formal definition) often occurs in non-technical publications as a synonym for a URI that uses the
http or
https schemes. Such assumptions can lead to confusion, for example, in the case of XML namespaces that have a
visual similarity to resolvable URIs. Specifications produced by the
WHATWG prefer
URL over
URI, and so newer HTML5 APIs use
URL over
URI. While most URI schemes were originally designed to be used with a particular
protocol, and often have the same name, they are semantically different from protocols. For example, the scheme
http is generally used for interacting with
web resources using HTTP, but the scheme
file has no protocol.
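The difference between the URN and URL examples in this section can be seen by splitting both into generic components. The following is a minimal sketch using Python's standard `urllib.parse` module, which is a lenient parser and not a strict validator; the variable names are illustrative only.

```python
from urllib.parse import urlsplit

urn = urlsplit("urn:isbn:0-486-27557-4")
url = urlsplit("http://example.org/wiki/Main_Page")

# The URN has a scheme and a namespace-specific name, but no authority,
# so it says nothing about where a copy of the book can be found.
print(urn.scheme, repr(urn.netloc), urn.path)

# The URL names an access mechanism (the scheme) and a network host.
print(url.scheme, url.netloc, url.path)
```

The empty network location for the URN reflects the point made above: the identifier names the edition without implying how to obtain it.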
Syntax
A URI has a scheme that refers to a specification for assigning identifiers within that scheme. As such, the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme. The URI generic syntax is a superset of the syntax of all URI schemes. It was first defined in RFC 2396, published in August 1998, and finalized in RFC 3986, published in January 2005. A URI is composed from an allowed set of
ASCII characters consisting of
reserved characters (gen-delims: :, /, ?, #, [, ], and @; sub-delims: !, $, &, ', (, ), *, +, ,, ;, and =), unreserved characters (
uppercase and lowercase letters,
decimal digits, -, ., _, and ~), and the character %. Syntax components and subcomponents are separated by
delimiters from the reserved characters (only from generic reserved characters for components) and define
identifying data represented as unreserved characters, reserved characters that do not act as delimiters in the component and subcomponent respectively, and
percent-encodings when the corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component. A percent-encoding of an identifying data
octet is a sequence of three characters, consisting of the character % followed by the two hexadecimal digits representing that octet's numeric value. The URI generic syntax consists of five
components organized hierarchically in order of decreasing significance from left to right:
URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
A component is
undefined if it has an associated delimiter and the delimiter does not appear in the URI; the scheme and path components are always defined. A component is
empty if it has no characters; the scheme component is always non-empty. The authority component consists of
subcomponents:
authority = [userinfo "@"] host [":" port]
The URI comprises:
• A non-empty scheme component followed by a colon (:), consisting of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus (+), period (.), or hyphen (-). Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with lowercase letters. Examples of popular schemes include
http,
https,
ftp,
mailto,
file,
data and
irc. URI schemes should be registered with the
Internet Assigned Numbers Authority (IANA), although non-registered schemes are used in practice.
• An optional authority component preceded by two slashes (//), comprising:
• An optional userinfo subcomponent followed by an at symbol (@), that may consist of a
user name and an optional
password preceded by a colon (:). Use of the format username:password in the userinfo subcomponent is deprecated for security reasons. Applications should not render as clear text any data after the first colon (:) found within a userinfo subcomponent unless the data after the colon is the empty string (indicating no password).
• A host subcomponent, consisting of either a registered name (including but not limited to a
hostname) or an
IP address.
IPv4 addresses must be in
dot-decimal notation, and
IPv6 addresses must be enclosed in brackets ([]).
• An optional port subcomponent preceded by a colon (:), consisting of decimal digits.
• A path component, consisting of a sequence of path segments separated by a slash (/). A path is always defined for a URI, though the defined path may be empty (zero length). A segment may also be empty, resulting in two consecutive slashes (//) in the path component. A path component may resemble or map exactly to a
file system path but does not always imply a relation to one. If an authority component is defined, then the path component must either be empty or begin with a slash (/). If an authority component is undefined, then the path cannot begin with an empty segment, that is, with two slashes (//), since the following characters would be interpreted as an authority component.
By convention, in
http and
https URIs, the last part of a
path is named pathinfo and is optional. It is composed of zero or more path segments that do not refer to an existing physical resource name (e.g. a file, an internal module program or an executable program) but to a logical part (e.g. a command or a qualifier part) that has to be passed separately to the first part of the path that identifies an executable module or program managed by a
web server; this is often used to select dynamic content (a document, etc.) or to tailor it as requested (see also:
CGI and PATH_INFO, etc.).
In such a URI, the first part of the path names an executable module or program, and the second part of the path, the pathinfo, is passed to that module or program to select the requested document.
An
http or
https URI containing a
pathinfo part without a
query part may also be referred to as a 'clean URL', whose last part may be a 'slug'.
• An optional query component preceded by a question mark (?), consisting of a
query string of non-hierarchical data. Its syntax is not well defined, but by convention is most often a sequence of
attribute–value pairs separated by a
delimiter.
• An optional fragment component preceded by a
hash (#). The fragment contains a
fragment identifier providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. When the primary resource is an
HTML document, the fragment is often an
id attribute of a specific element, and web browsers will scroll this element into view.
The scheme- or implementation-specific reserved character + may be used in the scheme, userinfo, host, path, query, and fragment, and the scheme- or implementation-specific reserved characters !, $, &, ', (, ), *, ,, ;, and = may be used in the userinfo, host, path, query, and fragment. Additionally, the generic reserved character : may be used in the userinfo, path, query and fragment, the generic reserved characters @ and / may be used in the path, query and fragment, and the generic reserved character ? may be used in the query and fragment.
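The component breakdown and the percent-encoding rule described in this section can be illustrated with a short sketch. It assumes Python's standard `urllib.parse` module, which parses leniently and is not a strict implementation of the generic syntax; the URI itself and the `components` dictionary are hypothetical and chosen only so that every component is present.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical URI chosen to exercise every generic component.
uri = "https://user@example.com:8042/over/there?name=ferret#nose"
parts = urlsplit(uri)

components = {
    "scheme": parts.scheme,      # text before the first ":"
    "userinfo": parts.username,  # user name portion, before "@"
    "host": parts.hostname,
    "port": parts.port,          # converted to an int, or None if absent
    "path": parts.path,
    "query": parts.query,        # text after "?", delimiter not included
    "fragment": parts.fragment,  # text after "#", delimiter not included
}
for name, value in components.items():
    print(f"{name:9} {value!r}")

# Percent-encoding: "%" followed by two hexadecimal digits per octet.
# safe="" asks quote() to encode "/" as well, as needed when a slash is
# data inside a segment and not a delimiter.
encoded = quote("a b/c", safe="")
decoded = unquote(encoded)
print(encoded, decoded)
```

Note that the delimiters themselves (":", "//", "?", "#") are not part of the component values, which matches the distinction between an undefined component and an empty one.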
Example URIs
The following figure displays example URIs and their component parts. DOIs (
digital object identifiers) fit within the
Handle System and fit within the URI system,
as facilitated by appropriate syntax.
URI references
A
URI reference is either a URI or a
relative reference when it does not begin with a scheme component followed by a colon (:). A path segment that contains a colon character (e.g., foo:bar) cannot be used as the first path segment of a relative reference if its path component does not begin with a slash (/), as it would be mistaken for a scheme component. Such a path segment must be preceded by a dot path segment (e.g., ./foo:bar). Web document
markup languages frequently use URI references to point to other resources, such as external documents or specific portions of the same logical document: • in
HTML, the value of the src attribute of the img element provides a URI reference, as does the value of the href attribute of the a or link element; • in
XML, the
system identifier appearing after the SYSTEM keyword in a
DTD is a fragmentless URI reference; • in
XSLT, the value of the href attribute of the xsl:import element/instruction is a URI reference; likewise the first argument to the document() function.
https://example.com/path/resource.txt#fragment
//example.com/path/resource.txt
/path/resource.txt
path/resource.txt
../resource.txt
./resource.txt
resource.txt
#fragment
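The rule that a first path segment containing a colon would be mistaken for a scheme can be checked with a small sketch. The helper `is_relative_reference` is a hypothetical name, and the check relies on the lenient scheme detection in Python's `urllib.parse.urlsplit`, so it should be read as an approximation of the generic syntax and not as a validator.

```python
from urllib.parse import urlsplit


def is_relative_reference(ref: str) -> bool:
    """Return True if ref does not begin with a scheme followed by ':'."""
    return urlsplit(ref).scheme == ""


for ref in [
    "https://example.com/path/resource.txt#fragment",
    "//example.com/path/resource.txt",
    "../resource.txt",
    "#fragment",
    "foo:bar",    # parsed as scheme "foo", so not a relative reference
    "./foo:bar",  # the leading dot segment keeps it relative
]:
    print(f"{ref!r}: relative={is_relative_reference(ref)}")
```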
Resolution
Resolving a URI reference against a
base URI results in a
target URI. This implies that the base URI exists and is an
absolute URI (a URI with no fragment component). The base URI can be obtained, in order of precedence, from:
• the reference URI itself if it is a URI;
• the content of the representation;
• the entity encapsulating the representation;
• the URI used for the actual retrieval of the representation;
• the context of the application.
Within a representation with a well defined base URI of http://a/b/c/d;p?q a relative reference is resolved to its target URI as follows:
"g:h"      ->  "g:h"
"g"        ->  "http://a/b/c/g"
"./g"      ->  "http://a/b/c/g"
"g/"       ->  "http://a/b/c/g/"
"/g"       ->  "http://a/g"
"//g"      ->  "http://g"
"?y"       ->  "http://a/b/c/d;p?y"
"g?y"      ->  "http://a/b/c/g?y"
"#s"       ->  "http://a/b/c/d;p?q#s"
"g#s"      ->  "http://a/b/c/g#s"
"g?y#s"    ->  "http://a/b/c/g?y#s"
";x"       ->  "http://a/b/c/;x"
"g;x"      ->  "http://a/b/c/g;x"
"g;x?y#s"  ->  "http://a/b/c/g;x?y#s"
""         ->  "http://a/b/c/d;p?q"
"."        ->  "http://a/b/c/"
"./"       ->  "http://a/b/c/"
".."       ->  "http://a/b/"
"../"      ->  "http://a/b/"
"../g"     ->  "http://a/b/g"
"../.."    ->  "http://a/"
"../../"   ->  "http://a/"
"../../g"  ->  "http://a/g"
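A selection of the resolutions listed above can be reproduced with a short sketch. It assumes a recent Python 3 and the standard `urllib.parse.urljoin` function, which is one implementation of reference resolution and not the only possible one; the names `BASE` and `cases` are illustrative.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q"

# A subset of the reference/target pairs from the list above.
cases = {
    "g:h": "g:h",
    "g": "http://a/b/c/g",
    "/g": "http://a/g",
    "//g": "http://g",
    "?y": "http://a/b/c/d;p?y",
    "#s": "http://a/b/c/d;p?q#s",
    "": "http://a/b/c/d;p?q",
    "..": "http://a/b/",
    "../../g": "http://a/g",
}

for reference, target in cases.items():
    resolved = urljoin(BASE, reference)
    status = "ok" if resolved == target else "MISMATCH"
    print(f"{reference!r:10} -> {resolved!r} [{status}]")
```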
URL munging
URL munging is a technique by which a
command is appended to a URL, usually at the end, after a "?"
token. It is commonly used in
WebDAV as a mechanism of adding functionality to
HTTP. In a versioning system, for example, a "checkout" command is added to a URL by writing http://editing.com/resource/file.php?command=checkout. This approach has the advantage of being easy for
CGI parsers to handle, and in this case it also acts as an intermediary between HTTP and the underlying resource.
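Appending a command to a URL in this way can be sketched as follows. The function name `add_command` and the parameter name `command` follow the example URL above but are otherwise assumptions; the sketch uses only Python's standard `urllib.parse` module and preserves any query parameters already present.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_command(url: str, command: str) -> str:
    """Append a command=<value> pair to the query component of url."""
    parts = urlsplit(url)
    # Keep existing parameters, including ones with blank values.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("command", command))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_command("http://editing.com/resource/file.php", "checkout"))
```

On the receiving side, a CGI-style parser can recover the command with the same `parse_qsl` call applied to the query string.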
Relation to XML namespaces
In
XML, a
namespace is an abstract domain to which a collection of element and attribute names can be assigned. The namespace name is a character string which must adhere to the generic URI syntax. However, the name is generally not considered to be a URI, because the URI specification bases the decision not only on lexical components, but also on their intended use. A namespace name does not necessarily imply any of the semantics of URI schemes; for example, a namespace name beginning with
http: may have no connection to the use of
HTTP. Originally, the namespace name could match the syntax of any non-empty URI reference, but the use of relative URI references was deprecated by the W3C. A separate W3C specification for namespaces in XML 1.1 permits
Internationalized Resource Identifier (IRI) references to serve as the basis for namespace names in addition to URI references.
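That a namespace name is handled as a plain character string, without any use of HTTP, can be seen in a minimal sketch with Python's standard `xml.etree.ElementTree` module. The namespace name `http://example.org/ns` and the element names are hypothetical, and the behaviour shown is that of this one parser.

```python
import xml.etree.ElementTree as ET

doc = '<root xmlns="http://example.org/ns"><child/></root>'
root = ET.fromstring(doc)

# The namespace name is attached to each tag verbatim; nothing is fetched.
print(root.tag)
print(root[0].tag)

# Names are compared as strings, so a different spelling of the same
# host yields a different namespace.
other = ET.fromstring('<root xmlns="http://EXAMPLE.org/ns"/>')
print(other.tag == root.tag)
```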