HTML markup consists of several key components, including tags (and their attributes), character-based data types, character references and entity references. HTML tags most commonly come in pairs like <h1> and </h1>, although some represent empty elements and so are unpaired, for example <img>. The first tag in such a pair is the start tag, and the second is the end tag (they are also called opening tags and closing tags).

Another important component is the HTML document type declaration, which triggers standards mode rendering.

The following is an example of the classic "Hello, World!" program:

    <!DOCTYPE html>
    <html>
      <head>
        <title>This is a title</title>
      </head>
      <body>
        <div>
          <p>Hello world!</p>
        </div>
      </body>
    </html>

The text between <html> and </html> describes the web page, and the text between <body> and </body> is the visible page content. The markup text <title>This is a title</title> defines the browser page title shown on browser tabs and window titles, and the <div> tag defines a division of the page used for easy styling. Between <head> and </head>, a <meta> element can be used to define webpage metadata.

The document type declaration <!DOCTYPE html> is for HTML5. If a declaration is not included, various browsers will revert to "quirks mode" for rendering.
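As a small sketch of the metadata mentioned above, a head might carry meta elements such as the following. The description text is an invented placeholder, not part of the example page.

```html
<head>
  <!-- Declares the character encoding of the document. -->
  <meta charset="utf-8">
  <!-- A short summary that tools such as search engines may display; the wording is a placeholder. -->
  <meta name="description" content="A minimal example page">
  <title>This is a title</title>
</head>
```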
=== Elements ===
HTML documents imply a structure of nested HTML elements. These are indicated in the document by HTML tags, enclosed in angle brackets, for example <p>.

In the simple, general case, the extent of an element is indicated by a pair of tags: a "start tag" <p> and an "end tag" </p>. The text content of the element, if any, is placed between these tags.

Tags may also enclose further tag markup between the start and end, including a mixture of tags and text. This indicates further (nested) elements, as children of the parent element.

The start tag may also include the element's attributes. These carry other information, such as identifiers for sections within the document, identifiers used to bind style information to the presentation of the document, and, for some tags such as the <img> tag used to embed images, a reference to the image resource in the src attribute.

Some elements, such as the line break <br>, do not permit any embedded content, either text or further tags. These require only a single empty tag (akin to a start tag) and do not use an end tag.

Many tags, particularly the end tag for the very commonly used paragraph element <p>, are optional. An HTML browser or other agent can infer the end of an element from the context and the structural rules defined by the HTML standard. These rules are complex and not widely understood by most HTML authors.

The general form of an HTML element is therefore: <tag attribute1="value1" attribute2="value2">content</tag>. Some HTML elements are defined as empty elements and take the form <tag attribute1="value1" attribute2="value2">. Empty elements may enclose no content, for instance, the <br> tag or the inline <img> tag. The name of an HTML element is the name used in the tags. The end tag's name is preceded by a slash character, /. An empty element must not have an end tag. If attributes are not mentioned, default values are used in each case.
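The rules above can be sketched in one short fragment. It is an illustration written for this purpose: the id value and the image file name are placeholders. It nests elements inside a div, uses the empty elements br and img without end tags, and omits the optional </p> end tags, which a conforming parser infers.

```html
<!-- Illustrative fragment; "intro" and "photo.jpg" are placeholder names. -->
<div id="intro">
  <!-- Each p end tag is omitted: the next <p> or the enclosing </div> closes the paragraph. -->
  <p>First paragraph, with a <em>nested</em> element
  <p>Second paragraph<br>continued after a line break
  <!-- img is an empty element: attributes only, no content and no end tag. -->
  <img src="photo.jpg" alt="A placeholder image">
</div>
```

Because the second paragraph is still open when the img tag is reached, the image becomes a child of that paragraph rather than of the div.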
==== Element examples ====
Header of the HTML document: <head>...</head>. The title is included in the head, for example:

    <head>
      <title>The Title</title>
    </head>
===== Headings =====
HTML headings are defined with the <h1> to <h6> tags, with <h1> being the highest (or most important) level and <h6> the least:

    <h1>Heading level 1</h1>
    <h2>Heading level 2</h2>
    <h3>Heading level 3</h3>
    <h4>Heading level 4</h4>
    <h5>Heading level 5</h5>
    <h6>Heading level 6</h6>

By default, browsers render these as six headings of progressively smaller size. CSS can substantially change the rendering.

Paragraphs:

    <p>Paragraph 1</p>
    <p>Paragraph 2</p>
===== Line breaks =====
Line breaks are written with <br>. The difference between <br> and <p> is that <br> breaks a line without altering the semantic structure of the page, whereas <p> sections the page into paragraphs. The <br> element is an empty element in that, although it may have attributes, it can take no content and it must not have an end tag.

    <p>This <br> is a paragraph <br> with <br> line breaks</p>
===== Links =====
To create a link, the <a> tag is used. The href attribute holds the URL address of the link.

    <a href="https://www.wikipedia.org/">A link to Wikipedia!</a>
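===== Inputs =====
A user can provide input in many ways, for example through text fields, checkboxes, radio buttons and buttons, most of which are created with the <input> element.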
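A few common input controls might look like the following sketch. The type values are standard HTML, but every name and value here is an illustrative assumption.

```html
<!-- Illustrative controls; all name and value attributes are placeholders. -->
<input type="text" name="username">        <!-- single-line text field -->
<input type="checkbox" name="subscribe">   <!-- on/off checkbox -->
<input type="radio" name="size" value="s"> <!-- one option of a radio group -->
<input type="file" name="upload">          <!-- file picker -->
<input type="submit" value="Send">         <!-- button that submits the enclosing form -->
```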
Comments:

    <!-- This is a comment -->

Comments can help in the understanding of the markup and do not display in the webpage.

There are several types of markup elements used in HTML:

• Structural markup indicates the purpose of text.
  • For example, <h2>Golf</h2> establishes "Golf" as a second-level heading. Structural markup does not denote any specific rendering, but most web browsers have default styles for element formatting. Content may be further styled using Cascading Style Sheets (CSS).
• Presentational markup indicates the appearance of the text, regardless of its purpose.
  • For example, <b>boldface</b> indicates that visual output devices should render "boldface" in bold text, but gives little indication of what devices that are unable to do this (such as aural devices that read the text aloud) should do. In the case of both <b> and <i>, there are other elements that may have equivalent visual renderings but that are more semantic in nature, such as <strong> and <em> respectively. It is easier to see how an aural user agent should interpret the latter two elements. However, they are not equivalent to their presentational counterparts: it would be undesirable for a screen reader to emphasize the name of a book, for instance, but on a screen, such a name would be italicized. Most presentational markup elements were deprecated under the HTML 4.0 specification in favor of using CSS for styling.
• Hypertext markup makes parts of a document into links to other documents.
  • An anchor element creates a hyperlink in the document and its href attribute sets the link's target URL. For example, the HTML markup <a href="https://en.wikipedia.org/">Wikipedia</a> will render the word "Wikipedia" as a hyperlink. To render an image as a hyperlink, an img element is inserted as content into the a element. Like br, img is an empty element with attributes but no content or closing tag.
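The three kinds of markup listed above can be sketched side by side. The fragment below is illustrative only: the book title, the link target, and the image file name and dimensions are placeholders.

```html
<!-- Structural: states that "Golf" is a second-level heading, not how it looks. -->
<h2>Golf</h2>

<!-- Presentational versus semantic: similar default rendering, different meaning. -->
<p><b>boldface</b> versus <strong>strong importance</strong></p>
<p><i>A Book Title</i> versus <em>stress emphasis</em></p>

<!-- Hypertext: the img is the content of the a element, so the image itself is the link. -->
<a href="https://example.org/"><img src="image.gif" alt="descriptive text" width="50" height="50"></a>
```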
=== Attributes ===
Most of the attributes of an element are name–value pairs, separated by = and written within the start tag of an element after the element's name. The value may be enclosed in single or double quotes, although values consisting of certain characters can be left unquoted in HTML (but not XHTML). Leaving attribute values unquoted is considered unsafe. In contrast with name–value pair attributes, there are some attributes that affect the element simply by their presence in the start tag of the element, like the ismap attribute for the img element.

There are several common attributes that may appear in many elements:

• The id attribute provides a document-wide unique identifier for an element. This is used to identify the element so that stylesheets can alter its presentational properties, and scripts may alter, animate or delete its contents or presentation. Appended to the URL of the page after a # character, it provides a globally unique identifier for the element, typically a sub-section of the page. For example, the ID "Attributes" in https://en.wikipedia.org/wiki/HTML#Attributes.
• The class attribute provides a way of classifying similar elements. This can be used for semantic or presentation purposes. For example, an HTML document might semantically use the designation class="notation" to indicate that all elements with this class value are subordinate to the main text of the document. In presentation, such elements might be gathered together and presented as footnotes on a page instead of appearing in the place where they occur in the HTML source. Class attributes are used semantically in microformats. Multiple class values may be specified; for example, class="notation important" puts the element into both the notation and the important classes.
• An author may use the style attribute to assign presentational properties to a particular element. It is considered better practice to use an element's id or class attributes to select the element from within a stylesheet, though sometimes this can be too cumbersome for simple, specific, or ad hoc styling.
• The title attribute is used to attach a subtextual explanation to an element. In most browsers this attribute is displayed as a tooltip.
• The lang attribute identifies the natural language of the element's contents, which may be different from that of the rest of the document. For example, in an English-language document:

    <p>Oh well, <span lang="fr">c'est la vie</span>, as they say in France.</p>

The abbreviation element, abbr, can be used to demonstrate some of these attributes:

    <abbr id="anId" class="jargon" style="color:purple;" title="Hypertext Markup Language">HTML</abbr>

This example displays as HTML; in most browsers, pointing the cursor at the abbreviation should display the title text "Hypertext Markup Language."

Most elements take the language-related attribute dir to specify text direction, such as "rtl" for right-to-left text in, for example, Arabic, Persian or Hebrew.
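The quoting rules, presence-only attributes and the dir attribute described above might be combined as in this sketch. The link target, image file name and sample text are placeholders introduced for illustration.

```html
<!-- Three ways to write an attribute value. -->
<p class="notation important">Double-quoted value holding two class names</p>
<p class='notation'>Single-quoted value</p>
<p class=notation>Unquoted value: accepted in HTML, but not in XHTML</p>

<!-- ismap takes effect simply by being present; it applies to an img inside a link. -->
<a href="/map"><img src="map.png" alt="Site map" ismap></a>

<!-- dir="rtl" marks right-to-left text, here a Hebrew greeting. -->
<p lang="he" dir="rtl">שלום</p>
```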
=== Character and entity references ===
As of version 4.0, HTML defines a set of 252 character entity references and a set of 1,114,050 numeric character references, both of which allow individual characters to be written via simple markup, rather than literally. A literal character and its markup counterpart are considered equivalent and are rendered identically.

The ability to "escape" characters in this way allows for the characters < and & (when written as &lt; and &amp;, respectively) to be interpreted as character data, rather than markup. For example, a literal < normally indicates the start of a tag, and & normally indicates the start of a character entity reference or numeric character reference; writing it as &amp; or &#x26; or &#38; allows & to be included in the content of an element or in the value of an attribute. A double-quote character (") that appears inside a double-quoted attribute value, rather than delimiting it, must also be escaped as &quot; or &#x22; or &#34;. Similarly, a single-quote character (') that appears inside a single-quoted attribute value must be escaped as &#x27; or &#39; (or as &apos; in HTML5 or XHTML documents). If document authors overlook the need to escape such characters, some browsers can be very forgiving and try to use context to guess their intent. The result is still invalid markup, which makes the document less accessible to other browsers and to other user agents that may try to parse the document, for example for search and indexing purposes.

Escaping also allows for characters that are not easily typed, or that are not available in the document's character encoding, to be represented within the element and attribute content. For example, the acute-accented e (é), a character typically found only on Western European and South American keyboards, can be written in any HTML document as the entity reference &eacute; or as the numeric references &#xE9; or &#233;, using characters that are available on all keyboards and are supported in all character encodings.
Unicode character encodings such as UTF-8 are compatible with all modern browsers and allow direct access to almost all the characters of the world's writing systems.
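The escapes discussed above can be seen together in a short fragment. This is a sketch with invented sample text; the comment on each line gives the characters the references stand for.

```html
<p>5 &lt; 7 &amp;&amp; 7 &gt; 5</p>              <!-- text: 5 < 7 && 7 > 5 -->
<p title="She said &quot;hi&quot;">Hover</p>     <!-- title: She said "hi" -->
<p title='It&#39;s fine'>Hover</p>               <!-- title: It's fine -->
<p>caf&eacute;, caf&#xE9;, caf&#233;</p>         <!-- text: café, café, café -->
```

In a document saved as UTF-8, the last line could equally contain the literal character é.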
=== Data types ===
HTML defines several data types for element content, such as script data and stylesheet data, and a wide range of types for attribute values, including IDs, names, URIs, numbers, units of length, languages, media descriptors, colors, character encodings, dates and times, and so on. All of these data types are specializations of character data.
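To make the list above more concrete, the following fragment pairs a few attributes with the kind of value each expects. The elements and attributes are standard HTML, but every value (URLs, file names, dates) is a placeholder chosen for illustration.

```html
<!-- ID, URI and language code -->
<a id="top" href="https://example.org/" hreflang="en">Example</a>
<!-- Numbers: width and height in CSS pixels -->
<img src="chart.png" alt="Chart" width="200" height="100">
<!-- Color, written as a hexadecimal RGB value -->
<input type="color" value="#ff8800">
<!-- Date and time -->
<time datetime="2024-01-31T09:30">31 January, 9:30</time>
<!-- Media descriptor -->
<link rel="stylesheet" href="print.css" media="print">
<!-- Character encoding -->
<meta charset="utf-8">
```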
=== Document type declaration ===
HTML documents are required to start with a document type declaration (informally, a "doctype"). In browsers, the doctype helps to define the rendering mode, particularly whether to use quirks mode.

The original purpose of the doctype was to enable the parsing and validation of HTML documents by SGML tools based on the document type definition (DTD). The DTD to which the DOCTYPE refers contains a machine-readable grammar specifying the permitted and prohibited content for a document conforming to such a DTD. Browsers, on the other hand, do not implement HTML as an application of SGML and as a consequence do not read the DTD.
HTML5 does not define a DTD; therefore, in HTML5 the doctype declaration is simpler and shorter:

    <!DOCTYPE html>

An example of an HTML 4 doctype:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

This declaration references the DTD for the "strict" version of HTML 4.01. SGML-based validators read the DTD in order to properly parse the document and to perform validation. In modern browsers, a valid doctype activates standards mode as opposed to quirks mode.

In addition, HTML 4.01 provides Transitional and Frameset DTDs, as explained below. The Transitional type is the most inclusive, incorporating current tags as well as older or "deprecated" tags, with the Strict DTD excluding deprecated tags. The Frameset DTD has all tags necessary to make frames on a page along with the tags included in the Transitional type.

== Semantic HTML ==