A valid HTML document declares what version of HTML is used in the document.
The document type declaration names the document type definition (DTD)
in use for the document.
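
As an illustration only, a document written to HTML 4.01 Strict would begin
with the declaration sketched below. The choice of HTML 4.01 Strict is an
assumption made for this example; the text above does not mandate a
particular version or DTD.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
```
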
The HEAD element contains information about the current document, such as its
title, keywords that may be useful to search engines, and other data that is
not considered document content. User agents may, however, make information in
the HEAD available to users through other mechanisms.
The page title shall include a useful and distinctive indication of the page's
contents. The value of this title for search engine query-response selection
shall be considered.
The DESCRIPTION meta tag may be used to provide guidance to search engines on
what to present users in the search response (e.g. <meta name="description"
content="response">).
Search engines should be expected to consider only a limited number of
keywords when indexing pages. Web pages shall present keywords in priority
order and without duplication (e.g., ).
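
The title, description, and keyword guidance above might be combined as in the
following sketch. The page subject, the wording of the description, and the
keywords themselves are hypothetical and chosen only for illustration; the
keywords are listed most important first, each appearing once.

```html
<head>
<!-- Hypothetical title: distinctive enough to stand alone in a result list -->
<title>Corporate Travel Policy: Expense Limits and Approval Steps</title>
<!-- Hypothetical summary a search engine may show in its response -->
<meta name="description"
      content="Expense limits and approval steps for business travel.">
<!-- Hypothetical keywords in priority order, without duplication -->
<meta name="keywords"
      content="travel policy, expense limits, travel approval">
</head>
```
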
The Dublin Core metadata element set was developed by the library sciences
community, but may be applicable to general-purpose web page indexing. Dublin
Core metadata shall be used for fields of information that are of value in
indexing or cataloguing the web page.
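
One common way to carry Dublin Core fields in a page is through META and LINK
elements in the HEAD, as sketched below. The "DC." prefix and the schema link
follow the widely used Dublin Core convention for HTML; the field values are
hypothetical, and the selection of fields is an assumption rather than a
required set.

```html
<!-- Associates the "DC." prefix with the Dublin Core element set -->
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<!-- Hypothetical field values, for illustration only -->
<meta name="DC.title" content="Corporate Travel Policy">
<meta name="DC.creator" content="Example Corporation Finance Department">
<meta name="DC.subject" content="travel policy; expense limits">
<meta name="DC.language" content="en">
```
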
Web page design shall include consideration of content-selection mechanisms.
Within the context of intranets/extranets, Platform for Internet Content
Selection (PICS) rating services and mechanisms may be useful to ensure that
users are accessing the preferred information sources. For example, an index
search within an organization for information about a corporate policy may
yield pages with opinions, local implementations, or other variations. A
rating system within an organization may distinguish between "corporate"
policy data, legal requirements, and other guidelines. The PICS mechanism
could then be used to provide users with a view of the data that is relevant
to their environment, rather than forcing them to locate the relevant views
from a much wider set of responses. The use of metadata and content included
for the purpose of content selection (indexing) shall not be misleading.
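
A PICS label can be embedded in a page through a META element, roughly as in
the sketch below. The rating service URL, the page URL, and the "authority"
category with its value are all hypothetical; a real deployment would use the
categories defined by the organization's own rating service.

```html
<!-- Hypothetical rating service and category; "authority 2" might mean
     "corporate policy" in an organization's own rating scheme -->
<meta http-equiv="PICS-Label" content='(PICS-1.1
    "http://ratings.example.com/v1"
    labels for "http://intranet.example.com/policy/travel.html"
    ratings (authority 2))'>
```
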
Web pages shall incorporate robot exclusion elements (see Annex E) as the
method for indicating pages to be indexed or searched by automated means and
those to be excluded.
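
Annex E is not reproduced here, so the following is only a sketch of the
commonly used "robots" META convention, which may or may not match the annex
in every detail. The two elements are alternatives for two different pages,
not a pair to be used together.

```html
<!-- On a page that should not be indexed and whose links should not be
     followed by automated agents -->
<meta name="robots" content="noindex, nofollow">

<!-- On a page that may be indexed and whose links may be followed -->
<meta name="robots" content="index, follow">
```
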
The first bytes sent (including the <head> bytes) have the most impact on
network overhead. Transmission Control Protocol (TCP) operates with a "slow
start," awaiting an acknowledgment of the initial packets sent before
initiating a full sequence of transmissions. This avoids congesting the
network with traffic directed to a non-responsive site. It also makes the data
transferred first from the server, and so the initial elements of the page
(e.g., <head>), more critical to response time and network loading. Data in
this initial sequence should be focused to minimize overhead and provide
essential data to the client. Unfortunately, the HTML format calls for all
metadata to be in the head section. (See the performance articles listed in
Annex B for more details on bandwidth impact.) There are modest gains to be
achieved by using lowercase in tags and <head> entries (due to improved
compression efficiency).
To facilitate accurate indexing and ease of access for users, web pages shall
include the LANG metatag declaring the primary language environment(s) for
each page.
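
HTML offers two standard places to declare a page's language: the lang
attribute on the HTML element and a Content-Language META element. Reading
the "LANG metatag" above as either or both of these is an assumption made for
this sketch, as is the choice of English ("en") and the page title.

```html
<html lang="en">
<head>
<!-- Declares the primary language of the intended audience -->
<meta http-equiv="Content-Language" content="en">
<title>Corporate Travel Policy</title>
</head>
<body>
<p>Page content in English.</p>
</body>
</html>
```
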
Digital signatures and other fingerprinting mechanisms may be applied to ensure
page integrity and authentication. Information related to these mechanisms may
be communicated through header extensions or related files, or it may be
implicit in the content body. Re-signing pages may be problematic, so extra
care should be taken to ensure the immutability of the data (including links,
etc.) within the signed area.