Header Information

Document Type Declaration

A valid HTML document declares what version of HTML is used in the document. The document type declaration names the document type definition (DTD) in use for the document.
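For example, a page written to the HTML 4.01 Strict DTD would begin as follows. The Strict DTD is chosen here only as an illustration; the declaration should name whichever DTD the page's markup actually conforms to.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<!-- The declaration above names the HTML 4.01 Strict DTD; substitute the DTD the page really uses -->
```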

Head

The HEAD element contains information about the current document, such as its title, keywords that may be useful to search engines, and other data that is not considered document content. User agents do not generally render elements in the HEAD as content; they may, however, make information in the HEAD available to users through other mechanisms.
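A minimal sketch of where the HEAD sits within a document follows. The agency name, title, and description text are hypothetical placeholders, not recommended values.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <!-- Information about the document; user agents do not render this as page content -->
  <title>Travel Reimbursement Policy - Example Agency</title>
  <meta name="description" content="How Example Agency staff claim travel expenses.">
</head>
<body>
  <!-- Document content begins here -->
  <p>Policy text.</p>
</body>
</html>
```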

Title

The page title shall give a useful and distinctive indication of the page's contents. Authors shall consider the title's value to search engines when they select pages in response to a query.
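As an illustration, the two alternative titles below (both hypothetical; a page has only one TITLE) contrast a generic title with a distinctive one.

```html
<!-- Too generic: indistinguishable from many other pages in a list of search results -->
<title>Home Page</title>

<!-- Distinctive: names both the subject and the source of the page -->
<title>Travel Reimbursement Policy - Example Agency</title>
```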

Description Tag

The DESCRIPTION meta tag may be used to guide search engines on what to present to users in the search response (e.g., <meta name="description" content="response">).

Keywords

Search engines should be expected to consider only a limited number of keywords when indexing pages. Web pages shall therefore present keywords in priority order and without duplication.
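A sketch of a KEYWORDS meta tag that follows this rule is shown below; the keywords themselves are hypothetical and ordered from most to least important.

```html
<!-- Most important keyword first; each keyword appears only once -->
<meta name="keywords" content="travel reimbursement, per diem, lodging, mileage">
```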

Dublin Core

The Dublin Core Metadata Element Set was developed by the library science community but may be applicable to general-purpose web page indexing. Dublin Core metadata shall be used for fields of information that are of value in indexing or cataloguing the web page.
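One common convention for embedding Dublin Core metadata in HTML declares the element set with a LINK element and then prefixes each META name with "DC.". The sketch below assumes that convention; the field values are hypothetical.

```html
<head>
  <!-- Declares the "DC" prefix as the Dublin Core element set, version 1.1 -->
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
  <!-- Each META name pairs the prefix with a Dublin Core element -->
  <meta name="DC.title" content="Travel Reimbursement Policy">
  <meta name="DC.creator" content="Example Agency, Office of Finance">
  <meta name="DC.date" content="2002-01-10">
  <meta name="DC.language" content="en">
  <title>Travel Reimbursement Policy - Example Agency</title>
</head>
```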

Content Selection

Web page design shall include consideration of content-selection mechanisms. Within intranets and extranets, Platform for Internet Content Selection (PICS) rating services and mechanisms may help ensure that users reach the preferred information sources. For example, an index search within an organization for information about a corporate policy may yield pages containing opinions, local implementations, or other variations. A rating system within the organization could distinguish "corporate" policy data from legal requirements and other guidelines. The PICS mechanism could then present users with the view of the data that is relevant to their environment, rather than forcing them to locate that view within a much wider set of responses. Metadata and content included for the purpose of content selection (indexing) shall not be misleading.
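A PICS label of the kind described above can be embedded in a page's HEAD with a PICS-Label META element. In the sketch below, the rating service URL and the "type" category with its values are hypothetical, standing in for an organization's own intranet rating service.

```html
<!-- Hypothetical intranet rating service: category "type"
     1 = corporate policy, 2 = legal requirement, 3 = local guideline -->
<meta http-equiv="PICS-Label"
      content='(PICS-1.1 "http://intranet.example.com/ratings/policy-v1.html" labels ratings (type 1))'>
```

Single quotes delimit the CONTENT attribute because the label itself contains double quotes.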

Robot Exclusion

Web pages shall incorporate robot exclusion elements (see Annex E) to indicate which pages may be indexed or searched by automated means and which are to be excluded.
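Annex E remains the governing reference. As a sketch, one widely recognized robot exclusion element is the ROBOTS meta tag, placed in the HEAD of each page:

```html
<!-- A page that robots may index and whose links they may follow -->
<meta name="robots" content="index, follow">

<!-- Alternatively, a page to be excluded: do not index it and do not follow its links -->
<meta name="robots" content="noindex, nofollow">
```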

Bandwidth Efficiencies

The first bytes sent (including the <head> section) have the greatest impact on network overhead. The Transmission Control Protocol (TCP) uses "slow start": a sender begins with a small number of packets and increases the amount of data in flight only as acknowledgments arrive. This avoids flooding the network with traffic directed at a site that may not be responding. As a result, the data the server transmits first, including the initial elements of the page (e.g., <head>), is the most critical to response time and network loading. This initial data should be kept lean and should carry the essential information the client needs. Unfortunately, the HTML format calls for all metadata to be in the head section. (See the performance articles listed in Annex B for more details on bandwidth impact.) Modest gains can also be achieved by using lowercase for tags and <head> entries, because of improved compression efficiency.
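The two head sections below are equivalent to a browser, since HTML tag and attribute names are case-insensitive; they differ only in case. This is a sketch of the recommended style with hypothetical content, not a measured comparison, and the comments are for explanation only and would add bytes to a served page.

```html
<!-- Uppercase tag and attribute names -->
<HEAD>
<TITLE>Travel Reimbursement Policy - Example Agency</TITLE>
<META NAME="Description" CONTENT="How Example Agency staff claim travel expenses.">
</HEAD>

<!-- Preferred: lowercase throughout, carrying only the metadata the page needs -->
<head>
<title>Travel Reimbursement Policy - Example Agency</title>
<meta name="description" content="How Example Agency staff claim travel expenses.">
</head>
```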

Human Language Specification

To facilitate accurate indexing and ease of access for users, web pages shall declare the primary language environment(s) of each page using the LANG attribute (for example, on the HTML element).
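A sketch of a bilingual page is shown below; the page content is hypothetical. The LANG attribute on the HTML element declares the primary language, a LANG attribute on an inner element marks a passage in another language, and the optional Content-Language META element lists the languages of the intended audience.

```html
<html lang="en">
<head>
  <!-- Languages of the intended audience, as a comma-separated list -->
  <meta http-equiv="Content-Language" content="en, fr">
  <title>Travel Reimbursement Policy - Example Agency</title>
</head>
<body>
  <p>Claims must be filed within 30 days.</p>
  <!-- Overrides the page language for this paragraph only -->
  <p lang="fr">Les demandes doivent être déposées dans un délai de 30 jours.</p>
</body>
</html>
```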

Digital Signature

Digital signatures and other fingerprinting mechanisms may be applied to ensure page integrity and authentication. Related information may be communicated through header extensions or related files, or it may be implicit in the content body. Because re-signing pages may be problematic, extra care should be taken to ensure the immutability of the data (including links) within the signed area.

Version 2.0
Page last modified: 15 May 2002
National Institute of Standards and Technology (NIST)