Possibly Historical Document - Last Updated Tue Oct 18 14:02:45 2005

META builder 2

This form generates HTML META tags suitable for inclusion in your HTML document. These tags allow better indexing by robot-driven search engines, such as Google or AltaVista.

Some HTML editors will generate some of these tags automatically.

You can also use the META generator at SafeSurf to generate a PICS header. PICS is a multi-valued, multidimensional ratings scheme for classifying Web pages, for use with children's Web software or PICS-compliant browsers such as MSIE.

Title (Strongly Recommended)

Your document's title will appear in users' hotlists, the banner of most browsers, and robot-generated lists. It should be a concise, one-line summary of what the page is about. Bear in mind that users may not reach your document through your homepage, but directly through a search engine or a link at another site, so the title should ideally be self-sufficient.
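
As a minimal sketch, a self-sufficient title might look like the following. The page and site names are invented for illustration.

```html
<TITLE>Pizza Restaurants in Vancouver, British Columbia - Example Dining Guide</TITLE>
```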

Keywords

Comma-separated list of keywords for indexing your document.
Some robots look at keywords in context, so it is best to preserve word order and case, e.g. "pizza, Vancouver, British Columbia" rather than "british vancouver columbia pizza".
Keywords now matter less to search engines such as Google, but they are still used in e.g. Office and PDF documents.
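
For illustration, the keyword list from the example above could be written as the following tag. This is a sketch of typical output, not necessarily the exact markup this form emits.

```html
<META NAME="keywords" CONTENT="pizza, Vancouver, British Columbia">
```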

Description

The description is sometimes presented to the user along with the document's title as the result of a search.
Many robots use the first few lines of text as a description if the Description tag is not present. Documents using frames may contain no such text. For an academic text, the description should probably be the abstract.
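
A description tag might look like the sketch below. The wording of the description is invented for illustration.

```html
<META NAME="description" CONTENT="A guide to pizza restaurants in Vancouver, British Columbia, with addresses and opening hours.">
```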

Author

The author's name. Used to build lists of broken links.
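
A minimal sketch of an author tag, using a placeholder name:

```html
<META NAME="author" CONTENT="Jane Doe">
```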

Expiry Date (Optional)

The date after which a page is considered stale, in RFC 1123 date format. The Expires value is used by browsers and proxies to delete documents from the cache. If you know when your page will go stale, setting it is probably a good idea. Netscape Navigator honours the META tag; other agents and proxies may require the HTTP header. Netscape 3 will cache a document with an "Expires: 0" tag, but will issue a GET with If-Modified-Since (regardless of option settings), and thus retrieve an updated copy if one exists. The searchBC search engine uses the Expires value as a hint to schedule a revisit.
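
As a sketch, an expiry date in RFC 1123 format could be given in an HTTP-EQUIV META tag as follows. The date itself is an arbitrary example.

```html
<META HTTP-EQUIV="Expires" CONTENT="Sat, 31 Dec 2005 23:59:59 GMT">
```

A server that sends the equivalent HTTP header would emit a line of the form "Expires: Sat, 31 Dec 2005 23:59:59 GMT".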

Language (Optional)

Dialect (Optional)

Some browsers (Arena, Mosaic-L10N, Netscape) have the ability to perform content negotiation. This means that the user configures the browser to prefer certain languages, based on the user's fluency, and the browser sends that preference in an Accept-Language header (seen by CGI scripts as HTTP_ACCEPT_LANGUAGE). For example, the list en-CA, en-GB, fr would say that you would accept (in order of preference) Canadian English, UK English, and French. Some servers, e.g. Apache, can use this information to serve a document in the preferred language. To function properly, the language/dialect combination must be available to the server (see the server documentation). The searchBC robot indexes a META tag for reference purposes.
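
The sketch below shows the Accept-Language request header (the HTTP_ACCEPT_LANGUAGE value mentioned above) for the example list, followed by a language META tag. The Content-Language form of the tag is an assumption about what a generator would emit; the text above does not name the tag.

```html
<!-- Request header sent by a browser configured with the example list:
     Accept-Language: en-CA, en-GB, fr -->

<!-- Assumed form of the language tag for a Canadian English page -->
<META HTTP-EQUIV="Content-Language" CONTENT="en-CA">
```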

For a demonstration of language negotiation and charsets, see the Multilingual page.

You may use the browser test script to discover if your browser is sending an HTTP_ACCEPT_LANGUAGE header.

Charset (Optional)

Charsets may be specified by the server; for instance:
        Content-type: text/html; charset=iso-8859-5
Netscape 2.0 works properly with this method; some very old browsers, such as Mosaic, break. Netscape 3 will use a META tag to automatically switch fonts (X11 Netscape, at least), and provided the server does not parse HTTP-EQUIV META tags into real HTTP headers, other browsers will ignore it. Thus the META tag method is recommended for non-ISO-8859-1 (Western European) character sets, as it will cause Netscape to select the correct font for each page.
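
As a sketch, the META tag equivalent of the server header shown above would be:

```html
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-5">
```

The charset value here simply mirrors the server example; substitute the character set your page actually uses.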

The searchBC robot will index this META tag. The default HTML charset is ISO-8859-1 (Western European 8-bit).

See How to make a Multilingual Webserver for more information about using Charset and Language tags.

Robots (Recommended)

See the workshop report at W3 for the full text.
        <META NAME="ROBOTS"
              CONTENT="ALL | NONE | NOINDEX | NOFOLLOW">

        default = empty = "ALL"
        "NONE" = "NOINDEX, NOFOLLOW"
The filler is a comma-separated list of terms:
ALL, NONE, INDEX, NOINDEX, FOLLOW, NOFOLLOW.

Discussion: This tag is meant for users who cannot control the robots.txt file at their sites. It provides a last chance to keep their content out of search services. It was decided not to add syntax to allow robot-specific permissions within the META tag.

INDEX means that robots are welcome to include this page in search services.

FOLLOW means that robots are welcome to follow links from this page to find other pages.

So a value of "NOINDEX" allows the subsidiary links to be explored, even though the page is not indexed. A value of "NOFOLLOW" allows the page to be indexed, but no links from the page are explored (this may be useful if the page is a free entry point into pay-per-view content, for example). A value of "NONE" tells the robot to ignore the page.
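
The three cases just described can be sketched as the alternative tags below. A page would normally carry only one of them.

```html
<!-- Do not index this page, but explore its links -->
<META NAME="ROBOTS" CONTENT="NOINDEX">

<!-- Index this page, but do not explore its links -->
<META NAME="ROBOTS" CONTENT="NOFOLLOW">

<!-- Ignore the page entirely; same as "NOINDEX, NOFOLLOW" -->
<META NAME="ROBOTS" CONTENT="NONE">
```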

The META generator will build the HTML according to the buttons selected.

Googlebot

Google define their own metadata element, GOOGLEBOT, using INDEX, FOLLOW and two additional terms, ARCHIVE and SNIPPET.

NOARCHIVE means that documents will not be saved in the Google (or Google Search Appliance) cache.

NOSNIPPET means that no excerpt text will be displayed in the search results.

Use of these terms may be useful where Google or a GSA has access to a Web page which is otherwise restricted or password-protected.
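
As a sketch, a page that may be indexed but should be neither cached nor excerpted by Google could carry a tag like this. The combination of terms is chosen for illustration.

```html
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE, NOSNIPPET">
```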

Note that other metadata, such as Pragma and Cache-Control, are needed to control page caching in a user's browser.
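
A commonly seen form of these browser-caching tags is sketched below. The "no-cache" values are an illustrative choice, and how far browsers and proxies honour the META form rather than the real HTTP headers varies.

```html
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
<META HTTP-EQUIV="Cache-Control" CONTENT="no-cache">
```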

See Google Information for Webmasters.