Robots.txt for Search Engines
Search engines, including the one used in our site search, periodically send indexing "spiders" (robot programs) to crawl your web site and collect information so they can list your pages. Before it does any indexing, a spider checks whether it is allowed to visit by looking for a robots.txt file in your root directory. You can use your robots.txt file to disallow crawling altogether or to list private areas of your web site that you don't want a search engine to index (for example, members-only areas, your www/data folder, your www/usage folder, or customer information). If the spider does not find a robots.txt file, or finds nothing in the file that forbids it from indexing your site, it will request your URL from your server.
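
As a rough illustration of where a spider looks first, the sketch below derives the robots.txt address from any page address on the same site. The function name robots_url is hypothetical and the page address is a placeholder; real crawlers may handle ports and redirects differently.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt is looked up at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.yourdomain.com/members/index.html"))
# http://www.yourdomain.com/robots.txt
```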

There are two types of robots instructions:
  1. META tag robots (for individual web pages), or
  2. Server-wide robots (for your entire site)
The robots META tag must appear, in the <head> section, on every page for which you want to communicate information to the search engine spiders.

<meta name="robots" content="INDEX,FOLLOW">
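
To see what a spider might read from such a tag, here is a minimal sketch that extracts the robots directives from a page using Python's standard html.parser module. The class and function names are hypothetical, and the sample page is invented for illustration; it is not how any particular search engine parses pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content value is a comma-separated list such as "INDEX,FOLLOW".
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html: str) -> set:
    """Return the lower-cased robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="INDEX,FOLLOW"></head></html>'
print(sorted(robots_directives(page)))  # ['follow', 'index']
```

A page whose directives include noindex is asking spiders not to list it.
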
Site-wide robots rules are set in a plain text file located in your root directory (http://www.yourdomain.com/robots.txt). Here is an example:
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /data/
    Disallow: /tmp/
    Disallow: /usage/
    Disallow: /~joe/
This example applies to every robot (User-agent: *) and allows every page except (Disallow:) anything in the cgi-bin, data, tmp, or usage directories or the ~joe directory.
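
One way to check how rules like these are interpreted is Python's standard urllib.robotparser module. The sketch below feeds it the example rules directly, so no network access is needed. The robot name "ExampleBot", the helper allowed, and the page paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, as they would appear in robots.txt.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /data/
Disallow: /tmp/
Disallow: /usage/
Disallow: /~joe/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given robot may fetch the path on this site."""
    return parser.can_fetch(agent, "http://www.yourdomain.com" + path)


print(allowed("/index.html"))        # True
print(allowed("/data/report.html"))  # False
print(allowed("/cgi-bin/form.pl"))   # False
```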
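A spider may be unable to index a submitted URL for any of the following reasons:
  1. 302 redirect: the server automatically redirects requests for /robots.txt to another location
  2. 403 error: the server forbids access
  3. The robots.txt file disallows indexing of the section of the site where the submitted URL is located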
  4. robots.txt syntax errors:
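    • Wildcard * used in a Disallow line (not part of the original robots.txt standard; only some crawlers support it)
    • User-agent written with a capital A: use "User-agent", not "User-Agent" (most crawlers ignore case in field names, but a strict parser may not)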
  5. Submitted URL has meta robots noindex tag in place
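
The two syntax problems in item 4 can be spotted mechanically. The following is a minimal sketch of such a check, not a full validator: the function name lint_robots_txt and the sample file are hypothetical, and it flags only the two patterns listed above, which some crawlers accept without complaint.

```python
def lint_robots_txt(text: str) -> list:
    """Return warnings for the two syntax problems listed above."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        # Problem 1: a wildcard inside a Disallow value.
        if field.lower() == "disallow" and "*" in value:
            warnings.append(f"line {number}: '*' used in Disallow value")
        # Problem 2: the field name is not spelled exactly "User-agent".
        if field.lower() == "user-agent" and field != "User-agent":
            warnings.append(f"line {number}: write the field as 'User-agent'")
    return warnings


sample = "User-Agent: *\nDisallow: /*.pdf\nDisallow: /tmp/\n"
for warning in lint_robots_txt(sample):
    print(warning)
# line 1: write the field as 'User-agent'
# line 2: '*' used in Disallow value
```

Note that the * in "User-agent: *" is not flagged; it is only the Disallow value that the check inspects.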
Refer to the resources below for directions on creating, modifying, placing, and checking robots.txt files.

