Search Engine Friendly Design - Robots.txt
All major search engines support robots.txt, and it is the first file a spider requests when visiting a site, so a robots.txt file should be part of every site. Use it to exclude robots from sensitive material that is not password-protected, under-construction pages, test sections, JavaScript files, style sheets and the CGI-BIN directory.
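A minimal sketch of such a file might look like the following. The directory names here are hypothetical placeholders chosen only to illustrate the syntax; substitute the paths your own site actually uses.

# Apply these rules to all robots
User-agent: *
# Hypothetical paths for the areas described above
Disallow: /cgi-bin/
Disallow: /under-construction/
Disallow: /test/
Disallow: /scripts/
Disallow: /styles/

The asterisk after User-agent means the rules apply to every robot, and each Disallow line names one path prefix the robots should skip.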
Robots.txt files are not only useful for stopping the engines from grabbing sensitive pages; they also help to steer the engines towards where they should go instead. If you have a very large site, the engines may spend so much time spidering your image directory, test directory or other non-essentials that they run out of time and move on to another site. This is not what you want. You want the engines to reach the important, keyword-rich pages on your site, so use a robots.txt to keep them away from anything that isn't going to benefit your search engine rankings.
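As a sketch of that idea, a file like this (again with made-up directory names) keeps robots out of the non-essential areas while leaving the rest of the site open:

User-agent: *
# Keep spidering time focused on content pages
Disallow: /images/
Disallow: /test/
Disallow: /temp/

Anything not matched by a Disallow line remains crawlable by default, so the important pages need no entry at all.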
Because robots.txt permits everything by default, you should never use "Allow" in the file unless the directory or page it names is the ONLY one you want the engines to index. As that would be a very rare occurrence, it is best to use the file primarily to disallow what you don't want indexed instead. Run the robots.txt through a validator to ensure it's written correctly, or it may end up doing more harm than good.
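For completeness, that rare case looks something like this. Note that Allow is an extension to the original robots.txt standard, though the major engines honour it, and /public/ is a hypothetical directory name:

User-agent: *
# Let robots index only /public/ and shut out everything else
Allow: /public/
Disallow: /

For most sites, though, a plain list of Disallow lines, checked with a validator, is all that is needed.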
Resources:
An example of a robots.txt
RobotsTxt.Org
Robots.txt Validator
NEXT Step: Web Site Optimisation