Knowledge Essentials - 3Essentials Hosting

Preventing Search Engine From Crawling A URL - robots.txt	Article ID: 161
Search engine robots check a special file in the root of each server called `robots.txt`, which is a plain text file (not HTML). `robots.txt` implements the Robots Exclusion Protocol, which lets the web site administrator define which parts of the site are off-limits to specific robot user agent names. For example, web administrators can disallow access to cgi, private, and temporary directories because they do not want pages in those areas indexed.

The syntax of this file is unfamiliar to most of us: it tells robots not to request pages whose URL paths begin with the listed prefixes. Each section gives the name of a user agent (robot) followed by the paths it may not follow. The original protocol provides no way to allow a specific directory or to specify a kind of file, although some modern crawlers accept an `Allow` line and wildcard patterns as extensions. Remember that robots may access any path that is not explicitly disallowed in this file: everything not forbidden is permitted.

More information: http://www.robotstxt.org
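As an illustration of the section layout described above, here is a minimal sketch that feeds a small sample file to Python's standard `urllib.robotparser` module and asks whether a given robot may fetch a given URL. The robot names `ExampleBot` and `OtherBot`, the host `www.example.com`, the directory names, and the helper `is_allowed` are all assumptions made up for this example; they are not part of any real site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. Each section names a user agent
# and lists the path prefixes that robot may not request.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("OtherBot", "http://www.example.com/index.html"),
        ("OtherBot", "http://www.example.com/cgi-bin/form.cgi"),
        ("ExampleBot", "http://www.example.com/private/notes.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, agent, url) else "disallowed"
        print(f"{agent} -> {url}: {verdict}")
```

In this sketch, a robot with its own section follows only that section, and the `*` section applies to every robot not named elsewhere. Keep in mind that the file is advisory: well-behaved robots honor it, but it does not block access by itself.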
