Preventing Search Engines From Crawling A URL - robots.txt

Article ID: 161


Search engine robots check a special file in the root of each server called robots.txt, which is, as you may guess, a plain text file (not HTML). Robots.txt implements the Robots Exclusion Protocol, which lets the web site administrator define which parts of the site are off-limits to specific robot user agent names. For example, web administrators can disallow access to CGI, private, and temporary directories because they do not want pages in those areas indexed.
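Because the file always lives at the root of the server, its location can be worked out from any page URL on that server. The following is a minimal sketch in Python; the function name robots_txt_url and the example.com address are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location for the server hosting page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host (with port, if any); the path is
    # replaced by /robots.txt and the query and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, for illustration only.
print(robots_txt_url("http://www.example.com/shop/items/page.html?id=7"))
```

A robot visiting any page on that host would look for the file at the printed address before crawling further.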

The syntax of this file is unfamiliar to most of us: it tells robots not to look at pages which have certain paths in their URLs. Each section includes the name of the user agent (robot) and the paths it may not follow. The original standard provides no way to allow a specific directory or to specify a kind of file, although many major crawlers now honor an Allow directive and simple wildcard patterns as extensions. Remember that robots may access any directory path in a URL which is not explicitly disallowed in this file: everything not forbidden is OK.
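To make the syntax concrete, here is a small sketch that embeds a sample robots.txt and checks a few URLs against it with Python's standard urllib.robotparser module. The directory names, the robot names AnyCrawler and BadBot, and the example.com host are assumptions chosen for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Sample file contents (hypothetical). Each section names a user agent
# and lists the path prefixes that agent may not follow.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /tmp/

User-agent: BadBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Not listed under any Disallow line, so it is permitted.
print(parser.can_fetch("AnyCrawler", "http://www.example.com/index.html"))

# Starts with the disallowed prefix /private/, so it is refused.
print(parser.can_fetch("AnyCrawler", "http://www.example.com/private/report.html"))

# BadBot has its own section that disallows the whole site.
print(parser.can_fetch("BadBot", "http://www.example.com/index.html"))
```

The first check succeeds and the other two fail, which illustrates the rule that anything not explicitly disallowed for a given robot is open to it.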

More...
http://www.robotstxt.org

 

© 2001 - 2012 3Essentials Inc.