A robots.txt file is present by default in a Magento-based e-commerce site. Its purpose is to tell search engines which parts of the site they should not crawl and index. This matters particularly at the development stage, when prices, products and services are not yet in place; while crawling is blocked, the site does not appear in search engine results. Setting up and fixing the file is simple through the Magento Admin Panel options.
Advantages of using robots.txt in Magento: You may wonder why robots.txt matters. Search engines send small programs called spiders to a site to gather information about it, with the aim of getting its pages indexed in the search results. The file lets you tell those search engines where crawling is discouraged. Robots can also be used to run automated tasks such as HTML and link validation. Another important use of robots.txt is to keep crawlers away from the site’s JavaScript and from URLs carrying SID parameters, which helps prevent content duplication. It also helps Magento SEO and significantly reduces the server resources that crawlers consume. By specifying a Crawl-delay, it limits the bandwidth used by other web robots. Hence the importance of this file, and of using it correctly.
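To make the effect of such rules concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules and URLs are illustrative assumptions, not a complete Magento file, and this parser only does plain prefix matching (it does not understand `*` or `$` wildcards).

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption for this sketch), not a full Magento file.
RULES = """\
User-agent: *
Crawl-delay: 10
Disallow: /checkout/
Disallow: /customer/
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = load_rules(RULES)

# A crawler that honours the file skips the checkout but may fetch product pages.
print(parser.can_fetch("Googlebot", "http://www.example.com/checkout/cart/"))  # False
print(parser.can_fetch("Googlebot", "http://www.example.com/shoes.html"))      # True

# Crawl-delay asks the robot to wait this many seconds between requests.
print(parser.crawl_delay("Googlebot"))  # 10
```

Whether a given crawler respects Crawl-delay is up to that crawler; the file only states the request.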
Things you should know before: There are a few pertinent points to know before installing a robots.txt file. It applies to one domain at a time. For a subdomain (e.g. magento.example.com), you need to install a separate robots.txt file. The same principle holds if you run multiple online stores. To create the robots.txt file you need a text editor such as Notepad, Vim or Dreamweaver. The rules in the file can address different crawlers by name, for example Googlebot and Bingbot.
Once created, robots.txt must sit in the root directory of the domain. If your store domain is, for example, www.e-store.com, place the robots.txt file in the domain root, where the app directory exists. It is then accessible at www.e-store.com/robots.txt. The file has no effect if it is saved in any other directory or subdirectory.
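The "one file per host, always at the root" rule can be sketched in a few lines of Python. The helper name `robots_url` is hypothetical; it only derives the address a crawler would look at for a given page, using the standard `urllib.parse` module.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs the host of page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain and port);
    # drop the path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A store installed in a subdirectory is still governed by the root file.
print(robots_url("http://www.e-store.com/shop/index.php?p=2"))
# http://www.e-store.com/robots.txt

# A subdomain is a different host, so it needs its own file.
print(robots_url("http://magento.example.com/catalog/"))
# http://magento.example.com/robots.txt
```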
Two more considerations are worth keeping in mind when using robots.txt on a Magento site:
- Since the file is publicly available, anyone can see which sections of your server you want to keep crawlers away from.
- Robots may ignore this file, especially malware robots that scan the web for security loopholes.
Installation process and tips: There are several ways to install a Magento robots.txt file. For a manual installation, take a ready-made file available on the web, copy its contents and paste them into a newly created robots.txt file.
The Sitemap.xml location in the file has to be changed to your own before you upload the file to the site’s root (even if the Magento installation is in a subdirectory).
This version of robots.txt is offered by byte.nl as an optimal one.
# robots.txt
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these “robots” where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html
#
# Prevent blocking URL parameters with robots.txt
# Use Google Webmaster Tools > Crawl > Url parameters instead

# Website Sitemap
Sitemap: http://www.example.com/sitemap.xml

# Crawlers Setup
User-agent: *
Crawl-delay: 10
# Allowable Index
# Mind that Allow is not an official standard
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
# Allow: /catalogsearch/result/
Allow: /media/catalog/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
# Disallow: /media/
Disallow: /media/captcha/
# Disallow: /media/catalog/
#Disallow: /media/css/
#Disallow: /media/css_secure/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
#Disallow: /media/js/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
#Disallow: /skin/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product/gallery/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Disallow: /get.php # Magento 1.5+
# Paths (no clean URLs)
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=
Disallow: /rss*
Disallow: /*PHPSESSID
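Several rules in this file, such as `/*?SID=`, `/*.php$` and `/rss*`, rely on wildcards: `*` stands for any run of characters and a trailing `$` anchors the end of the URL. This is the interpretation used by the major search engines rather than part of the original standard. The sketch below shows one way to check which URLs a single pattern covers; the helper names are hypothetical, and it deliberately ignores how Allow and Disallow rules are ranked against each other.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern with * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # Without a trailing $, a rule is a prefix match from the start of the path.
    return re.compile("^" + body + ("$" if anchored else ""))


def rule_matches(pattern, path):
    """True if the URL path (including its query string) is covered by the pattern."""
    return rule_to_regex(pattern).search(path) is not None


print(rule_matches("/*?SID=", "/shoes.html?SID=abc123"))  # True
print(rule_matches("/*.php$", "/cron.php"))               # True
print(rule_matches("/*.php$", "/index.php/blog/"))        # False
print(rule_matches("/rss*", "/rss/catalog/new/"))         # True
```

The third case is why `Disallow: /*.php$` blocks direct hits on PHP files without touching URLs that merely contain `index.php` as a path segment.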
Another way to install robots.txt for Magento is to follow these simple guidelines:
- 1) Download the robots.txt file first (there are a lot of sources available).
- 2) If Magento is installed within a subdirectory, you will have to modify the robots.txt file accordingly. This means, for instance, changing ‘Disallow: /404/’ to ‘Disallow: /your-sub-directory/404/’ and ‘Disallow: /app/’ to ‘Disallow: /your-sub-directory/app/’.
- 3) Check whether the domain you use has a sitemap.xml, then add the URL of your sitemap.xml to the file.
- 4) Upload the robots.txt file to your root folder by placing it within the ‘httpdocs/’ directory. This can be done in two ways: by logging in to your Control Panel with your credentials, or via an FTP client of your preference.
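Steps 2 and 3 are mechanical, so they can be scripted before the upload. The following is a minimal sketch, assuming the downloaded file uses one directive per line; the function name `adapt_robots` and the sample values are hypothetical. Commented-out rules are left untouched.

```python
import re


def adapt_robots(text, subdir, sitemap_url):
    """Prefix Allow/Disallow paths with subdir and point Sitemap at sitemap_url."""
    cleaned = subdir.strip("/")
    prefix = "/" + cleaned if cleaned else ""
    out = []
    for line in text.splitlines():
        # Active rules only: lines starting with "#" do not match this pattern.
        rule = re.match(r"^(\s*(?:Allow|Disallow):\s*)(/\S*)(.*)$", line, re.IGNORECASE)
        if rule:
            head, path, tail = rule.groups()
            line = head + prefix + path + tail
        elif re.match(r"^\s*Sitemap:", line, re.IGNORECASE):
            line = "Sitemap: " + sitemap_url
        out.append(line)
    return "\n".join(out) + "\n"


sample = (
    "Sitemap: http://www.example.com/sitemap.xml\n"
    "User-agent: *\n"
    "Disallow: /404/\n"
    "#Disallow: /skin/\n"
    "Disallow: /get.php # Magento 1.5+\n"
)
print(adapt_robots(sample, "your-sub-directory", "http://www.e-store.com/sitemap.xml"))
```

Review the result by hand before uploading it, since a wrong prefix silently turns every rule into one that matches nothing.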
For Magento backend: This approach involves using an extension for the robots.txt file. The good news is that, instead of doing the whole thing by hand, you can download a dedicated tool that generates robots.txt for Magento. You can customize its options through the settings and also add rules of your own.
Reindexing robots.txt: You may notice that search engines sometimes take too long to read a changed Magento robots.txt file. Google Webmaster Tools (GWT) can show when your site was last indexed.
If you want Google or other search engines to pick up the updated version sooner than after 24 hours or a hundred visits, you can set a Cache-Control header for robots.txt in your .htaccess file so that cached copies of the file expire quickly.
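The exact .htaccess statement is not reproduced here. On Apache with mod_headers enabled, a common approach is a `Header set Cache-Control "max-age=60, public, must-revalidate"` directive scoped to robots.txt with `<FilesMatch>`; the 60-second value is an illustrative assumption, not a recommendation from this article. Once the header is in place, you can confirm what the server actually sends (for example from the output of `curl -I`) with a small check like the sketch below, where both helper names are hypothetical.

```python
import re


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    match = re.search(
        r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control or "", re.IGNORECASE
    )
    return int(match.group(1)) if match else None


def refreshes_within(cache_control, limit_seconds):
    """True if the header lets caches keep the file for at most limit_seconds."""
    age = max_age_seconds(cache_control)
    return age is not None and age <= limit_seconds


# Header value as it might be returned for /robots.txt (assumed example).
header = "max-age=60, public, must-revalidate"
print(max_age_seconds(header))           # 60
print(refreshes_within(header, 3600))    # True
print(refreshes_within("public", 3600))  # False: no max-age was sent
```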
In short, most Magento agencies hold similar views when it comes to robots.txt. Before jumping the gun, it is prudent to seek advice on the right code to copy and paste into your site, so that it does not damage your Magento or Magento 2 store. The recommendation, therefore, is to always test your robots.txt file with a tool such as Yandex Webmaster or Frobee.