Optimizing the robots.txt file is a challenging task, and the file plays an important role in the SEO of your blog. You have almost certainly heard of robots.txt, but are you making full use of it? Overlooking it can harm your site’s rankings: if it is configured wrongly, search engines may ignore your entire site, and your blog can disappear from search results altogether. In this article, I will explain how you can edit and optimize the robots.txt file for better SEO. I have broken the article down into sections to keep it simple and easy to read.
What Is Robots.txt File?
The robots.txt file implements what is known as the Robots Exclusion Protocol. It instructs search engine bots on how to crawl a website, i.e. which pages to crawl and which pages to ignore. Whenever a search engine bot comes to your site, it reads the robots.txt file and follows its instructions. If you have not configured it properly, search engine crawlers and spiders may skip important pages or crawl unwanted pages and folders. So, it is very important to optimize the robots.txt file. The file is placed in the root directory of your domain (e.g. www.yourdomain.com/robots.txt), and there can be only one robots.txt on your site. You can use either cPanel or an FTP client to view it. It is an ordinary text file, so you can open and edit it with a plain text editor like Notepad.
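Because the file always lives at the root, its address can be worked out from any page URL on the same site. Here is a minimal Python sketch of that idea; the helper name robots_url is my own, and the domain is the placeholder used above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt always sits at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.yourdomain.com/blog/my-post/?page=2"))
```

Whatever folder or query string the page URL contains, the result points at the single robots.txt in the site root.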
How To Create Robots.txt File?
WordPress generates a robots.txt automatically. It is a virtual file that WordPress serves when no physical robots.txt exists, so if you are using WordPress, www.yourdomain.com/robots.txt should already respond even though you may not see the file in your root directory. If you want to edit the file by hand, are using another CMS, or your website doesn’t have a robots.txt file, just create a plain text file in Notepad, name it robots.txt, and upload it to your site’s root folder using an FTP client or cPanel. There are also a number of robots.txt file generators available online.
To check in cPanel, go to File Manager –> public_html folder. If you have a physical robots.txt file, it will be present here.
Understanding Content Of Robots.txt File
Before jumping directly into optimizing the robots.txt file, let us understand its basics. There are mainly four directives: User-agent, Allow, Disallow and Sitemap. We will configure these directives to gain better SEO. Let’s see what they mean:
- User-agent – The User-agent directive names the search engine bot or crawler to which the rules that follow apply. It can be Googlebot, Bingbot, etc. If you want to address all search engine bots, you can use an asterisk (*) instead of naming individual bots.
- Allow – The Allow directive, as the name suggests, tells search engines that they may crawl the parts of your site you specify.
- Disallow – This directive tells search engines NOT to crawl the parts of your site you specify.
- Sitemap – The Sitemap directive tells search engine bots where the sitemap of the website is, so that they can crawl it too.
Here’s a basic sample of a robots.txt file:
User-agent: *
Disallow: /wp-admin/
Allow: /
I hope you can understand the above sample now. Here, we are instructing all search engine bots (since line 1 is User-agent: *) not to crawl the /wp-admin/ part of your website (line 2), while allowing them to crawl the other parts of your website (line 3).
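To see how a crawler would read these three rules, you can feed them to Python’s standard-library urllib.robotparser. Treat this as a sketch only: the rules are reconstructed from the description above, the example.com URLs are placeholders, and this parser applies the first matching rule, which can differ from how Google resolves conflicting rules.

```python
from urllib.robotparser import RobotFileParser

SAMPLE = """\
User-agent: *
Disallow: /wp-admin/
Allow: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# Any bot name falls under the "*" group in this file.
admin_ok = parser.can_fetch("Googlebot", "https://example.com/wp-admin/options.php")
post_ok = parser.can_fetch("Googlebot", "https://example.com/my-first-post/")
print("wp-admin allowed:", admin_ok)
print("post allowed:", post_ok)
```

The /wp-admin/ URL is refused and the ordinary post URL is permitted, matching the line-by-line reading above.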
Some more examples for your better understanding:
Allow crawling of everything
Disallow crawling of everything
Disallow a particular bot (say Googlebot) from crawling a particular folder (myfolder), but allow one page (mypage) in that folder.
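As a sketch, here is one plausible way to write each of the three cases above, checked with Python’s urllib.robotparser. The exact rule text is my assumption (for instance, “allow everything” can also be written as Allow: /), and the helper name load and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser


def load(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in a string."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


# An empty Disallow value blocks nothing, so everything may be crawled.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# A bare slash matches every path, so nothing may be crawled.
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"

# Allow is listed first so that first-match parsers (like this one) and
# longest-match parsers (like Google's) reach the same answer.
FOLDER_EXCEPT_PAGE = (
    "User-agent: Googlebot\n"
    "Allow: /myfolder/mypage\n"
    "Disallow: /myfolder/\n"
)

site = "https://example.com"
print(load(ALLOW_ALL).can_fetch("Googlebot", site + "/anything"))
print(load(DISALLOW_ALL).can_fetch("Googlebot", site + "/anything"))
rules = load(FOLDER_EXCEPT_PAGE)
print(rules.can_fetch("Googlebot", site + "/myfolder/other"))
print(rules.can_fetch("Googlebot", site + "/myfolder/mypage"))
print(rules.can_fetch("Bingbot", site + "/myfolder/other"))
```

Note that the third file names only Googlebot, so other bots such as Bingbot are not restricted by it at all.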
How To Edit & Optimize Robots.txt File?
Now that you are familiar with the robots.txt file, let’s proceed to how you can edit and optimize it to gain maximum benefit. Editing robots.txt is one of the things you need to do after installing WordPress.
Editing the Robots.txt File
You can edit the robots.txt file from cPanel or your FTP client. To edit it through cPanel, follow the steps below:
Step 1: Log in to your cPanel account
Step 2: Go to File Manager
Step 3: Go to the public_html folder of your website
Step 4: Locate the robots.txt file
Step 5: Right-click on the file and select ‘Edit’.
Step 6: A confirmation message will appear. Take a backup if you want and click on ‘Edit’.
The file will open in editable mode. Make the necessary changes and click on ‘Save Changes’.
How To Optimize Robots.txt File?
As I mentioned earlier, optimizing robots.txt is a very challenging task once you consider all the factors. An un-optimized robots.txt file can harm your SEO and can even remove your blog from search results (e.g. the rule “Disallow: /” blocks crawling of your entire site, so it can drop out of search engines). Keep the following things in mind when you start to optimize the robots.txt file.
- Add user-agents carefully. Be very cautious both when naming specific bots (you may miss important bots) and when using the asterisk (*) (you may want to exclude some bots).
- Determine which parts of your site you don’t want search engine bots to crawl. Typical candidates are: /wp-admin/, /cgi-bin/, /index.php, /wp-content/plugins/, /readme.html, /trackback/, /xmlrpc.php, etc.
- Similarly, you can allow certain important pages of your website. Adding “Allow: /” is not that important, as bots will crawl your site anyway. But the Allow directive is very helpful when you want to address a particular bot, or when you want bots to crawl a sub-folder of a directory that you have disallowed.
- Adding your sitemap to the robots.txt file is also good practice.
Following is the robots.txt file of my blog.
This is just a sample robots.txt file for you. You can easily see which folders I have specifically disallowed. You might be wondering about the /go/ folder: I use /go/ for my cloaked affiliate links, and since I do not want crawlers and bots to index them, I disallow it. It is always good practice to include your website’s sitemap in the robots.txt file. The Sitemap line can be placed anywhere in the file; mostly it is placed either at the top or at the bottom.
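A file along these lines could be assembled and checked with a short Python script. The paths, the sitemap URL and the helper name build_robots_txt below are illustrative assumptions, not the actual contents of the file discussed above. The site_maps() call needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed, sitemap_url, user_agent="*"):
    """Assemble robots.txt text from a list of paths to disallow."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed]
    # A blank line ends the group; Sitemap is independent of any group.
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Hypothetical values: a few paths from the list above, plus the /go/ folder.
text = build_robots_txt(
    ["/wp-admin/", "/cgi-bin/", "/wp-content/plugins/", "/readme.html", "/go/"],
    "https://www.yourdomain.com/sitemap.xml",
)
print(text)

parser = RobotFileParser()
parser.parse(text.splitlines())
print(parser.can_fetch("*", "https://www.yourdomain.com/go/some-offer"))
print(parser.site_maps())
```

Generating the file from a list keeps the disallowed folders in one place, which makes it easier to adapt to your own private folders.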
Your robots.txt file can differ from mine, since you might have different requirements and other private folders. A few other things that you can do are:
- You can add comments to remember why you have given certain rules and configuration. Comments are ignored by web crawlers and bots, but they are helpful to whoever maintains the file. Use ‘#’ to start a comment.
For example: # Allowing xyz bot to crawl xyz folder.
- You can disallow password-protected areas, files, or intranets to keep compliant crawlers out of them. Keep in mind that robots.txt is publicly readable and purely advisory, so it does not enforce security by itself.
- Disallow readme.html so that crawlers do not pick it up. Someone can browse to the readme.html file to find out which WordPress version you are using, which can help them attack your website.
To do so, write: Disallow: /readme.html
You should also disallow the WordPress plugin directory for the same reason. Simply write Disallow: /wp-content/plugins/
- Disallow replytocom links to avoid many duplicate-post issues. Simply write Disallow: *?replytocom in your site’s robots.txt file.
- To block access to all URLs that include a question mark (?), you could use the following entry: Disallow: /*?
- You can use the $ character to match the end of the URL. For instance, to block any URLs that end with .html, you could use the following entry: Disallow: /*.html$
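Python’s built-in robots.txt parser does not interpret the * and $ wildcards, so the sketch below uses a small hand-written matcher to illustrate how such patterns are commonly read. The function name rule_matches is hypothetical, and real crawlers may differ in the details.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a wildcard robots.txt pattern matches the URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn the escaped '*' back into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    # Rules match from the start of the path, which is what re.match does.
    return re.match(regex, path) is not None


print(rule_matches("/*?", "/page?id=1"))               # contains a question mark
print(rule_matches("/*?", "/page"))                    # no question mark
print(rule_matches("/*.html$", "/about.html"))         # ends with .html
print(rule_matches("/*.html$", "/about.html?x=1"))     # .html is not at the end
print(rule_matches("*?replytocom", "/post/?replytocom=12"))
```

Without the trailing $, a pattern behaves like a prefix, so /*.html alone would also match /about.html?x=1.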
Other tips to optimize robots.txt file
- Don’t use the robots.txt file to hide low-quality content. The best practice is to use the noindex and nofollow meta tags.
- Your robots.txt file shouldn’t exceed 200 Disallow lines. Start with a few Disallow lines, and add more later if you need to.
- Don’t use the robots.txt file to stop search engines from indexing your Categories, Tags, Archives, Author pages, etc. You can use the nofollow and noindex meta tags for these as well.
- Stop search engines from crawling directories of your site that might contain duplicate content.
Testing Robots.txt File in Google Webmaster Tools (now Google Search Console)
After you have edited and optimized the robots.txt file, the first thing to do is test whether it is properly configured. To do so:
Step 1) Log in to your Google Search Console account
Step 2) Navigate to the ‘Crawl’ section in the left sidebar.
Step 3) Click on the ‘robots.txt Tester’
Step 4) It will show the latest robots.txt file on your website. If you have not yet made your changes permanent, you can simply paste in the content of your optimized robots.txt file and test it.
Step 5) Select the bot for which you want to test. Many bots are available, such as Googlebot-Video, Googlebot-News, Googlebot-Image, etc.
Step 6) Click on the ‘Test’ button.
If everything is good and bots are allowed to crawl your website, it will show ‘ALLOWED’ in green, indicating that your settings are fine.
Step 7) You can now submit the robots.txt file by clicking on the ‘Submit’ button.
Congratulations! You have now successfully optimized your robots.txt file.
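If you want a rough local check before pasting a draft into the tester, a few lines of Python can report a per-bot verdict for one URL. This is only a sketch: the draft rules, the URL and the helper name check are hypothetical, and Python’s parser does not reproduce every detail of how Google evaluates rules, so the tester remains the final word.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft: images are off-limits to Googlebot-Image only.
DRAFT = """\
User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Disallow: /wp-admin/
"""


def check(robots_text, bots, url):
    """Return {bot: 'ALLOWED' or 'BLOCKED'} for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {
        bot: "ALLOWED" if parser.can_fetch(bot, url) else "BLOCKED"
        for bot in bots
    }


bots = ["Googlebot-Video", "Googlebot-News", "Googlebot-Image"]
results = check(DRAFT, bots, "https://www.yourdomain.com/images/logo.png")
for bot, verdict in results.items():
    print(f"{bot}: {verdict}")
```

Here only Googlebot-Image has its own group, so it is the only bot kept away from the image URL; the other two fall back to the * group.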
I hope this guide has helped you understand the various aspects of robots.txt. If you have any questions on how to optimize the robots.txt file, please feel free to ask in the comments section below.
By Mohit Arora, March 4, 2017