In this article, I will talk about the robots.txt file, how to edit it, and how you can easily optimize the WordPress robots.txt file.
I will also explain how to create a robots.txt file in case your website’s robots.txt is not available for some reason.
When you are optimizing your website for SEO, you will want to cover as many of the factors that improve your website’s SEO as possible.
The robots.txt file is one of those factors.
Whether your site runs on WordPress, another platform, or is a static website, an optimized robots.txt file is an important factor for better SEO.
By an optimized robots.txt file, I mean one that allows search engine bots to crawl the important content of your site while disallowing (blocking) them from the parts of your site that are not intended to be crawled.
The robots.txt file is a simple text file in the root directory of your WordPress setup.
It contains a user-specified set of instructions for search engine bots.
These instructions help search engines (Google, Bing, etc.) understand where they are allowed to go while visiting your site.
Robots.txt also tells search engines which places are disallowed for their visit.
I will also talk about certain things that you do not want search engines to crawl and index.
In other words, you should not allow search engine bots to access vulnerable components of your website.
Do I need a Robots.txt File?
If you don’t have a robots.txt file, then search engines will still crawl and index your website.
However, you will not be able to tell search engines which pages or folders they should not crawl.
This will not have much of an impact when you’re first starting a blog and do not have a lot of content.
However, as your website grows and you have a lot of content, you will likely want better control over how your website is crawled and indexed.
Here is why.
Search bots have a crawl quota for each website.
This means that they crawl a certain number of pages during a crawl session.
If they don’t finish crawling all pages on your site, they will come back and resume crawling in the next session.
This can slow down your website indexing rate.
You can fix this by disallowing search bots from attempting to crawl unnecessary pages like your WordPress admin pages, plugin files, and themes folder.
By disallowing unnecessary pages, you save your crawl quota. This helps search engines crawl even more pages on your site and index them as quickly as possible.
Another good reason to use a robots.txt file is when you want to stop search engines from crawling a post or page on your website.
It is not the safest way to hide content from the general public, but it will help keep those pages from appearing in search results.
Why Should You Care About Your Robots.txt File?
For most webmasters, the benefits of a well-structured robots.txt file boil down to two categories:
- Optimizing search engines’ crawl resources by telling them not to waste time on pages you don’t want to be indexed. This helps ensure that search engines focus on crawling the pages that you care about the most.
- Optimizing your server usage by blocking bots that are wasting resources.
Robots.txt Isn’t Specifically About Controlling Which Pages Get Indexed In Search Engines
Robots.txt is not a foolproof way to control which pages search engines index.
If your primary goal is to stop certain pages from being included in search engine results, the proper approach is to use a noindex meta tag or another similarly direct method.
This is because your Robots.txt is not directly telling search engines not to index content – it’s just telling them not to crawl it.
While Google won’t crawl the marked areas from inside your site, Google itself states that if an external site links to a page that you exclude with your Robots.txt file, Google still might index that page.
John Mueller, a Google Webmaster Trends Analyst, has also confirmed that if a page has links pointing to it, it might still get indexed even if it’s blocked by robots.txt.
Below is what he had to say in a Webmaster Central hangout:
One thing maybe to keep in mind here is that if these pages are blocked by robots.txt, then it could theoretically happen that someone randomly links to one of these pages.
And if they do that, then it could happen that we index this URL without any content because it’s blocked by robots.txt.
So we wouldn’t know that you don’t want to have these pages actually indexed.
Whereas if they’re not blocked by robots.txt you can put a noindex meta tag on those pages.
And if anyone happens to link to them, and we happen to crawl that link and think maybe there’s something useful here then we would know that these pages don’t need to be indexed and we can just skip them from indexing completely.
So, in that regard, if you have anything on these pages that you don’t want to have indexed then don’t disallow them, use noindex instead.
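To make the difference concrete, here is a toy Python model of a crawler that only reads a page’s robots meta tag if robots.txt lets it fetch the page. It is an illustration built on Python’s standard urllib.robotparser and html.parser modules, not a description of how Googlebot is implemented; the crawler_sees_noindex helper and the yourdomain.com URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Records whether the page contains <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def crawler_sees_noindex(robots_txt, url, page_html, agent="Googlebot"):
    """Hypothetical helper: None if robots.txt blocks the fetch, else whether noindex was found."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        return None  # the page is never downloaded, so its noindex tag is never read
    finder = RobotsMetaFinder()
    finder.feed(page_html)
    return finder.noindex


page = '<html><head><meta name="robots" content="noindex"></head><body>Hi</body></html>'
url = "https://yourdomain.com/private/page/"

print(crawler_sees_noindex("User-agent: *\nDisallow: /private/\n", url, page))  # None
print(crawler_sees_noindex("User-agent: *\nDisallow:\n", url, page))            # True
```

In the first call the Disallow rule hides the noindex tag from the crawler, which is exactly the situation Mueller describes; in the second call the page can be fetched, so the noindex instruction is seen.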
How to Create Robots.txt file?
When you set up WordPress using the popular 1-click install method, a robots.txt file is usually available in the root directory of WordPress. (If no physical file exists, WordPress serves a virtual robots.txt instead.)
Also, for static websites, robots.txt is stored in the root folder of the website.
You can check whether your site has a robots.txt file by visiting the following URL:
https://yourdomain.com/robots.txt
Replace yourdomain.com with your domain name and open the above URL in the browser.
You should see the contents of the robots.txt file.
For example: at https://nintendohill.net/robots.txt you will find Robots.txt file of NintendoHill.
If robots.txt is not available on your site, you will get a 404 (page not found) error.
If, for some reason, your WordPress site does not have a robots.txt file, there is no need to worry, as you can easily create one.
Follow the steps below to create a robots.txt file in WordPress:
- Create a text file in any text editor (like Notepad), paste the following basic instructions (syntax) into it, and save the file as robots.txt on your computer.
- Go to the root directory of your WordPress website (the folder on your web hosting where all the WordPress core files are stored) using an FTP client or via Hosting dashboard > File manager.
- Upload the robots.txt file from your computer to the root directory using the FTP client or File manager. You are done. You have successfully created a robots.txt file.
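If you would rather script the first step, the Python sketch below builds a minimal robots.txt and saves it to disk. The first three rules are the ones WordPress core serves in its default virtual robots.txt; the sitemap URL is a placeholder you must replace, and the helper names are hypothetical.

```python
from pathlib import Path

# Placeholder: replace with your site's real sitemap URL.
SITEMAP_URL = "https://yourdomain.com/sitemap.xml"

# The first three lines match WordPress's default virtual robots.txt.
BASIC_RULES = [
    "User-agent: *",
    "Disallow: /wp-admin/",
    "Allow: /wp-admin/admin-ajax.php",
    "",
    f"Sitemap: {SITEMAP_URL}",
]


def build_robots_txt(rules=BASIC_RULES):
    """Join the rules into robots.txt text, one directive per line."""
    return "\n".join(rules) + "\n"


def write_robots_txt(path="robots.txt"):
    """Save the text as plain UTF-8 (no byte order mark) so it is ready to upload."""
    text = build_robots_txt()
    Path(path).write_text(text, encoding="utf-8")
    return text
```

Calling write_robots_txt() creates robots.txt in the current folder; you would then upload it to your root directory as described in the steps above.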
How to Edit Robots.txt file in WordPress?
Before editing the robots.txt file, I recommend taking a backup of it to avoid any hiccups.
You can edit the WordPress robots.txt file by the following methods:
- Edit robots.txt locally on the computer with a text editor like notepad and upload it to your site’s root directory.
- Use a specific plugin to edit the “Robots.txt” file.
- Use Yoast SEO plugin.
- Directly edit it on Webhosting using cPanel File Manager.
- You can edit the robots.txt file on your computer just like any other text file. Open the file, make your changes, and save it. That’s it.
- However, it is advisable to avoid using a plugin for small tasks if you can. If you prefer plugins, you can install this plugin to edit robots.txt.
- Yes, you can use the Yoast SEO plugin to edit robots.txt. In the Yoast SEO settings, go to Tools > File Editor > robots.txt file. From there you can edit it.
- Go to the root directory of your WordPress site using Hosting Dashboard (cPanel) > File manager. Right-click the robots.txt file and click Edit; the file editor will open. From there you can make changes to robots.txt and save them when you are done.
Now your site has a robots.txt file, and you know how to edit it. But you may still be unsure what to write in it.
The next section is all about the content of the robots.txt file.
How to Optimize WordPress Robots.txt file
In this section, I will tell you how to use the robots.txt file in WordPress for optimum SEO.
The robots.txt syntax has a few crawler directives that specify the action of a particular instruction. These directives tell search engine bots what they should do with different parts of your site.
List of some Crawler Directives used in Robots.txt:
User-agent: specifies which bot (search engine bots, ad bots, etc.) the rules that follow apply to.
Allow: explicitly allows bots to crawl a path.
Disallow: disallows (blocks) bots from crawling a path.
Sitemap: is used to add the sitemap URLs of your site.
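To see how a crawler interprets these four directives, here is a small sketch using Python’s standard urllib.robotparser module (site_maps() needs Python 3.8 or later). The rules and the yourdomain.com URLs are illustrative assumptions, and Python’s parser is only an approximation of how real search engine bots read the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; yourdomain.com is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/

Sitemap: https://yourdomain.com/sitemap.xml
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# User-agent "*" applies to any bot that has no group of its own.
print(rules.can_fetch("Bingbot", "https://yourdomain.com/wp-admin/options.php"))     # False (Disallow)
print(rules.can_fetch("Bingbot", "https://yourdomain.com/wp-admin/admin-ajax.php"))  # True (Allow)
print(rules.can_fetch("Bingbot", "https://yourdomain.com/hello-world/"))             # True (no rule matches)
print(rules.site_maps())  # ['https://yourdomain.com/sitemap.xml']
```

Google resolves a conflict between Allow and Disallow by picking the most specific (longest) matching path, while Python’s parser applies the first matching rule in file order. Listing the narrower Allow line before the broader Disallow line, as above, gives the same answer under both approaches.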
There are a few things you should keep in mind while editing the robots.txt file:
- You should specify the sitemap URLs of your site.
- Be careful when choosing a folder to Disallow, as it may affect your search engine appearance.
- You should Disallow cloaked link folders like /out or /recommends in your root directory, if you have one.
- You should Disallow HTML files in the root directory of your WordPress setup.
- As a widely accepted practice, you may also Disallow the /wp-content/plugins/ folder to prevent crawling of unwanted plugin files.
- You should not use the robots.txt file to prevent crawling of your low-quality content. By content, I am particularly referring to the main content of your site, i.e. articles, blog posts, images, etc. If you really want to noindex some of your content, do it with the appropriate method; for instance, you can use the noindex setting provided in the Yoast SEO plugin.
- You must not insert any comma, colon or any other symbol which is not a part of the instruction.
- Do not add extra space between instructions unless it is done for a reason.
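Parts of this checklist can be automated. The Python sketch below is a hypothetical helper (lint_robots_txt is not part of any library) that flags lines not written as a single Field: value pair, commas inside a value, directives other than the four listed earlier, and a missing Sitemap line. It is deliberately strict and would also flag legitimate extensions such as Crawl-delay.

```python
import re

# Only the four directives discussed in this article are treated as known.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}
# One field name, a colon, then a single value without spaces (the value may be empty).
LINE_PATTERN = re.compile(r"^([A-Za-z-]+)\s*:\s*(\S*)$")


def lint_robots_txt(text):
    """Hypothetical checker: return a list of (line_number, message) tuples."""
    problems = []
    has_sitemap = False
    for number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.split("#", 1)[0].strip()  # ignore comments and surrounding spaces
        if not line:
            continue
        match = LINE_PATTERN.match(line)
        if match is None:
            problems.append((number, "not a single 'Field: value' pair"))
            continue
        field, value = match.group(1).lower(), match.group(2)
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unexpected directive {match.group(1)!r}"))
        if "," in value:
            problems.append((number, "comma in value"))
        if field == "sitemap":
            has_sitemap = True
    if not has_sitemap:
        problems.append((0, "no Sitemap line"))
    return problems


example = "User-agent: *\nDisallow: /wp-admin/, /wp-content/plugins/\n"
print(lint_robots_txt(example))
```

Here the second line is reported because two paths were squeezed into one Disallow instruction; each path needs its own Disallow line.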
WordPress Robots.txt example
Following is the robots.txt file of NintendoHill at the time of writing this article.
You can use the following instructions in the robots.txt file of your WordPress site with appropriate changes, such as your sitemap links.
Search engine bots should only be allowed to crawl and index files that you can share publicly.
The robots.txt file above disallows search engine bots from accessing core files of the website that are not meant for the public.
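As a sketch of the same idea, here is a hypothetical robots.txt assembled from the checklist earlier in this section (it is not NintendoHill’s actual file), checked with Python’s standard urllib.robotparser to list which sample URLs it blocks. The /out/ and /recommends/ folders and the yourdomain.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt built from the checklist above (not any real site's file).
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-content/plugins/
Disallow: /out/
Disallow: /recommends/

Sitemap: https://yourdomain.com/sitemap.xml
"""


def blocked_urls(robots_txt, urls, agent="*"):
    """Return the URLs from `urls` that the given robots.txt blocks for `agent`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in urls if not rules.can_fetch(agent, url)]


SAMPLE_URLS = [
    "https://yourdomain.com/wp-admin/options.php",
    "https://yourdomain.com/wp-admin/admin-ajax.php",
    "https://yourdomain.com/wp-content/plugins/some-plugin/readme.txt",
    "https://yourdomain.com/out/some-affiliate/",
    "https://yourdomain.com/recommends/some-product/",
    "https://yourdomain.com/my-first-post/",
    "https://yourdomain.com/wp-content/uploads/2020/01/photo.jpg",
]

for url in blocked_urls(EXAMPLE_ROBOTS_TXT, SAMPLE_URLS):
    print("blocked:", url)
```

The admin pages, plugin files, and cloaked links are blocked, while the blog post, the uploaded image, and admin-ajax.php stay crawlable.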
Robots.txt @ Popular WordPress Blogs
How do they do it?
Like many other WordPress topics, popular blogs have different opinions on the robots.txt file too. In one of Yoast’s articles, they explain the use of a very simple robots.txt file.
In another blog post, the WPBeginner team suggests using a robots.txt file with Allow and Disallow directives for different parts of your site.
I have shown NintendoHill’s robots.txt file above; you can use it on your WordPress site with the necessary changes.
You can also look at the robots.txt files of some of these popular blogs and make up your mind about how you want to set up your site’s robots.txt file.
Test if Robots.txt is Working Fine
Hey! Robots.txt, How are you doing?
You can test if your site’s Robots.txt file is working fine or not. Also, you can check if robots.txt is blocking a specific URL of your site.
Open the Google Webmaster Tools dashboard for your site. Then go to Crawl > robots.txt Tester. Clicking robots.txt Tester will take you to the tester screen.
Here you can see the content of your robots.txt file, along with any errors or warnings it contains.
Right below that there is an option to enter a specific URL of your site to check if the robots.txt is blocking a URL.
Using this option, you can fetch a specific URL of your site as Googlebot, Googlebot-News, Googlebot-Image, Googlebot-Video, Googlebot-Mobile, Mediapartners-Google, or AdsBot-Google.
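Checking a URL as a specific bot matters because a crawler follows only the group that names it most specifically. The Python sketch below approximates that check offline with the standard urllib.robotparser module; it is not Google’s tester, and the robots.txt with its separate Googlebot-Image group, along with the paths, is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file with a separate group for Google's image crawler.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /wp-content/uploads/private/

User-agent: *
Disallow: /wp-admin/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def report(agent, url):
    """Print whether `agent` may fetch `url` under the rules above."""
    verdict = "allowed" if rules.can_fetch(agent, url) else "blocked"
    print(f"{agent:16} {verdict:8} {url}")


for agent in ("Googlebot", "Googlebot-Image"):
    for url in ("https://yourdomain.com/wp-admin/",
                "https://yourdomain.com/wp-content/uploads/private/photo.jpg"):
        report(agent, url)
```

Note that Googlebot-Image is allowed into /wp-admin/ here: once a bot has its own group, the rules under User-agent: * no longer apply to it, so any shared rules have to be repeated inside that group.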
I hope this article helped you understand the robots.txt file and optimize your WordPress robots.txt file.
If you have a question or you just want to say hello, feel free to post it via comments below.
Testing Robots.txt File in Google Webmaster Tools
After updating your Robots.txt file, you have to test the Robots.txt file to check if any content is impacted by the update.
You can use Google Search Console to check if there is any “Error” or “Warning” for your robots.txt file. Just log in to Google Search Console and select the site. Then go to Crawl > robots.txt Tester and click the “Submit” button.
A box will pop up. Just click the “Submit” button in it.
Then reload the page and check if the file is updated. It might take some time to update the Robots.txt file.
If it hasn’t updated yet, you can enter your Robots.txt file code into the box to check if there are any errors or warnings. It will show the errors and warnings there.
If you notice any errors or warnings in the robots.txt file, fix them by editing the file.
Beware of the UTF-8 BOM
BOM stands for byte order mark and is basically an invisible character that is sometimes added to files by old text editors and the like.
If this happens to your robots.txt file, Google might not read it correctly.
This is why it is important to check your file for errors.
For example, our file once had an invisible character at the start, and Google complained that the syntax was not understood.
This essentially invalidated the first line of our robots.txt file altogether, which is not good! Glenn Gabe has an excellent article on how a UTF-8 BOM could kill your SEO.
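You can check a local copy of your file for a BOM before uploading it. The Python sketch below strips a leading UTF-8 BOM and uses the standard urllib.robotparser module to reproduce the effect described above: with the BOM in place, Python’s parser does not recognise the first User-agent line, so the Disallow rule under it is ignored. Google’s parser may behave differently, and the helper names are hypothetical.

```python
import codecs
from urllib.robotparser import RobotFileParser


def strip_utf8_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark (EF BB BF), if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def wp_admin_is_blocked(raw: bytes) -> bool:
    """Hypothetical check: does Python's parser block /wp-admin/ for all bots?"""
    rules = RobotFileParser()
    # Plain "utf-8" keeps the BOM as an invisible U+FEFF character at the start.
    rules.parse(raw.decode("utf-8").splitlines())
    return not rules.can_fetch("*", "https://yourdomain.com/wp-admin/")


with_bom = codecs.BOM_UTF8 + b"User-agent: *\nDisallow: /wp-admin/\n"

print(wp_admin_is_blocked(with_bom))                  # False: the first line was not understood
print(wp_admin_is_blocked(strip_utf8_bom(with_bom)))  # True
```

Decoding with Python’s utf-8-sig codec instead of utf-8 also drops a leading BOM automatically.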
Googlebot is Mostly US-Based
It’s also important not to block Googlebot crawling from the United States, even if you are targeting a local region outside of the United States.
Google sometimes does local crawling, but Googlebot is mostly US-based.
Googlebot is mostly US-based, but we also sometimes do local crawling. https://t.co/9KnmN4yXpe
— Google Webmasters (@googlewmc) November 13, 2017
Use Robots.txt The Right Way
As we wrap up our robots.txt guide, we want to remind you one more time that using a Disallow command in your robots.txt file is not the same as using a noindex tag.
Robots.txt blocks crawling, but not necessarily indexing.
You can use it to add specific rules to shape how search engines and other bots interact with your site, but it will not explicitly control whether your content is indexed or not.
For most casual WordPress users, there’s not an urgent need to modify the default virtual robots.txt file.
But if you’re having issues with a specific bot, or want to change how search engines interact with a certain plugin or theme that you’re using, you might want to add your own rules.
The goal of optimizing your robots.txt file is to prevent search engines from crawling pages that are not publicly available.
For example, pages in your /wp-content/plugins/ folder or pages in your WordPress admin folder.
A common myth among SEO experts is that blocking the WordPress category, tags, and archive pages will improve the crawl rate and result in faster indexing and higher rankings.
This is not true. It’s also against Google’s webmaster guidelines.
We recommend that you follow the above robots.txt format to create a robots.txt file for your website.
That’s all we know for now :3
Remember that we constantly update this post, like our others.
Hope this post helped you in some way.
Thanks for reading.
Remember to share this post with your preferred social network and tell your followers what you thought of it.
Need help? Comment below this post and we will get back to you as soon as we can.