Getting search engines to treat your website the way you want can be a tough fight.
Still, with a few simple techniques you can control how search bots crawl and index your website, right down to the page level.
Here we are going to talk about a legitimate search engine optimization technique that can improve your website's SEO and is easy to implement.
It's the robots.txt file, also known as the "robots exclusion protocol" or "robots exclusion standard".
Robots.txt controls which of your web content is available to crawlers, but it does not tell them whether or not to index it.
What Is A Robots.txt File?
A robots.txt file tells search engine robots which web pages, directories, subdirectories, files, folders, or dynamic pages the spiders can or cannot crawl on your website. The robots exclusion protocol is typically used to manage how crawlers access your web content and to prevent the website from being overloaded with requests.
It will not necessarily keep your web pages out of search engines. If you don't want search engine robots to index a page, use a noindex directive or protect the page with a password.
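As a minimal sketch of the noindex approach (which page you apply it to is up to you; only the tag itself matters), the tag is placed inside the page's <head>:

Example:
<!-- Tell crawlers not to index this page; the page must stay crawlable so they can see the tag -->
<meta name="robots" content="noindex">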
By specifying "Disallow" or "Allow" rules, you can tell spiders whether or not to crawl an individual part of the website, on a per user-agent basis.
Location Of Robots.txt File
Keep your robots.txt file in the root directory of your domain or subdomain.
Search bots always look for a robots.txt file. If a bot cannot find the robots exclusion protocol file in the root directory, or cannot access the www.xyz.com/robots.txt URL, that user-agent assumes the website has no robots.txt file.
To locate the robots exclusion standard file, go to your cPanel >> public_html web directory.
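As a quick illustration (xyz.com is a placeholder domain), crawlers only read the file from the root of a host, not from a subfolder:

https://www.xyz.com/robots.txt – read by crawlers
https://www.xyz.com/folder/robots.txt – ignored, because it is not at the root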
User-agent Directive
The User-agent directive specifies which search engine robot the rules that follow apply to. Major user-agents are:
Google – googlebot
Bing – bingbot
Yahoo – Slurp
MSN – msnbot
DuckDuckGo – duckduckbot
Baidu – baiduspider
Yandex – yandexbot
Facebook – facebot
Example:
User-agent: googlebot
Disallow: /wp-admin/
In this example, Google's bot (googlebot) is told not to crawl the /wp-admin/ directory.
Note: It is important to define user-agent(s) correctly so that search bots apply your rules as intended.
Wildcard (*) Directive
Used in the User-agent line, the wildcard indicates that the directives apply to all search engines. Used inside a path, it matches any sequence of characters, which makes it an excellent approach for URLs that share the same pattern. Google and Bing bots support this wildcard.
Example:
User-agent: *
Disallow: /plugin/
Disallow: /*?
In this example, crawlers are not allowed to crawl the /plugin/ directory or any URL that contains a question mark (?).
Disallow Directive
This robots.txt directive is used to specify which parts of a website should not be accessed, either by all user-agents or by an individual one.
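A sketch of such rules, assuming Yahoo's Slurp and Bing's bingbot user-agents and placeholder paths:

Example:
User-agent: Slurp
Disallow: /services/

User-agent: bingbot
Disallow: /keywords/
Disallow: /ebooks/*.pdf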
Here, Yahoo's crawler will not crawl the /services/ directory, while Bing's spider will not crawl the /keywords/ directory or any PDF file inside the /ebooks/ directory.
Allow Directive
The Allow directive tells search robots to crawl a subdirectory or web page even if its parent folder is disallowed. This directive is supported by Google and Bing. A path is required; if no path is defined, the directive is ignored.
Example:
User-agent: *
Allow: /blog/
Disallow: /blog/permanent-301-vs-temporary-302-redirects-which-one-is-better/
To match the end of a URL, use a dollar sign ($) at the end of the path. Google and Bing bots support this wildcard.
Example:
User-agent: *
Disallow: /*.php$
This example shows that all search robots are disallowed from accessing any URL that ends with .php. Crawlers can still access URLs that merely contain .php but do not end with it, such as https://xyz.com/services.php?lang=en.
Crawl-delay Directive
The Crawl-delay directive defines how many seconds a crawler should wait before fetching the next web page, which helps prevent the server from being overloaded with many requests at once. Yahoo, Bing, and Yandex support it, but Google does not; historically, a crawl rate could instead be set in Google Search Console.
Example:
User-agent: Slurp
Crawl-delay: 5
Here Yahoo's crawler (Slurp) is directed to wait 5 seconds before fetching the next page.
Sitemap Directive
The Sitemap directive tells search engines where your XML sitemap is located. If you are less familiar with sitemaps, you can instead use Google Search Console to submit URLs one by one.
Example:
User-agent: *
Disallow: /media/
Sitemap: https://www.xyz.com/sitemap.xml
Note: The robots exclusion standard text file is supported by most major search engines, but you should know that some crawlers do not support robots.txt files at all.
Why Is Robots.txt File Important?
Google generally crawls and indexes the important pages of your website and ignores pages that are unimportant or contain duplicate content. A robots.txt file is not mandatory for a successful website; you can rank well in search engines without a robots exclusion protocol.
Still, here are some reasons why you should include a robots.txt file (a combined example follows the list):
To keep web pages that contain duplicate content from appearing in search engine results pages.
To prevent search robots from crawling your private web folders.
To preserve crawl budget by disallowing less important web pages with robots.txt.
To keep an entire section of a website away from search robots.
To specify the location of the sitemap.
To add a crawl delay and avoid server overload from too many requests at once.
To block images, videos, PDFs, and resource files from appearing in search results.
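To make these uses concrete, here is a hedged sample robots.txt that combines several of them (the paths, domain, and delay value are placeholders, not recommendations):

User-agent: *
# Keep a private folder and a duplicate-content archive out of the crawl
Disallow: /private/
Disallow: /archive/
# Keep PDF files out of search results
Disallow: /*.pdf$
# Slow down crawlers that honor crawl-delay (Google ignores this line)
Crawl-delay: 10
# Point crawlers at the XML sitemap
Sitemap: https://www.xyz.com/sitemap.xml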
Place the robots.txt file in the root directory of your website; the recommended location is https://www.xyz.com/robots.txt.
Note: The robots.txt file is case sensitive, so make sure you use a lowercase "r" when naming the file.
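Paths inside the file are matched case sensitively as well. A small illustration with a placeholder directory name:

User-agent: *
Disallow: /Blog/

This blocks /Blog/ but not /blog/, because rule paths must match the URL's exact casing.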
Look For Errors And Mistakes
Setting up robots.txt correctly is EXTREMELY important; a single mistake could block your complete website from being crawled and eventually drop it out of the search results.
Suppose you are working on a multilingual website and the Spanish version, which is still being edited, lives under the /es/ subdirectory. You want to keep search engine robots from crawling it for now, so you add a robots.txt rule to disallow that entire subdirectory.
Example
Not correct:
User-agent: *
Disallow: /es
This will keep bots from crawling any web page or subfolder whose path begins with /es, for example:
/essentials/
/escrow-services.html
/essentials-services.pdf
Correct:
User-agent: *
Disallow: /es/
The simple fix is to add a trailing slash after the subdirectory name.
Take Advantage Of Comments To Explain Robots.txt File
Comments help developers and other humans understand the robots.txt file. To add a comment, start the line with a hash (#); crawlers ignore everything after it.
Example:
# Disallow googlebot from crawling.
User-agent: googlebot
Disallow: /wp-admin/
The comment makes the intent clear: Google's spider is told to skip the /wp-admin/ directory.
Utilize Each User-agent Only Once
List each user-agent only once in a robots.txt file to avoid confusing search engine spiders.
Example:
User-agent: bingbot
Disallow: /blog/

User-agent: bingbot
Disallow: /articles/
Bing would still combine these groups and skip both the /blog/ and /articles/ directories, but declaring the same user-agent twice makes the file harder to read and easier to get wrong; a consolidated version is shown below.
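A cleaner way to express the same intent is to group both rules under a single bingbot entry:

User-agent: bingbot
Disallow: /blog/
Disallow: /articles/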
Define Wildcards (*) To Streamline Instructions
You can use the wildcard (*) directive to address all user-agents at once and to group URLs that share the same pattern.
Example
Not correct:
User-agent: *
Disallow: /services/seo=?
Disallow: /services/smm=?
Disallow: /services/smo=?
This is not an efficient way.
Correct:
User-agent: *
Disallow: /services/*=?
With this single directive, you block search spiders from crawling every URL under the /services/ directory that contains "=?".
Create Different Robots.txt File For Each Subdomain
If you have a subdomain, you need to create a separate robots.txt file and serve it from the root of that subdomain.
For example, if you create a blog subdomain such as blog.xyz.com, that blog needs its own robots exclusion standard file.
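A quick illustration with the placeholder domain xyz.com; each host serves its own file, and neither file applies to the other:

https://www.xyz.com/robots.txt – rules for the main site
https://blog.xyz.com/robots.txt – rules for the blog subdomain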
Take Care of Conflicting Rules
In robots.txt, the first matching directive normally wins. For Google and Bing, however, an Allow directive can win over a Disallow directive when the Allow path is longer (contains more characters), because the more specific rule takes precedence.
Example:
User-agent: *
Allow: /blog/seo/
Disallow: /blog/
Here Google's and Bing's bots are not permitted to crawl the /blog/ directory, but they can still crawl and index /blog/seo/.
Example:
User-agent: *
Disallow: /blog/
Allow: /blog/seo/
Here most bots are not permitted to crawl the /blog/ directory, including the /blog/seo/ subdirectory. But as mentioned above, Google and Bing honor the Allow directive because its path has more characters than the Disallow path, so /blog/seo/ remains crawlable for them.
If the Allow and Disallow paths are equal in length, the least restrictive directive wins and the page gets crawled.
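A sketch of that tie case, with placeholder paths:

User-agent: *
Allow: /blog
Disallow: /blog

Both paths are five characters long, so the less restrictive Allow rule wins and /blog can still be crawled by Google and Bing.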
Limitations Of The Robots.txt File
Robots.txt file is not supported by all search engines
Robots.txt directives cannot force bots to obey them; it is entirely up to each crawler whether to follow the file.
Different search bots treat syntax differently
Most crawlers follow robots.txt directives, but every search bot interprets the syntax in its own way. Make sure you know the exact syntax each crawler expects before writing rules for multiple bots.
Disallowed pages can still appear in search results
Web pages that are disallowed in robots.txt can still end up in the index if other indexed pages link to them. If you don't want web spiders to index a page at all, use another method such as a noindex meta tag, or protect your private files with a password.
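For files such as PDFs, where an HTML meta tag cannot be added, the same noindex signal can be sent as an HTTP response header. A hedged sketch for an Apache server with mod_headers enabled (your server and file types may differ):

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>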
When your robots.txt directives use the right syntax, search crawlers can organize and crawl your content so that it appears in search engine results pages the way you want.
Robots.txt Frequently Asked Questions
Here are some FAQs related to robots.txt. If you have any questions or feedback, leave a comment and we will update the article accordingly.
Will search robots crawl a website that doesn’t have a robots.txt?
Yes. If search robots don't find a robots.txt file in the root directory, they assume there are no directives and crawl the entire website.
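In practice, the absence of a file behaves like this minimal "allow everything" robots.txt, where an empty Disallow value means nothing is blocked:

User-agent: *
Disallow: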
What is the maximum size of a robots.txt file?
For Google, the limit is 500 KB; content beyond that size is ignored.
What will happen if I use the noindex directive in robots.txt?
Nothing useful: search engines do not support a noindex rule inside robots.txt (Google officially stopped honoring it in 2019), so the line is simply ignored. Use a noindex meta tag or an X-Robots-Tag header instead.