Getting search engines to treat your website the way you want can be a tough fight.

In reality, with a few techniques, you can easily manage the search bots that crawl and index your website, even at the page level.

Here we are going to talk about a legitimate Search Engine Optimization technique that can improve your website's SEO and is easy to implement.

It’s the robots.txt file, which implements what is known as the “robots exclusion protocol” or “robots exclusion standard”.

Robots.txt controls which parts of your web content crawlers may access, but it does not tell them whether or not to index that content.

What Is Robots.txt File?

A robots.txt file tells search engine robots which web pages, directories, sub-directories, files, or dynamic web pages they can or can’t crawl on your website. The robots exclusion protocol is mainly used to manage crawler access to your web content and to prevent the website from being overloaded with requests.

It will not keep your web page out of search engines. If you don’t want search engine robots to index a page, use a noindex directive or protect the page with a password.

You can tell spiders whether or not to crawl an individual part of the website, per user-agent, by specifying “Disallow” or “Allow” rules.

Location Of Robots.txt File

Keep your robots.txt file in the root directory of your domain or subdomain.

Search bots always look for the robots.txt file there. If a bot cannot find the file in the root directory, that is, at the www.xyz.com/robots.txt URL, it assumes that the website doesn’t have a robots.txt file.

To locate the robots.txt file on a typical cPanel host, go to your cPanel >> public_html web directory.
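Because the file always sits at the root of the host, its URL can be derived from any page URL. The following is a minimal sketch using Python's standard library; the function name `robots_txt_url` and the example page URLs are illustrative assumptions, not part of any tool mentioned here.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any subdomain);
    # drop the path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.xyz.com/blog/post?id=1"))
print(robots_txt_url("https://blog.xyz.com/2020/07/"))
```

Note that the subdomain in the second call yields its own robots.txt URL, separate from the one for www.xyz.com.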

Basic Format Of Robots.txt File

User-agent: *
Allow: /media/terms-and-conditions.pdf
Disallow: /wp-admin/
Disallow: /*.php$
Crawl-delay: 5
Sitemap: https://www.xyz.com/sitemap.xml

User-agent Directive

The User-agent line names the search engine robot that the rules following it apply to. Major user-agents are:
Google – googlebot
Bing – bingbot
Yahoo – Slurp
Msn – msnbot
DuckDuckGo – duckduckbot
Baidu – baiduspider
Yandex – yandexbot
Facebook – facebot

Example
User-agent: googlebot
Disallow: /wp-admin/

In this example, Google's crawler is told not to crawl the /wp-admin/ directory.

Note: It is important to define user-agent(s) correctly to ensure that the rules reach the search bots you intend.
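One way to check how a user-agent group is read is Python's built-in `urllib.robotparser`. This is only a sketch of how that particular parser behaves; real search engine crawlers use their own parsers and may differ in details.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: googlebot
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The group names googlebot only, so other bots are not restricted by it.
googlebot_allowed = parser.can_fetch("googlebot", "https://www.xyz.com/wp-admin/")
bingbot_allowed = parser.can_fetch("bingbot", "https://www.xyz.com/wp-admin/")
print(googlebot_allowed, bingbot_allowed)
```

Under this parser the first call reports that googlebot is blocked, while bingbot, which has no matching group, is allowed.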

Wildcard (*) Directive

This indicates that the directives are meant for all search engines. Inside a path, the wildcard character matches any sequence of characters, which makes it an excellent approach for URLs that share the same pattern. Google & Bing bots support this wildcard.

Example
User-agent: *
Disallow: /plugin/
Disallow: /*?

In this example, crawlers may not crawl the /plugin/ directory or any URL that includes a question mark (?).

Disallow Directive

This robots.txt directive specifies which part of a website should not be accessed, either by all user-agents or by an individual user-agent.

Example
User-agent: Slurp
Disallow: /services/

User-agent: bingbot
Disallow: /ebooks/*.pdf
Disallow: /keywords/

Here, Yahoo's robots will not crawl the /services/ directory, and Bing's spiders will not crawl the /keywords/ directory or any PDF files in the /ebooks/ directory.

Allow Directive

The Allow directive tells search robots that they may crawl a subdirectory or web page, even if its parent folder is disallowed. This directive is supported by Google and Bing. The directive must specify a path; if no path is defined, the directive is ignored.

Example
User-agent: *
Allow: /blog/
Disallow: /blog/permanent-301-vs-temporary-302-redirects-which-one-is-better/

The /blog/ directory will be crawled by bots, but the ‘permanent-301-vs-temporary-302-redirects-which-one-is-better’ page is disallowed.

Wildcards ($) Directive

To specify the end of URL, use a dollar sign ($) at the end of the path. Google & Bing bots support this wildcard.

Example
User-agent: *
Disallow: /*.php$

This example shows that all search robots are disallowed from accessing any URL that ends with .php. Crawlers can still access URLs that do not end with .php, such as https://xyz.com/services.php?lang=en.
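To see how the `*` and `$` wildcards behave, the patterns can be translated into regular expressions. The sketch below is an approximation of the matching described above, not the code any search engine actually runs; the helper names `rule_to_regex` and `rule_matches` are made up for illustration.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with ".*"
    # so that each "*" means "any sequence of characters".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start of the path, as robots.txt rules do.
    return rule_to_regex(pattern).match(path) is not None


print(rule_matches("/*.php$", "/services.php"))
print(rule_matches("/*.php$", "/services.php?lang=en"))
print(rule_matches("/*?", "/services.php?lang=en"))
```

The first and third checks match, while the second does not, because the query string means the URL no longer ends with .php.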

Crawl-delay Directive

The Crawl-delay directive defines how many seconds a crawler waits before requesting the next web page. It prevents the server from being overloaded with many requests at a time. Yahoo, Bing, and Yandex have supported this directive, but Google doesn’t support crawl-delay. Google has historically offered a crawl rate setting in Google Search Console instead.

Example
User-agent: Slurp
Crawl-delay: 5

Here you direct the crawler to wait 5 seconds before requesting the next page.
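A polite custom crawler can read this value itself. As a small sketch, Python's standard `urllib.robotparser` exposes it through `crawl_delay()`; the bingbot lookup is added here only to show what happens when no group applies.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: Slurp", "Crawl-delay: 5"])

slurp_delay = parser.crawl_delay("Slurp")    # value declared for Slurp
other_delay = parser.crawl_delay("bingbot")  # no group applies to bingbot
print(slurp_delay, other_delay)
```

With this parser the Slurp lookup returns the integer 5, and the lookup for an unlisted bot returns None.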

Sitemap Directive

The Sitemap directive tells search engines where your XML sitemap is located. If you are not familiar with sitemaps, you can instead use Google Search Console (formerly Google Webmaster Tools) to submit URLs one by one.

Example
User-agent: *
Disallow: /media/
Sitemap: https://www.xyz.com/sitemap.xml
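The sitemap location can also be read programmatically. The sketch below assumes Python 3.8 or newer, where `RobotFileParser.site_maps()` is available.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /media/",
    "Sitemap: https://www.xyz.com/sitemap.xml",
])

# site_maps() returns a list of sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```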

Note: The robots exclusion standard is supported by most search engines, but you should know that some crawlers do not honor robots.txt files.

Why Is Robots.txt File Important?

Google generally crawls and indexes the important pages of your website, but ignores pages which are not important or have duplicate content. A robots.txt file is not mandatory for a successful website. You can rank well in search engines without one.

Still, here are some reasons why you should include a robots.txt file –

To keep crawlers away from web pages that contain duplicate content.
To prevent search robots from crawling your private web folders.
To make the most of the crawl budget by disallowing less important web pages.
To keep an entire section of a website away from search robots.
To specify the location of the sitemap.
To add a crawl-delay that avoids server overloading from multiple requests at once.
To keep images, videos, PDFs, and resource files from appearing in search results.

What Are The Best Practices For Robots.txt File?

Create A Robots.txt File

As robots.txt is a plain text file, you can create one using a text editor such as Notepad or Notepad++.

New Line For Each Directive

To avoid confusing search engines, put each directive on its own line.

Example
Not Correct
User-agent: * Disallow: /wp-admin/ Disallow: /wp-admin-new/

Correct
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-admin-new/
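The effect of cramming everything onto one line can be checked with a parser. This sketch uses Python's standard `urllib.robotparser` as a stand-in; other crawlers may react differently to the malformed file, and the helper name `can_crawl` is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, path):
    """Report whether a generic bot may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch("*", path)


one_line = ["User-agent: * Disallow: /wp-admin/ Disallow: /wp-admin-new/"]
separate_lines = ["User-agent: *", "Disallow: /wp-admin/", "Disallow: /wp-admin-new/"]

print(can_crawl(one_line, "/wp-admin/"))
print(can_crawl(separate_lines, "/wp-admin/"))
```

With this parser the one-line version yields no usable rule, so /wp-admin/ stays crawlable, while the version with separate lines blocks it.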

Make Robots.txt Easy To Find

Place the robots.txt file in the root directory of your website so that it is reachable at https://www.xyz.com/robots.txt

Note: The robots.txt file name is case sensitive; make sure you use a lowercase “r” when naming the file.

Look For Errors And Mistakes

Setting up the robots.txt file correctly is EXTREMELY important; otherwise, your complete website could get deindexed.

Suppose you are working on a multilingual website and editing the Spanish version under the /es/ sub-directory, so you want to keep search engine robots from crawling it. You might write the following robots.txt to disallow spiders from crawling that entire subdirectory.

Example
Not Correct
User-agent: *
Disallow: /es

This will keep bots from crawling any web page or subfolder whose path begins with /es. For example –
/essentials/
/escrow-services.html
/essentials-services.pdf

Correct
User-agent: *
Disallow: /es/

The simple solution is to add a slash after the subdirectory name.
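The difference between the two rules comes from prefix matching, which can be demonstrated with Python's standard `urllib.robotparser`. This is a sketch only; `blocked_paths` is a hypothetical helper, and /es/contacto.html is an invented page used for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(disallow_rule, paths):
    """Return the paths that a single Disallow rule blocks for all bots."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_rule}"])
    return [p for p in paths if not parser.can_fetch("*", p)]


paths = ["/es/", "/es/contacto.html", "/essentials/", "/escrow-services.html"]

print(blocked_paths("/es", paths))   # blocks every path starting with "/es"
print(blocked_paths("/es/", paths))  # blocks only paths inside /es/
```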

Take Advantage Of Comments To Explain Robots.txt File

Comments help developers and other readers understand the robots.txt file. To add a comment, start the line with a hash (#).

Example
# Keep googlebot out of the admin area.
User-agent: googlebot
Disallow: /wp-admin/

Google's spiders will not crawl the /wp-admin/ directory; the comment line itself is ignored by crawlers.

Utilize Each User-agent Only Once

List each user-agent only once in a robots.txt file to avoid confusing search engine spiders.

Example
User-agent: bingbot
Disallow: /blog/

User-agent: bingbot
Disallow: /articles/

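A crawler that merges repeated groups would crawl neither the /blog/ subfolder nor the /articles/ subdirectory, but placing both rules under a single bingbot group is clearer and does not depend on that merging behavior.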
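To spot repeated user-agent groups in an existing file, the rules can be collected per agent. The following is a rough sketch that only handles User-agent and Disallow lines; `merge_groups` is an illustrative name, not an existing library function.

```python
from collections import defaultdict


def merge_groups(robots_txt: str) -> dict:
    """Collect Disallow paths per user-agent, merging repeated groups."""
    merged = defaultdict(list)
    current_agents = []
    reading_agents = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current_agents = []  # a new group starts here
            current_agents.append(value.lower())
            reading_agents = True
        elif field == "disallow":
            reading_agents = False
            for agent in current_agents:
                merged[agent].append(value)
    return dict(merged)


robots_txt = """\
User-agent: bingbot
Disallow: /blog/

User-agent: bingbot
Disallow: /articles/
"""

merged = merge_groups(robots_txt)
print(merged)
```

The merged result shows what a single, consolidated bingbot group should contain.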

Define Wildcards (*) To Streamline Instructions

You can use the wildcard (*) to address all user-agents and to match URLs that share the same pattern.

Example
Not Correct
User-agent: *
Disallow: /services/seo=?
Disallow: /services/smm=?
Disallow: /services/smo=?

This is not an efficient way.

Correct
User-agent: *
Disallow: /services/*=?

With this one-line directive, you can block search spiders from crawling all URLs under the “/services/” directory that contain “=?”.

Create Different Robots.txt File For Each Subdomain

If you have a subdomain, you need to create a separate robots.txt file and put it in the root directory of that subdomain.

Suppose you create a blog subdomain such as blog.xyz.com; you should create a separate robots.txt file for that blog.

Take Care of Conflicting Rules

For many crawlers, the first matching directive wins in robots.txt. For both Bing and Google, however, the most specific rule wins: an Allow directive overrides a Disallow directive if its path has more characters.

Example
User-agent: *
Allow: /blog/seo/
Disallow: /blog/

Here Google and Bing bots are not permitted to crawl the /blog/ directory, but they can crawl /blog/seo/.

Example
User-agent: *
Disallow: /blog/
Allow: /blog/seo/

Here a crawler that applies the first matching rule is not permitted to crawl anything in the /blog/ directory, including the /blog/seo/ subdirectory. But as mentioned above, Google and Bing bots still follow the Allow directive, because its path has more characters than the Disallow path.

If the Allow and Disallow paths are equal in length, the least restrictive directive (Allow) wins.
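The longest-match behavior described for Google and Bing can be sketched in a few lines of Python. This is a simplified model that uses plain prefix matching and ignores wildcards; the function name `is_allowed` and the example page names are illustrative assumptions.

```python
def is_allowed(rules, path):
    """Apply longest-match precedence; ties go to Allow."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = kind.lower() == "allow"
        # A longer path is more specific; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


rules = [("Disallow", "/blog/"), ("Allow", "/blog/seo/")]

print(is_allowed(rules, "/blog/seo/tips.html"))
print(is_allowed(rules, "/blog/news.html"))
```

Because the decision depends on path length rather than on position, reversing the order of the two rules gives the same answers in this model.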

Limitations Of The Robots.txt File

Robots.txt file is not supported by all search engines

Robots.txt directives can’t force bots to obey them; it is entirely up to the crawler to follow them.

Different search bots treat syntax differently

Most web crawlers follow robots.txt directives, but each search bot may interpret them in its own way. You should know the exact syntax each crawler supports before writing rules for multiple web crawlers.

Disallowed pages still appear in search results

Web pages that are disallowed in robots.txt can still be indexed if they are linked from other web pages. If you don’t want web spiders to index a page, use other methods such as the noindex meta tag, or protect your private files with a password.

By giving crawlers robots.txt directives with the right syntax, you help search crawlers organize and display your content the way you want it to appear in search engine results pages.

Robots.txt Frequently Asked Questions

Here are some FAQs related to robots.txt. If you have any questions or feedback, leave a comment and we will update accordingly.

Will search robots crawl a website that doesn’t have a robots.txt?

Yes. If search robots don’t find a robots.txt file in the root directory, crawlers presume that there are no directives and access the entire website.

What is the maximum size of a robots.txt file?

500 KB. Google, for example, ignores any content beyond that limit.

What will happen, if I use the noindex directive in robots.txt?

Search engines such as Google do not support a “noindex” directive inside robots.txt, so it will be ignored. To keep a page out of the index, use a noindex meta tag on the page instead.
