How to Allow or Disallow Bots/Robots to Crawl Your Site

What is the Robots.txt File?

  • A site's robots.txt file is one of the most powerful files your site will have.
  • This file controls how search engines crawl your site.
  • Every site should have one, even if it is a basic one.

The file sits in the root of the site and is the first file a search engine crawler will read before crawling the site.
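For example, if your site lives at www.example.com (a placeholder domain used purely for illustration), crawlers will request the file from:

https://www.example.com/robots.txt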

With this file you can control which crawlers can access your site, which pages they can crawl and which pages they can’t.

Because of the nature of this file, it is highly important that you understand the basics. Include an incorrect command and you could stop search engines from crawling important areas of your site, or worse, stop them from indexing your site altogether.

Commands you should be aware of

There are a number of commands you need to be aware of so that you can set up your file safely.

Default file that allows full access:

User-agent: *
Disallow:

The asterisk (*) indicates that the rule applies to all bots.
Leaving the Disallow line empty means all bots are free to crawl the whole site.

To block all bots from the entire site:

User-agent: *
Disallow: /

Again, the asterisk means this rule applies to all bots.
By adding the forward slash we are telling the bots they cannot crawl any part of the site.

This would stop your pages from being crawled and, in most cases, would keep them out of the search results.

To block a specific bot:

User-agent: Bot Name
Disallow: /

By naming the bot we are saying that the following command is relevant to that bot only.
By including the forward slash we are again saying the bot is barred from crawling the whole site.
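For example, to apply this to Google's main crawler (its user-agent token is Googlebot, used here purely as an illustration), the file would contain:

User-agent: Googlebot
Disallow: /

Every other bot would remain free to crawl the site, because the rule applies only to the named user agent.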

Blocking a folder

There are times when you may not want certain areas of your site crawled or indexed. This can also be controlled.

User-agent: *
Disallow: /folder/

The asterisk again indicates that this rule applies to all bots.
This time we have included /folder/ – this instructs the bots that we don’t want them to crawl anything within that folder.
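You can also list more than one Disallow line under the same User-agent group. As a quick sketch, assuming hypothetical /private/ and /tmp/ folders:

User-agent: *
Disallow: /private/
Disallow: /tmp/

Each Disallow line adds another path the bots are asked to stay out of.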

Blocking a file

If there are individual files or pages you wish to block, then you can also do this:

User-agent: *
Disallow: /file.html

The asterisk again indicates this is for all bots.
The Disallow line informs the bots that we want to block the page file.html from being crawled.
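These rules can be combined in one file. The sketch below uses hypothetical folder, file and bot names (/private/, /old-page.html and BadBot are assumptions for illustration only): it lets all bots crawl the site apart from one folder and one page, while blocking one named bot entirely.

User-agent: *
Disallow: /private/
Disallow: /old-page.html

User-agent: BadBot
Disallow: /

Each group starts with its own User-agent line, and the rules beneath it apply only to the bots that group names.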

As you can see, this can be a very powerful file, so be sure to test any amendments in your Search Console account. This will tell you if there are any errors and exactly what your commands will do.

Author: Aliva Tripathy

Taking time out from life as a housewife to contribute to AxiBook is a passion of mine. I love doing it, and thoughtful feedback from all of you fills me with great satisfaction. I love caring for others and sharing knowledge above all else.
