.htaccess rules to block bad bots from crawling your website

Bots, also known as web crawlers or spiders, are automated software programs that traverse the internet and visit websites to collect information. These bots are commonly used by search engines like Google, Bing, and others to index web pages and gather data for their search engine databases.

Web crawlers work by following hyperlinks from one web page to another, systematically exploring and indexing the content they encounter. They typically analyze the HTML, CSS, and JavaScript code of web pages, extracting information such as text content, links, and metadata.

The primary purpose of web crawlers is to enable search engines to provide relevant and up-to-date search results to users. By regularly crawling and indexing web pages, search engines can understand the content of websites and rank them based on relevance and quality.

In addition to search engine crawlers, bots are employed for many other purposes, such as scraping data from websites, monitoring site changes, or automating tasks like form submissions. However, not all bots are well-intentioned: malicious bots may scrape content without permission, send spam, or launch attacks.

To manage the behavior of bots on a website, webmasters can use techniques such as robots.txt files, meta tags, and .htaccess rules to control access and specify how bots should interact with their site.
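
Of these, robots.txt is the weakest control because it works purely on the honor system: it asks crawlers to stay away but cannot enforce anything. A minimal sketch (BadBot is a placeholder name, not a real crawler) that asks one bot to avoid the whole site while permitting all others:

# robots.txt — advisory only; badly behaved bots simply ignore it
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:

Because compliance is voluntary, misbehaving bots ignore robots.txt entirely, which is why the server-level .htaccess approach below is the more reliable way to actually block them.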

.htaccess rules to block bad bots while allowing good bots to crawl the website

To block known bad bots while still letting good bots crawl your website, you can add the following rules to your .htaccess file:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(MJ12Bot|Baiduspider|AhrefsBot).*$ [NC]
RewriteRule ^.*$ - [F,L]

In the above rules:

  • RewriteEngine On enables the Apache mod_rewrite engine.
  • RewriteCond is used to set a condition for the following RewriteRule.
  • %{HTTP_USER_AGENT} represents the user agent string sent by the browser or bot.
  • ^.*(MJ12Bot|Baiduspider|AhrefsBot).*$ is a regular expression that matches any user agent string containing “MJ12Bot”, “Baiduspider”, or “AhrefsBot”. You can add more bot names separated by pipes (|) within the parentheses, as shown in the expanded example after this list.
  • [NC] makes the pattern comparison case-insensitive.
  • RewriteRule ^.*$ - [F,L] matches any URL; the dash (-) means no substitution is performed, and the [F] flag returns a 403 Forbidden response. The L flag tells Apache to stop processing further rewrite rules.
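
Extending the pattern is just a matter of adding more names inside the parentheses. Here is a sketch with a longer block list; SemrushBot and DotBot are merely additional examples of crawlers that some site owners choose to block, so tailor the list to the user agents you actually see in your access logs:

# Block requests whose User-Agent contains any of the listed names
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(MJ12Bot|Baiduspider|AhrefsBot|SemrushBot|DotBot).*$ [NC]
RewriteRule ^.*$ - [F,L]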

Replace “MJ12Bot”, “Baiduspider”, and “AhrefsBot” with the names or patterns of the bad bots you actually want to block, and place the rules in the .htaccess file in the root directory of your website. Only the bots matched by the RewriteCond are blocked with a 403 Forbidden error; every other crawler can continue to access the site.
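
After uploading the file, you can check that the block works by sending a request with a matching user agent string, for example with curl (example.com stands in for your own domain):

curl -I -A "MJ12Bot" https://example.com/

The -A option sets the User-Agent header and -I requests only the response headers. A blocked bot name should produce a 403 Forbidden status, while the same request without -A should return a normal 200 response.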
