Blocking Googlebot/MSNbot/Bingbot from accessing your server/site in the Apache configuration



Mattias Geniar, February 15, 2012


For some reason, you may want to deny search bots access to a particular server, to prevent a crawler run from crippling it. While this is by no means a pretty solution, it will block most requests whose User-Agent contains the strings "Googlebot", "bingbot" or "msnbot". Since User-Agents can be faked, you may block some innocent visitors, but you'll also catch the large majority of the search crawlers.

Add the following to your general httpd.conf.

<Location />
	# Flag requests whose User-Agent contains any of these bot strings
	SetEnvIf User-Agent "msnbot" BlockUA
	SetEnvIf User-Agent "bingbot" BlockUA
	SetEnvIf User-Agent "Googlebot" BlockUA

	# Allow everyone except requests that carry the BlockUA flag
	Order allow,deny
	Allow from all
	Deny from env=BlockUA
</Location>
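
Note that Order/Allow/Deny is Apache 2.2 syntax; on Apache 2.4 it only keeps working through mod_access_compat. A minimal sketch of the same block in the newer Require syntax (assuming mod_setenvif and mod_authz_core are loaded) could look like this:

<Location />
	# Flag requests whose User-Agent contains any of these bot strings
	SetEnvIf User-Agent "msnbot" BlockUA
	SetEnvIf User-Agent "bingbot" BlockUA
	SetEnvIf User-Agent "Googlebot" BlockUA

	# Allow everyone except requests that carry the BlockUA flag
	<RequireAll>
		Require all granted
		Require not env BlockUA
	</RequireAll>
</Location>

Either way, you can verify the rule by spoofing the User-Agent yourself, for example with curl -A "Googlebot" against your own site: a blocked request should come back with a 403.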

If changing the Apache configuration isn't an option, you can place the same directives in a .htaccess file instead (if your host allows it), just without the surrounding <Location> tags, as shown below.
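
A minimal sketch of that .htaccess variant (same directives, same 2.2-style access rules):

# Flag bot User-Agents, then deny anything flagged
SetEnvIf User-Agent "msnbot" BlockUA
SetEnvIf User-Agent "bingbot" BlockUA
SetEnvIf User-Agent "Googlebot" BlockUA

Order allow,deny
Allow from all
Deny from env=BlockUA

This requires your host to permit these directives via AllowOverride (FileInfo for SetEnvIf, Limit for the access rules).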

Everyone browsing to your server with a User-Agent containing any of the above strings will get an HTTP 403 (Forbidden) response. Serving that 403 is far cheaper for Apache than serving the actual pages, but you don't want this in production: not being indexed is definitely a bad thing for your site and it will hurt your search engine results.


