Deny access to website, but allow robots.txt


I had a problem where Googlebot was indexing a development site, so we locked it down with Apache basic HTTP auth. Googlebot was then served a 401 on every request, but because it could not fetch robots.txt either, it kept persistently trying to crawl the site.

The following configuration allows anyone to access robots.txt but denies access to the rest of the site:

<Directory "/home/username/www">
    # Require a valid login for everything under the document root
    AuthType Basic
    AuthName "Client Access"
    AuthUserFile /home/username/.htpasswd
    Require valid-user

    # Exempt robots.txt: "Satisfy any" grants access when either the
    # host-based rules or the authentication requirement is met
    <Files "robots.txt">
        Allow from all
        Satisfy any
    </Files>
</Directory>
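
With that in place you can sanity-check the behaviour from the command line (the hostname below is just a placeholder): robots.txt should come back with a 200, while everything else still returns a 401.

# robots.txt should be readable without credentials (expect HTTP/1.1 200 OK)
curl -I http://dev.example.com/robots.txt

# any other path should still be protected (expect HTTP/1.1 401 Unauthorized)
curl -I http://dev.example.com/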

Eventually Googlebot will get the hint and stop crawling the site, and any content that has already been indexed can be removed using Webmaster Tools.
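
For Googlebot to actually get that hint, the robots.txt it can now fetch should disallow everything. The file itself isn't shown above, but for a development site it would presumably look like this:

# tell all well-behaved crawlers to stay out of the entire site
User-agent: *
Disallow: /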