Alternate robots.txt files with Htaccess

Here's the basic idea: there are two sites, a development site and a live site. They are essentially mirrors of each other, with the same files, which means they also share the same robots.txt. You need to keep all search engine robots from crawling and indexing the development site while allowing full crawling of the live site. Htaccess to the rescue!

Create a robots-off.txt

You should already have a robots.txt file; now create a robots-off.txt file in the same directory. Its contents tell every search engine robot that honors robots.txt to stay out of the entire site:

User-agent: *
Disallow: /
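
As a quick sanity check, Python's standard urllib.robotparser module can confirm that these two lines deny every path to any crawler that honors robots.txt. This is only a sketch: the helper name is_allowed, the user-agent strings and the URL are illustrative assumptions, not part of the original setup.

```python
from urllib.robotparser import RobotFileParser

# The contents of robots-off.txt, exactly as listed above.
ROBOTS_OFF = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical crawler names and URL, used only for illustration.
    for agent in ("Googlebot", "bingbot", "ExampleBot"):
        url = "http://development.site.com/any/page.html"
        print(agent, is_allowed(ROBOTS_OFF, agent, url))
```

Keep in mind that robots.txt is advisory. Crawlers that ignore it are not stopped by this file.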

Htaccess Rewrite for Alternate robots.txt

The code below is simple. It checks HTTP_HOST to see whether it starts with "development" (for example, development.site.com) and, if so, internally rewrites (rather than redirects) requests for /robots.txt to /robots-off.txt. Visitors and crawlers still request /robots.txt and never see the other filename.

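###
### Alt robots.txt ala askapache.com/htaccess/alternate-robots-txt-rewrite.html
###
# Enable mod_rewrite (harmless if it is already enabled earlier in this file)
RewriteEngine On
# Only act when the Host header starts with "development" (case-insensitive)
RewriteCond %{HTTP_HOST} ^development.*$ [NC]
# Only act when the client's original request line asked for robots.txt
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC]
# Internally serve robots-off.txt instead; no redirect is sent to the client
RewriteRule ^robots\.txt /robots-off.txt [NC,L]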
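
To see which requests the rule affects, the sketch below restates its three patterns with Python's re module. It is a simplified model, not how Apache evaluates rules internally: it assumes the .htaccess file lives in the document root, and the function name served_path and the sample hosts are made up for illustration.

```python
import re

# Python translations of the patterns in the rule above.
# [NC] becomes re.IGNORECASE; the escaped spaces ("\ ") become plain spaces.
RULE_RE = re.compile(r"^robots\.txt", re.IGNORECASE)
HOST_RE = re.compile(r"^development.*$", re.IGNORECASE)
REQUEST_RE = re.compile(r"^[A-Z]{3,9} /.*robots\.txt.* HTTP/", re.IGNORECASE)


def served_path(host: str, request_line: str) -> str:
    """Model which path is served for a request (hypothetical helper)."""
    # Extract the URL path from a request line such as "GET /robots.txt HTTP/1.1".
    path = request_line.split(" ")[1].split("?", 1)[0]
    # In a document-root .htaccess, the rule pattern is matched against the
    # path without its leading slash.
    local = path[1:] if path.startswith("/") else path
    # The rule pattern is tested first, then both conditions must hold.
    if (RULE_RE.search(local)
            and HOST_RE.search(host)
            and REQUEST_RE.search(request_line)):
        return "/robots-off.txt"
    return path


if __name__ == "__main__":
    print(served_path("development.site.com", "GET /robots.txt HTTP/1.1"))
    print(served_path("www.site.com", "GET /robots.txt HTTP/1.1"))
```

Only the combination of a development host and a request for robots.txt is rewritten. The live site keeps serving its normal robots.txt, and every other URL on the development site is untouched.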
