Alternate robots.txt files with Htaccess

Here's the basic idea: you have two sites, a development site and a live site. They are essentially mirrors of each other, with the same files. Since both carry the same robots.txt, that file alone can't treat them differently. You need to block all search engine robots from crawling and indexing the development site while allowing full crawling of the live site. Htaccess to the rescue!

Create a robots-off.txt

You should already have a robots.txt file. Now create a robots-off.txt file in the same directory as robots.txt, with the contents below. It tells every robot that honors robots.txt, which includes all legitimate search engines, to stay out of the entire site.

User-agent: *
Disallow: /
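
If you want to convince yourself that those two lines really shut everything out, here is a minimal sketch that feeds them to Python's standard urllib.robotparser module. The helper name is_allowed, the crawler names, and the sample paths are made up for illustration and are not part of the original setup.

```python
from urllib.robotparser import RobotFileParser

# Contents of robots-off.txt, exactly as shown above.
ROBOTS_OFF = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_text, user_agent, path):
    """Return True if robots_text permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Arbitrary example crawlers and paths (assumptions, not from the post).
    for agent in ("Googlebot", "bingbot", "SomeOtherBot"):
        for path in ("/", "/index.html", "/blog/post.html"):
            print(agent, path, is_allowed(ROBOTS_OFF, agent, path))
```

Every combination should come back as not allowed. Keep in mind this only tells you how a compliant parser reads the file; robots that ignore robots.txt are not stopped by it.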

Htaccess Rewrite for Alternate robots.txt

The code below is simple! It checks whether HTTP_HOST starts with "development" (for example, development.site.com) and, if so, internally rewrites (not redirects) requests for /robots.txt to /robots-off.txt.

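###
### Alt robots.txt ala askapache.com/htaccess/alternate-robots-txt-rewrite.html
###
# mod_rewrite must be switched on; harmless if already set earlier in the file
RewriteEngine On
# Only act when the requested host name starts with "development"
RewriteCond %{HTTP_HOST} ^development.*$ [NC]
# Only act when the client's original request line asks for robots.txt
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC]
# Serve robots-off.txt internally (no R flag, so no redirect is sent)
RewriteRule ^robots\.txt /robots-off.txt [NC,L]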
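
To see which requests the rule set catches, here is a rough Python model of the two conditions and the rule. It is only a sketch for reasoning about the regular expressions: the function name served_path is hypothetical, the host names are examples, and real Apache processing involves far more than this (per-directory prefix stripping, other rules, and so on).

```python
import re

# Python equivalents of the patterns above; [NC] becomes re.IGNORECASE.
HOST_RE = re.compile(r"^development.*$", re.IGNORECASE)
REQUEST_RE = re.compile(r"^[A-Z]{3,9} /.*robots\.txt.* HTTP/", re.IGNORECASE)
RULE_RE = re.compile(r"^robots\.txt", re.IGNORECASE)


def served_path(host, request_line):
    """Return the path this simplified model would serve for a request.

    host is the Host header value; request_line is the raw first line
    of the request, e.g. "GET /robots.txt HTTP/1.1".
    """
    path = request_line.split(" ")[1]
    # Assumes a .htaccess in the document root, where the RewriteRule
    # pattern sees the path without its leading slash or query string.
    relative = path.split("?", 1)[0].lstrip("/")
    if (
        HOST_RE.search(host)
        and REQUEST_RE.search(request_line)
        and RULE_RE.search(relative)
    ):
        return "/robots-off.txt"  # internal rewrite, client never sees it
    return path


if __name__ == "__main__":
    print(served_path("development.site.com", "GET /robots.txt HTTP/1.1"))
    print(served_path("www.site.com", "GET /robots.txt HTTP/1.1"))
    print(served_path("development.site.com", "GET /index.html HTTP/1.1"))
```

Only the first request, the one for robots.txt on the development host, is switched to /robots-off.txt. The live host and every other file are left alone.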

22 Feb, 2013 · Htaccess, RewriteRule, robots.txt
