This article shows two methods of blocking the entire list of bad robots and web scrapers below with .htaccess files: RewriteRules with mod_rewrite, or SetEnvIfNoCase from mod_setenvif.
Blocking Bad Robots and Web Scrapers with RewriteRules
ErrorDocument 403 /403.html
RewriteEngine On
RewriteBase /
# IF THE UA STARTS WITH THESE
RewriteCond %{HTTP_USER_AGENT} ^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) [NC,OR]
# STARTS WITH WEB
RewriteCond %{HTTP_USER_AGENT} ^web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) [NC,OR]
# ANYWHERE IN UA -- GREEDY REGEX
RewriteCond %{HTTP_USER_AGENT} ^.*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures).*$ [NC]
# ISSUE 403 / SERVE ERRORDOCUMENT
RewriteRule . - [F,L]
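The ErrorDocument line makes Apache serve the 403 through an internal request for /403.html. That request still carries the bot's User-Agent. If the same rule matches it again, Apache typically falls back to its built-in 403 message instead of the custom page.

A minimal sketch of an exclusion follows. It assumes the error page lives at /403.html as above and repeats only a few of the patterns.

```apache
ErrorDocument 403 /403.html
RewriteEngine On
RewriteBase /
# Never block the error page itself. This condition comes first and has no
# [OR] flag, so it is ANDed with the [OR]-chained user-agent group below.
RewriteCond %{REQUEST_URI} !^/403\.html$
RewriteCond %{HTTP_USER_AGENT} ^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^web(zip|emaile|enhancer|fetch) [NC,OR]
# Last user-agent condition: [NC] only, no [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(craftbot|download|extract|stripper|sucker).*$ [NC]
RewriteRule . - [F,L]
```

Placing the exclusion first matters. When a condition matches inside an [OR] chain, mod_rewrite skips the rest of that chain, so an exclusion tacked onto the end of the chain could be skipped.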
Alternate RewriteCond Rules
RewriteEngine on
# Block spambots. Every pattern line except the last ends in a backslash,
# so Apache reads the whole alternation as a single RewriteCond.
RewriteCond %{HTTP:User-Agent} (?:Alexibot|Art-Online|asterias|BackDoorbot|Black.Hole|\
BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|\
CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|Download\sDemon|\
eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|Express\sWebPictures|ExtractorPro|\
EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|\
Harvest|hloader|HMView|httplib|HTTrack|humanlinks|Image\sStripper|Image\sSucker|Indy\sLibrary|\
InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|\
larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|\
Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|\
Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|\
NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|\
PageGrabber|Papa\sFoto|pavuk|pcBrowser|Program\sShareware\s1|ProPowerbot/2.14|ProWebWalker|\
psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|\
Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|\
TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|True_Robot|turingos|\
Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|\
WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|\
WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|\
Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|\
Xaldon\sWebSpider|Xenu's|Zeus) [NC]
RewriteRule .? - [F]
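If you add names of your own to this rule, two details of Apache's configuration parser matter.

- A line continues only when a backslash is the very last character, with no trailing whitespace after it.
- Spaces inside a user-agent are written as \s, because an unescaped space would split the pattern into separate arguments.

A shortened sketch using four of the spaced names (the selection is arbitrary):

```apache
RewriteEngine on
# The backslash must be the final character of the line (no trailing spaces).
# Continuation lines are not indented, so no whitespace can enter the pattern.
RewriteCond %{HTTP:User-Agent} (?:Offline\sExplorer|Teleport\sPro|\
Web\sImage\sCollector|Xaldon\sWebSpider) [NC]
RewriteRule .? - [F]
```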
Block Bad Bots with SetEnvIfNoCase
ErrorDocument 403 /403.html
# IF THE UA CONTAINS THESE ANYWHERE (THESE PATTERNS ARE NOT ANCHORED WITH ^)
SetEnvIfNoCase ^User-Agent$ .*(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(libwww-perl|aesop_com_spiderman) HTTP_SAFE_BADBOT
Deny from env=HTTP_SAFE_BADBOT
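`Deny from` belongs to the Apache 2.2 access-control syntax. On Apache 2.4 it only works when mod_access_compat is loaded.

A sketch of the 2.4 (mod_authz_core) equivalent follows. It assumes two things:

- the SetEnvIfNoCase lines above still set HTTP_SAFE_BADBOT;
- the host's AllowOverride permits authorization directives (AuthConfig) in .htaccess.

```apache
# Apache 2.4: allow everyone except requests flagged by the SetEnvIfNoCase lines.
# A negated Require cannot grant access on its own, so it sits inside
# <RequireAll> next to "Require all granted".
<RequireAll>
    Require all granted
    Require not env HTTP_SAFE_BADBOT
</RequireAll>
```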
Original Bad Bot / Web Scraper List
2icommerce
Accoona
ActiveTouristBot
adressendeutschland
aipbot
Alexibot
Alligator
AllSubmitter
almaden
anarchie
Anonymous
Apexoo
Aqua_Products
asterias
ASSORT
ATHENS
AtHome
Atomz
attache
autoemailspider
autohttp
b2w
bew
BackDoorBot
Badass
Baiduspider
Baiduspider+
BecomeBot
berts
Bitacle
Biz360
Black.Hole
BlackWidow
bladder fusion
Blog Checker
BlogPeople
Blogshares Spiders
Bloodhound
BlowFish
Board Bot
Bookmark search tool
BotALot
BotRightHere
Bot mailto:craftbot@yahoo.com
Bropwers
Browsezilla
BuiltBotTough
Bullseye
BunnySlippers
Cegbfeieh
CFNetwork
CheeseBot
CherryPicker
charlotte/
ChinaClaw
Convera
Copernic
CopyRightCheck
cosmos
Crescent
c-spider
curl
Custo
Cyberz
DataCha0s
Daum
Deweb
Digger
Digimarc
digout4uagent
DIIbot
DISCo
DittoSpyder
DnloadMage
Download
dragonfly
DreamPassport
DSurf
DTS Agent
dumbot
DynaWeb
e-collector
EasyDL
EBrowse
eCatch
ecollector
edgeio
efp@gmx.net
EirGrabber
Email Extractor
EmailCollector
EmailSiphon
EmailWolf
EmeraldShield
Enterprise_Search
EroCrawler
ESurf
Eval
Everest-Vulcan
Exabot
Express
Extractor
ExtractorPro
EyeNetIE
FairAd
fastlwspider
fetch
FEZhead
FileHound
findlinks
Flaming AttackBot
FlashGet
FlickBot
Foobot
Forex
Franklin Locator
FreshDownload
FrontPage
FSurf
Gaisbot
Gamespy_Arcade
genieBot
GetBot
Getleft
GetRight
GetWeb!
Go!Zilla
Go-Ahead-Got-It
GOFORITBOT
GrabNet
Grafula
grub
Harvest
Hatena Antenna
heritrix
HLoader
HMView
holmes
HooWWWer
HouxouCrawler
HTTPGet
httplib
HTTPRetriever
HTTrack
humanlinks
IBM_Planetwide
iCCrawler
ichiro
iGetter
Image Stripper
Image Sucker
imagefetch
imds_monitor
IncyWincy
Industry Program
Indy
InetURL
InfoNaviRobot
InstallShield DigitalWizard
InterGET
IRLbot
Iron33
ISSpider
IUPUI Research Bot
Jakarta
java/
JBH Agent
JennyBot
JetCar
jeteye
jeteyebot
JoBo
JOC Web Spider
Kapere
Kenjin
Keyword Density
KRetrieve
ksoap
KWebGet
LapozzBot
larbin
leech
LeechFTP
LeechGet
leipzig.de
LexiBot
libWeb
libwww-FM
libwww-perl
LightningDownload
LinkextractorPro
Linkie
LinkScan
linktiger
LinkWalker
lmcrawler
LNSpiderguy
LocalcomBot
looksmart
LWP
Mac Finder
Mail Sweeper
mark.blonin
MaSagool
Mass
Mata Hari
MCspider
MetaProducts Download Express
Microsoft Data Access
Microsoft URL Control
MIDown
MIIxpc
Mirror
Missauga
Missouri College Browse
Mister
Monster
mkdb
moget
Moreoverbot
mothra/netscan
MovableType
Mozi!
Mozilla/22
Mozilla/3.0 (compatible)
Mozilla/5.0 (compatible; MSIE 5.0)
MSIE_6.0
MSIECrawler
MSProxy
MVAClient
MyFamilyBot
MyGetRight
nameprotect
NASA Search
Naver
Navroad
NearSite
NetAnts
netattache
NetCarta
NetMechanic
NetResearchServer
NetSpider
NetZIP
Net Vampire
NEWT ActiveX
Nextopia
NICErsPRO
ninja
NimbleCrawler
noxtrumbot
NPBot
Octopus
Offline
OK Mozilla
OmniExplorer
OpaL
Openbot
Openfind
OpenTextSiteCrawler
Oracle Ultra Search
OutfoxBot
P3P
PackRat
PageGrabber
PagmIEDownload
panscient
Papa Foto
pavuk
pcBrowser
perl
PerMan
PersonaPilot
PHP version
PlantyNet_WebRobot
playstarmusic
Plucker
Port Huron
Program Shareware
Progressive Download
ProPowerBot
prospector
ProWebWalker
Prozilla
psbot
psycheclone
puf
PushSite
PussyCat
PuxaRapido
Python-urllib
QuepasaCreep
QueryN
Radiation
RealDownload
RedCarpet
RedKernel
ReGet
relevantnoise
RepoMonkey
RMA
Rover
Rsync
RTG30
Rufus
SAPO
SBIder
scooter
ScoutAbout
script
searchpreview
searchterms
Seekbot
Serious
Shai
shelob
Shim-Crawler
SickleBot
sitecheck
SiteSnagger
Slurpy Verifier
SlySearch
SmartDownload
sna-
snagger
Snoopy
sogou
sootle
So-net
SpankBot
spanner
SpeedDownload
Spegla
Sphere
Sphider
SpiderBot
sproose
SQ Webscanner
Sqworm
Stamina
Stanford
studybot
SuperBot
SuperHTTP
Surfbot
SurfWalker
suzuran
Szukacz
tAkeOut
TALWinHttpClient
tarspider
Teleport
Telesoft
Templeton
TestBED
The Intraformant
TheNomad
TightTwatBot
Titan
toCrawl/UrlDispatcher
True_Robot
turingos
TurnitinBot
Twisted PageGetter
UCmore
UdmSearch
UMBC
UniversalFeedParser
URL Control
URLGetFile
URLy Warning
URL_Spider_Pro
UtilMind
vayala
vobsub
VCI
VoidEYE
VoilaBot
voyager
w3mir
Web Image Collector
Web Sucker
Web2WAP
WebaltBot
WebAuto
WebBandit
WebCapture
webcollage
WebCopier
WebCopy
WebEMailExtrac
WebEnhancer
WebFetch
WebFilter
WebFountain
WebGo
WebLeacher
WebMiner
WebMirror
WebReaper
WebSauger
WebSnake
Website
WebStripper
WebVac
webwalk
WebWhacker
WebZIP
Wells Search
WEP Search 00
WeRelateBot
Wget
WhosTalking
Widow
Wildsoft Surfer
WinHttpRequest
WinHTTrack
WUMPUS
WWWOFFLE
wwwster
WWW-Collector
Xaldon
Xenu's
Xenus
XGET
Y!TunnelPro
YahooYSMcm
YaDirectBot
Yeti
Zade
ZBot
zerxbot
Zeus
ZyBorg
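Several entries in this list contain spaces, for example Mata Hari, Web Image Collector and Email Extractor. Apache splits directive arguments on whitespace, so such a name has to be quoted or written with \s before it goes into either block.

A sketch for the SetEnvIfNoCase method, reusing the HTTP_SAFE_BADBOT variable and the existing Deny line. Which names you add is up to you.

```apache
# A double-quoted pattern may contain literal spaces
SetEnvIfNoCase ^User-Agent$ "(mata hari|web image collector|email extractor)" HTTP_SAFE_BADBOT
# Unquoted form using \s (matches any whitespace character, not only a space)
SetEnvIfNoCase ^User-Agent$ (mata\shari|web\simage\scollector|email\sextractor) HTTP_SAFE_BADBOT
```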
Blocking Bad Bots and Scrapers with .htaccess - AskApache