Dealing with Mobile Visitors using Bad Browsers
Definitely worth the read, but it is hard to see any benefit to doing this.
Mobile agents are still in their infancy, but within two years most mobiles will be as fast as the laptops from a couple of years back... I mean, my Droid is running Linux! The world is moving steadily toward a society where mobile devices greatly outnumber PCs. So during this "growing up" phase, I would argue it is much more beneficial to look for a method that solves the resource-robbing issue from the server side, while keeping in mind that mobile visitors to your site will continue to grow and eventually surpass non-mobile clients.
It definitely makes it harder to understand the reasons behind this clever post without more information on the mobile bots you've been seeing in your logs. Obviously (judging by your impressive blacklist work) that data would be easy for you to get, but it would help us to see the same thing.
A lot of mobile devices have very small amounts of memory, and the amount of storage available for saving data is especially small. Approaching that problem from a programming point of view, the people who built mobile user-agents made them as fast as possible while using minimal data. Knowing that, it makes sense for a mobile agent to try hard to find an alternate version of a page formatted with its unique lack of resources in mind. Issuing 100 requests for non-existent pages and only finding the right one on the 100th try would almost certainly be worth it for a mobile device. Most devices speak HTTP through raw socket programming for speed, which makes it very quick and easy to issue requests; they are basically free in terms of what it costs the client to make them. Unfortunately, those requests really do eat CPU, memory, and connections on our servers if they aren't set up and optimized, and 1,000 mobiles doing this simultaneously would grind most sites to a halt.
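To illustrate how cheap that probing is for the client, here is a minimal Python sketch of the kind of raw-socket HTTP request a mobile agent might fire off while hunting for a mobile-formatted URL; the host and candidate paths are made up for the example, not taken from any real agent.

    # Sketch: bare-bones HTTP probing over a raw socket, as a mobile agent might do.
    # The host and paths below are hypothetical examples.
    import socket

    def probe(host, path):
        """Send one minimal HTTP/1.1 request and return the status line."""
        with socket.create_connection((host, 80), timeout=5) as sock:
            request = (
                f"GET {path} HTTP/1.1\r\n"
                f"Host: {host}\r\n"
                "Connection: close\r\n\r\n"
            )
            sock.sendall(request.encode("ascii"))
            status_line = sock.recv(1024).split(b"\r\n", 1)[0]
        return status_line.decode("ascii", "replace")

    # Guessing at mobile paths costs the client almost nothing, but every miss
    # still costs the server a connection, a worker, and a 404 page.
    for path in ("/m/", "/mobile/", "/wap/", "/index.wml"):
        print(path, probe("example.com", path))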
The solution you came up with would definitely help that situation. A 403 is the strongest method available to a server (at least in terms of the HTTP protocol) to tell a user-agent to get gone. It is also the best way (at least in Apache) to save your CPU and resources, because a 403 causes Apache to end the connection, clean up its internal data structures, and stop processing the request.
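To make the mechanics concrete, here is a rough sketch of that early server-side rejection written as a Python WSGI app rather than your actual Apache setup; the user-agent substrings are hypothetical placeholders, not anyone's real blacklist.

    # Sketch: refuse resource-hungry mobile agents with a 403 before any real
    # page logic runs. The agent names below are made-up placeholders.
    BAD_MOBILE_AGENTS = ("HypotheticalMobileBot", "ExampleWapCrawler")

    def app(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(bad in ua for bad in BAD_MOBILE_AGENTS):
            # Tiny response, no page generation; request handling ends here.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Normal page handling would go here.\n"]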
However, 403s are too strict for a situation without any clear abuse going on. 403s are understood by all agents and can do some bad things to your site if used like this. You can get dropped from search indexes for returning 403s (that's Google trying to do you a favor by not indexing "Forbidden" content), and I've found that returning a 403 to crawlers causes them to retry in 15 minutes, then an hour, then a day, then a week, with the gaps between checks growing until they stop coming back at all.
Oh yeah, it is very unlikely that a mobile device will cache the results of non-existent mobile URIs, mostly because making the request costs it nothing (unless you set up a trap like mod_security that lets you respond byte by byte, veryyyyy slowwwwlyyyy). And even then, mobile devices do not have the memory to store lists of requested URLs and their responses. Think about it: checking whether a URL returned a bad result previously, before making the same request again, would freeze up a device very quickly. 50 sites x 5 requests and responses is quite a bit of data, not to mention having to search through all of it before making each request; the battery would die super fast.
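For what it's worth, a tarpit of that sort is simple enough to sketch. This toy Python version (not your mod_security setup, and single-threaded, so it blocks while it drips) just accepts a connection and sends the response one byte at a time.

    # Toy tarpit sketch: answer abusive probes one byte per second so the
    # client ties up its own time. Illustrative only; port and delay are arbitrary.
    import socket
    import time

    BODY = b"HTTP/1.1 404 Not Found\r\nContent-Type: text/plain\r\n\r\nNot here.\n"

    def tarpit(port=8080, delay=1.0):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind(("", port))
            srv.listen(5)
            while True:
                conn, _ = srv.accept()
                with conn:
                    conn.recv(1024)          # read (and ignore) the request
                    for byte in BODY:        # drip the response out slowly
                        conn.sendall(bytes([byte]))
                        time.sleep(delay)

    # tarpit()  # uncomment to run; binds to port 8080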
This is also the primary reason that the AMAZINGLY fast Opera browser released last month for the Droid does what it does. It uses socket-level HTTP like everyone else, but Opera set up mobile proxy servers around the country to act as intermediaries and crunch the actual data for the mobile. There just isn't enough memory for my Droid to open a huge webpage, parse the source, and then render it, so it looks for mobile versions whenever possible. If it can't find a mobile version, or the mobile version is still too big, it proxies the request through a mobile proxy server (as used by Google, Opera, and BlackBerry), which lets the (very sophisticated) proxy fetch the content first, render it, and then send it to your mobile for direct viewing. More than proxy servers, they act as caches. And especially because the proxies all use custom programming, you do not want to play around with HTTP 403s like that: it could easily end up blocking a root proxy, resulting in your site being blocked for the entire proxy and its clients. Unlike mobiles, those machines store request state extremely well.
Regarding a 410: that seems like a great solution but could actually be the worst possible thing to do. 410 Gone means the resource used to exist, was removed on purpose, and will NEVER be available again. Two years from now, when Google's mobile index takes over the main web index, you will be up the creek without a paddle, with no clue as to why your new mobile area isn't getting traffic.
Very few user-agents understand a 410; it's one of those codes used almost exclusively for controlling the way search engines index your content. So to me it makes no sense to issue an esoteric status code to a bot that doesn't even understand 404s.
The only time you should ever have to use a 410 is when you make a big mistake with your indexing and have to use it to fix your site's index. Many other user-agents have a minimal understanding of HTTP (especially bots, crawlers, and spammers), whether by design for speed or otherwise; they just look at the first digit of the response code (2 = OK, 3 = redirect, 4 = doesn't exist, 5 = server error) and determine from that alone whether the content is good or not.
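As a tiny illustration of that first-digit-only behavior, here is roughly what such minimal handling looks like in Python (a sketch, not any particular crawler's code), and why a 410 buys you nothing with these clients:

    # Sketch: bucket a response purely by its leading digit, as crude crawlers do.
    def classify(status_code: int) -> str:
        return {
            2: "content is good, keep it",
            3: "follow the redirect",
            4: "treat as not there (403, 404, 410 all look the same)",
            5: "server error, maybe retry later",
        }.get(status_code // 100, "unknown")

    print(classify(410))  # -> "treat as not there (403, 404, 410 all look the same)"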
Basically all mobile devices run HTTP/1.1, but because of their physical limitations they behave like HTTP/1.0 clients from a server admin's standpoint.