First let me say that I don't read SEO stuff, I don't participate in the SEO community; my only interest is in the technology used by the search engines. All websites are hosted on servers, mostly Apache, and that is the primary topic of this blog. During the past year of this blog, my research into non-SEO areas has turned up some very valuable SEO techniques, all of them legal, ethical, and genuinely good for the Internet at large.
Update: SEO Secrets part II: Advanced Indexing and Pagerank Control
I started this blog in late 2006, my first foray into blogging, and I've been extremely successful at achieving top-ten Google rankings and maintaining on average 15K unique visitors/day (per Google Analytics), 85% of which come from search engine traffic.
NOTE: I take it for granted that anyone reading AskApache is an expert of some skill; if you aren't, I apologize, but I can't waste time on the easy stuff.
There are literally hundreds of thousands of SEO articles on the net, 99.9% of which are absolute garbage, especially in the sense that they just repeat the same 10-year-old stuff. However, to do any kind of advanced SEO like I am going to discuss in this article, I am assuming that you, the intelligent reader, have already read those and have a basic understanding of SEO fundamentals like meta tags, titles, keywords, etc.
The first and most important step in achieving any kind of traffic is to produce great content. I'm sure you've heard that a million times, but let me break down how I perceive it. Before I even started to mess with SEO for AskApache.com, I began by writing articles. At that point I didn't have a clue what my blog was going to be about, or even if I was still going to be doing it after a week.
For me, being a top-paid professional web developer, I spend about 80% of my time doing research. I think that is a bit uncommon, but it's a throwback from the 10 years I spent in the network/computer security field, where research is 99% of the job; a story for another time perhaps.
So the research I was doing at that time was about best-practice standards-based web design, mainly XHTML Strict, CSS, and unobtrusive javascript. Each of those subjects has become near and dear to my heart, and each should also be mandatory learning for anyone interested in SEO. The best advice I can give towards that end is checking out the CSS, Javascript, and XHTML Strict source code for this page and site. And of course the holy W3.org.
In addition to striving to master those 3 subjects, I was also, and always will be, researching web programming languages like PHP, Ajax, Ruby, and server technology like Apache. Although I should note that my research into Apache and server technologies is more of a hobby than a job requirement, also a throwback to my days in the security industry and of course my love for open source software.
So basically I was spending 25% of my time at work actually working, and the other 75% of the time I would research how to do something better, faster, the best. Incredibly, I discovered or re-discovered a ton of tips, tricks, and methods to aid me in my work. I was learning so much valuable information that I joined a couple of forums to discuss them and get feedback on making them even better. Soon I realized that I was one of a small few who actually post content to a forum instead of just questions, so I decided to write my tutorials down on a blog, and AskApache was born.
So that is why this blog is comprised of almost 100% tutorials, and why almost all of them are completely original works you won't find elsewhere. That's how I create content, but you might do something different. Whatever it is that you do for content, just make sure you are providing VALUE with everything you do. Not to everyone, just stuff that you would consider to have value if you were reading it.
Ok so I had 10 or so great articles that I knew would provide value for many web developers, but so what? Nobody cares, you know. That's when I decided to take a closer look at the software that was running my new blog, WordPress, and I've been hacking the code ever since on my never-ending quest to be the best and know the most about advanced web development. You'll see why in a couple paragraphs.
Many sites that use a CMS of some kind, be it Drupal or WordPress, have hundreds or thousands of URLs even if they only have 10 actual posts/articles.
You've all heard this before, but almost no-one has taken it to the level I am going to discuss. Bear with me.
Removing duplicate content is actually a very straightforward process if you know what you are doing, and if you don't, well that's why I'm going to quickly explain how to really do a good job.
People misunderstand this to mean simply that you shouldn't repeat the same paragraph in a different article. That is partially true, but the bigger impact on your site comes from the same article being accessible at more than a single URL.
I hope you realize you MUST use pretty URLs like this site does, not query-string URLs littered with question marks. You can find any potential duplicate URLs on WordPress with the rewriterules plugin. Also look at Google's Webmaster Tools for any duplicate URLs, and you can use Xenu's Link Sleuth tool as well.
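If you are on WordPress and still haven't turned on pretty permalinks, they are handled by a small block of mod_rewrite rules in your .htaccess. This is just a sketch of the standard rules WordPress writes out for you when you enable pretty permalinks, not anything special from my setup; your RewriteBase will differ if the blog lives in a subdirectory.
# Standard WordPress pretty-permalink rules (the Permalinks settings screen generates these)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
# Only hand the request off to WordPress when it isn't a real file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>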
Once you've found duplicate URLs, you need to instruct Google and the other search engine robots to be redirected to the correct URL. By doing a 301 redirect you tell the search engines NOT to index the bad URL, only the good one. Below is some of the .htaccess code I use on this site to accomplish this technique; this is gold I myself use, so pay attention. It works.
First, let's start with one everyone should know, and the most common: to www or not to www?
# Force every request onto the canonical www hostname
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.askapache\.com$ [NC]
RewriteRule ^(.*)$ https://www.askapache.com/$1 [R=301,L]
It's a highly rare individual who has seen this one, which forces requests for .html/ to .html:
# Strip a trailing slash from .html requests, e.g. /page.html/ becomes /page.html
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html/\ HTTP/ [NC]
RewriteRule ^(.*)\.html/$ https://www.askapache.com/$1.html [R=301,L]
For the rest of the duplicate URLs that you find, I like to use Redirect and RedirectMatch.
This redirects requests that start with /& or /( to my homepage:
RedirectMatch 301 ^/&(.*)$ https://www.askapache.com/
RedirectMatch 301 ^/([(]+)(.*)$ https://www.askapache.com/
This redirects requests for //whatever to /whatever:
RedirectMatch 301 ^//(.*)$ https://www.askapache.com/$1
But this is just a brief look at what you will have to spend some time on. There are detailed guides to doing this with mod_rewrite and Redirect on my blog. It's time now for some real SEO tips. The heart of the matter, as it were.
I realize that was brief, so I want to really stress two things, or you won't take away much from part 1.
This is truly one of the most important things in my personal experience. I take it as far as I can: I regularly grep my access, mod_security, and error log files looking for bad URLs. I am always checking them out to see if someone has a bad link to me somewhere, or if someone just typed it in wrong. If it's a bad link on a site, I will very politely keep contacting the webmaster about it until they fix it.
Even I, with my many colorful years of Internet travel, was caught off-guard by the variety, creativity, and sheer number of URLs people use to link to my site. I found that bad links were often published because my URLs were just too long, so I shortened them. Now of course bad links can't really even touch my site with all my 301s in place.
Besides grepping your server's logs, the second best place to locate duplicate or just plain wrong URLs is Google's free Webmaster Tools. They keep track of all the bad URLs linking to your site and let you download this data in .csv spreadsheet format. The first time I checked into this I found over 1,000 bad links; after a couple of months with my RewriteRules and 301 redirects in place, I've narrowed the list down to under 50 most months. That is a powerful reason to use 301 redirects, as we'll really get into in part 2.
Finding the bad URLs takes some time, a couple of hours even, and the whole reason you do it is so you can create 301 redirects from all of those bad URLs to good URLs.
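To give you an idea of what that ends up looking like, it's usually nothing fancier than a list of one-line rules in your .htaccess. The paths below are made up purely for illustration; yours will come straight out of that .csv.
# Hypothetical bad URLs pulled from the Webmaster Tools .csv, each 301'd to the real article
# (RedirectMatch anchored with ^...$ matches the exact path, so truncated links can't chain or loop)
RedirectMatch 301 ^/htacces/fix-duplicate-content\.html$ https://www.askapache.com/htaccess/fix-duplicate-content.html
RedirectMatch 301 ^/fix-duplicate-content/?$ https://www.askapache.com/htaccess/fix-duplicate-content.html
RedirectMatch 301 ^/2007/01/fix-duplicate-content\.html$ https://www.askapache.com/htaccess/fix-duplicate-content.html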
One thing I wasn't even aware of until several months ago: when Googlebot locates a bad URL for your site, it tries to access it, and if you haven't planned for this in advance, your page will most likely return a 200 OK status header, or if you are lucky a 404 Not Found error, both of which really hurt you.
Basically, a 200 response produces duplicate content 99.9% of the time, and 404 responses whisper to Google's algorithms that you don't know what you are doing and your site isn't authoritative. A 200 means Google will index the bad URL; a 404 means Google won't index it, but it also won't give up trying for a while, which takes attention away from your real URLs.
301 responses were practically invented for user-agents/clients/web-crawling robots like Googlebot. They instruct the client, whether that is a person's browser or Googlebot, that the resource/page they are looking for is actually at a different URL. This is an authoritative response that makes Googlebot and the other search engines ecstatic, because now they can give up on the 200 and 404 responses that never really gave them an answer either way.
On the other hand, a great 404 can and should be just as powerful as a 301, but hardly anyone uses them in the correct way according to the HTTP/1.1 or 1.0 specifications. We'll tear that subject apart further down the road.
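For now, the piece most people get wrong is pointing ErrorDocument at a full URL: Apache then answers with a 302 redirect to the error page instead of a real 404 status, and the search engines never see the 404 at all. A minimal sketch, with /notfound.html standing in as an example path:
# Point ErrorDocument at a LOCAL path so Apache serves the page with a true 404 status
ErrorDocument 404 /notfound.html

# Don't do this -- a full URL makes Apache send a 302 redirect to the error page,
# so the client never receives the 404 status the spec intends
# ErrorDocument 404 https://www.askapache.com/notfound.html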
I'll leave this topic for now with one last idea: 301 redirects, when implemented and used correctly, actually pass the PageRank and search engine ranking of the bad URL on to the redirected/correct URL. That means if you have 1,000 unique links pointing to your article, and all of them are incorrect in some way, and you can 301 redirect all of those bad links to your correct link, you now have 1,000 new good links! It has to be done right and in a classy way though, of course.
Now that you have content and a great site, it's time to SEO like a mofo.