In Part I: SEO Secrets of AskApache.com we talked about content and building a website to be your "SEO Base". This article discusses some advanced, but relatively easy, ways to get a site indexed and to control and tweak WHAT urls on your site are indexed, and HOW, so that your best pages move higher in the search results. I've heard some people refer to this as "controlling pagerank flow" or "controlling pagerank juice"; basically, we want our best pages to rank higher in the search engine results.
Big Picture: Going from no website to AskApache.com in less than a year can be accomplished by anyone with unique content and a resolve to avoid any shortcuts and take it one step at a time.
We want what Google wants, to provide the most relevant content for someone who is doing a search. Basically, you want every url on your site that has unique content to be included in the index. In other words, you have to think like a search engine!
Here's what I mean: Google approaches search with the overwhelming goal of bringing a searcher the content that is most likely to be what that searcher is searching for. Another way of looking at it is something I read from Google...
Google's goal is to get you off of their site as fast as possible by providing you with exactly what you are looking for.
If you search for htaccess tutorial for seo on Google, would you be more likely to visit a tutorial about using .htaccess for seo or a category page for htaccess articles? AskApache.com has both of those urls included in the index, but the article ranks higher than the category page, as it very well should.
Before I explain how I am able to help Google and other search engines rank my article pages higher than my category pages, we need to get the urls in the index or nothing will show up. There are many well-discussed methods for getting included in the index, so I'll just list a few that I use.
Here's how to find out which of your pages are indexed.
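A quick way to check is Google's site: search operator, which restricts results to urls from a single domain that are in the index. For example:

site:www.askapache.com
site:www.askapache.com htaccess

The first query lists the indexed urls on the domain; the second narrows it to indexed urls relevant to "htaccess".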
Yo homeslice! I didn't mean break dance... I mean let's simplify AskApache in the context of getting our urls indexed high/low. Here are the stats: 1 Homepage, 206 Articles, 19 Pages, 31 Categories
Homepage: This page is generally the highest-ranking page in the index. It should contain links to your best urls and provide easy navigation.
Articles: These are the articles (like this one) of AskApache.com, and they are the main source of search engine traffic. You want each url (if it's a good article) to be ranked as high as possible. The key is to make each article really specific to a topic and to use best-practice (X)HTML.
Pages: Most of these are pages like the online-tools hosted on this site, or other basic pages like about, contact us, site-map, etc. Some of these you may want to rank very high (like the /about/ page) and some you may not want included in the index at all.
Categories: These are tricky because they are generally just lists of articles from each category, which isn't specific enough to get much search-engine traffic, but is very useful to site visitors. I beefed up my category pages by adding additional information about the category topic in addition to excerpts of the articles.
So Googlebot and other search engine robots have these crazy complicated algorithms (many patented), and SEO industry types can get caught up in trying to technically analyze them. I'm sure you've seen/read/heard the complicated advice that will always be pushed by many...
Now if you've had success with that then props to you, success is success, but I personally choose to completely ignore all that. The number 1 thing that the top search engines advise is to design your page for a Human Visitor, not a computer. The golden rule for me is how I would rank the page, not how some algorithm would.
This is a major factor in your site being at the top vs. nowhere. Design your HTML to be as minimal as possible (see the source code of my homepage) and to contain ONLY the necessary elements. Above all, use semantically sound XHTML markup (view the source of W3C).
Get your javascript and CSS out of your HTML and use external files (like this site) ALWAYS! You should start with just the HTML, no css, no colors, no javascript, and THEN you add the .css and then you add the javascript.
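For example (the file names here are just placeholders), the head of a page references the external files instead of inlining styles and scripts:

<link rel="stylesheet" type="text/css" href="/style.css" />
<script type="text/javascript" src="/site.js"></script>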
Say your browser didn't have a mouse, didn't support images, css, javascript, or even colors! Your HTML should be structured such that your page is still easily readable and easy to navigate. You can use lynx, links, and many other terminal-based browsers to test for this... please see the Web Accessibility Initiative (WAI) for detailed info.
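For example, a quick sketch of that kind of test from the command line, assuming you have lynx installed:

lynx -dump https://www.askapache.com/

That dumps the page as plain text, roughly the way a text-only visitor (or a bot) sees it.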
Web accessibility refers to the practice of making websites usable by people of all abilities and disabilities. When sites are correctly designed, developed and edited, all users can have equal access to information and functionality. For example, when a site is coded with semantically meaningful HTML, with textual equivalents provided for images and with links named meaningfully, this helps blind users using text-to-speech software and/or text-to-Braille hardware.
A few tools and techniques are available for controlling the "juice" or "pagerank" of your urls.
I've done quite a bit of research and experimentation with the robots.txt file, which is located in the root of your website (e.g. https://www.askapache.com/robots.txt), is downloaded by all legitimate search engine spiders/bots, and is used as a blacklist to prevent certain urls from being indexed. Here are a few of the articles on this site, which you may skip if you like, as they don't illustrate the big picture that I am going to discuss now.
Even though robots.txt files are for whitelisting and blacklisting urls, I have found that they should only be used as an extreme form of blacklisting. When you Disallow a url in your robots.txt file, most search engine bots won't even LOOK at that url. As you can see in the example below, I only disallow urls that shouldn't ever be LOOKED at. The real power tool is the robots meta tag.
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content

Sitemap: https://www.askapache.com/sitemap.xml
Ok I'm really trying to simplify, because what you should understand is the big-picture. Every page can have a robots meta tag in the header, and this robots meta tag can tell the search-engine to index/not-index AND follow/not-follow. Here are some examples:
index means the search engine is free to index, archive, cache, and follow the page, whereas noindex means DO NOT include this page in the search engine results.
follow means the search engine is free to LOOK at the page and follow the links on the page, whereas nofollow means DO NOT follow the links on the page.
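For example, here is what the two extremes look like as actual meta tags in the head of a page:

<meta name="robots" content="index,follow" />
<meta name="robots" content="noindex,nofollow" />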
Just add this to any plugin file and it will add the right robots meta tag to your site... tweak the meta content values to taste.
function askapache_robots_header() {
	// default for anything not matched below (the content values here are examples; tweak to taste)
	$robot = '<meta name="robots" content="noindex,nofollow" />';

	// thin archive-style urls: keep them out of the index but let bots follow their links
	if ( is_paged() || is_search() || is_404() || is_author() || is_tag() )
		$robot = '<meta name="robots" content="noindex,follow" />';
	// the homepage and single articles
	elseif ( is_home() || is_front_page() || is_single() )
		$robot = '<meta name="robots" content="index,follow" />';
	// category and static pages
	elseif ( is_category() || is_page() )
		$robot = '<meta name="robots" content="index,follow" />';

	echo $robot . "\n";
}
add_action( 'wp_head', 'askapache_robots_header' );
External and Internal Links are the crux of SEO. It's important to start FIRST on your Internal Links and linking structure... Once you are satisfied that the correct pages are indexed and ranked appropriately, then you can begin to look at external links.
Using the rel, title, and alt attributes semantically is very helpful (for example, rel values like next, prev, and index). Add rel="nofollow" to links that you don't want followed.
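For example (the urls here are just placeholders):

<a href="http://example.com/" rel="nofollow">a link I don't vouch for</a>
<link rel="next" href="https://www.askapache.com/page/2/" />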
The web has gotten to be so full of malicious/non-helpful SEO activity that I recommend developing your content, NOT external links. If you want to do this right and provide great content that makes search engine users happy and makes the web better, then explore this blog and develop content until the next article in this series, where I'll show you how to make your site explode.
Stay tuned for Part III, which will dive deeper into the pipeworks of AskApache.com