SEO Tip Tuesday

This is a lengthy, instructional post outlining typical SEO problems and solutions with sites that have gone through several re-designs. OK, now that I’ve got your attention, let’s get down to business.  If your site is more than a few years old, chances are it’s been through a fair share of changes.  You’ve probably had three different designs, multiple programmers or web shops working on it.  You probably went from PHP to .NET, and back to PHP again, and you added a blog from WordPress and migrated all of the static content there too.  You added a boatload of pages, and got rid of just as many after marketing went through their nineteenth change in messaging.  Sound familiar?  Right – that’s why your site is…well….not showing up in Google as frequently as it should be.


I thought we would do something different for this week’s SEO Tip Tuesday. As an Audi lover myself (on my 2nd A4 Turbo), I frequently search the new models so I can mostly drool and go through the whole “I wish I had..” sentiments. And I find myself at times in a bit of a love triangle between the Audi RS4 and the BMW M5, albeit different classes. So I search a lot for various BMW products, and I love watching M5board.com and seeing the M5 take on anything from an RS4 to a Porsche C4S Turbo.


Ever click on a page in a search result listing and get a “404 – Page Not Found” error?  It probably hasn’t happened much to you since the search engines do a fairly good job of not ranking pages with 404 errors, or even sites that have “coming soon” pages.

There are a couple of common ways you as a site owner can inadvertently generate these types of pages, and you want to make sure they are not indexed in the search engines.

The first way is probably the most common – you changed the URL and forgot to redirect the old one to the new one.  So you might have changed a page from “/relevance-of-404-errors/” to “/importance-of-404-errors/”.  The problem is that without permanently redirecting the old URL, it could still be visible in the search results, leading to that “404 – Page Not Found” error.  Whoops.
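Here is a minimal sketch of that redirect in Python, using a bare WSGI app from the standard library. The REDIRECTS table and the app function are names made up for illustration; on a real site the same mapping usually lives in your web server or CMS settings, and the final branch would serve your actual pages.

```python
# Hypothetical map of retired URLs to the pages that replaced them.
REDIRECTS = {
    "/relevance-of-404-errors/": "/importance-of-404-errors/",
}

def app(environ, start_response):
    """Tiny WSGI app: answer moved pages with a 301, anything else with a 404."""
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        # A permanent redirect tells search engines to swap the old URL for the new one.
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    # Stand-in for the rest of the site: this sketch knows no other pages.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Page not found"]
```

The important part is the status line: "301 Moved Permanently" rather than a temporary redirect, so the old listing gets replaced instead of lingering.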

The second way is when you simply remove pages from your website, not realizing the pages are still indexed in Google or other search engines. This is common with special promotional pages for marketing, or landing pages you might be temporarily using for paid search efforts.

The ideal 404 response:

Here /abc.html, /pqr.html and /xyz.html are pages that don’t exist.


There are two components to this:

1. Search Engine component: For SEO purposes, and to avoid the search engine implications of 404 errors (which we will discuss below), ensure that when a page is requested which doesn’t exist, the web server returns a “404 not found” status code in the header.

2. Usability component: The browser should preferably render a custom 404 page. From a user’s perspective, once we reach a page which doesn’t exist, there should be ways of getting back to the main page without hitting the back button.

If your domain doesn’t handle number 1, you risk running into duplicate content issues. The reason: if the server doesn’t return a “404 not found”, you are giving the search engine a green signal to index the page. And since the same page is displayed whenever anyone types a URL which doesn’t exist on your domain (theoretically infinite variations are possible), this same page is indexed under multiple non-existent URLs. This is a duplicate content issue and the search engine could possibly put a small red flag on your site. Something you definitely want to avoid.

The 404 myth:

The most common case is when someone thinks they have a valid 404 because they have a custom 404 page, but their server is not actually returning a “404 not found”. In that misleading but common scenario, the server returns a “200 OK” for /abc.html, /pqr.html and /xyz.html, all showing the same 404 page, which gives the search engine a green signal to index them. This leads to the search engine indexing the 404 page (which we don’t want) under all three URLs: a potential duplicate content issue.

How to check for 404s:

Run an analysis of your site (it takes 30 seconds) on our Free Website Analyzer; it identifies 404 errors among other SEO factors.

There is a really useful Firefox plugin called “Live HTTP Headers” where you can check the status code in the header to see if it’s a “404 not found”.
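If you would rather script the check, here is a rough Python sketch using only the standard library. The function names are ours, and the idea is simply to request a few URLs that you know do not exist and flag any that fail to come back as 404. Note that urllib follows redirects, so a server that quietly redirects missing pages to the homepage will show up here as a 200.

```python
import urllib.error
import urllib.request

def fetch_status(url):
    """Return the HTTP status code the server sends for url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return error.code

def find_soft_404s(base_url, missing_paths, get_status=fetch_status):
    """Return the made-up URLs that did NOT answer with a 404 status."""
    offenders = []
    for path in missing_paths:
        url = base_url.rstrip("/") + path
        if get_status(url) != 404:
            offenders.append(url)
    return offenders
```

For example, find_soft_404s("http://www.yoursite.com", ["/abc.html", "/pqr.html", "/xyz.html"]) should return an empty list on a correctly configured server. Anything it does return is a URL a search engine is being invited to index.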

I was browsing through the SEO Expert group on LinkedIn yesterday, and came across this excellent infographic posted by Jim Rudnick of CanuckSEO.com.  The graphic actually is credited to Elliance.com, and shows the steps of the process of how a search engine will determine the original source of the content.  The original source of the graphic can be found here.  You can also read about why Google cares about duplicate content.

How a Search Engine Determines Duplicate Content

Keyword relevancy refers to how relevant, or important, certain keywords or phrases are to each page of your website.  Search engines use keyword relevancy to determine what your page is about, and that is part of what they use to decide which keywords you will rank for when someone does a search.

When optimizing your website for keyword relevancy, it is usually best to only target a few keywords on an individual page basis.  That means that your home page should target different words or phrases than an interior product or service page.

If you try to target too many words, the search engine will have a difficult time trying to determine what the page is about.  If you don’t properly target the right words based on the content of the page, you could be missing out on ranking opportunities.

When starting out, try to target words that get lower to moderate search volumes.  There are several tools which you can use for keyword research in finding the proper targets.

When optimizing your pages for keyword relevancy, it is usually not necessary to over-emphasize your company name, or sometimes product names since you can usually rank for those fairly easily.

Where the keyword appears on the page matters when optimizing for keyword relevancy.  Here are the places that SiteJuice looks for the word in order to score your keyword based on relevancy:

  1. URL
  2. H1 tag
  3. Meta description
  4. Title tag
  5. Body content
  6. Bold or italicized
  7. Alt tags or image filenames

If we find the keywords you are trying to rank for in these areas, you will receive a higher relevancy score, and increase your chances of ranking for the word(s) in a search result.  This is one of the ways we perform what’s called “on-page optimization” in SEO terminology.
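To make the list above concrete, here is a small Python sketch that reports which of those seven places contain a keyword. This is not how SiteJuice actually scores a page; the names and the simple substring matching are our own assumptions, and it uses only the standard library's HTML parser.

```python
from html.parser import HTMLParser

PLACES = ["URL", "H1 tag", "Meta description", "Title tag", "Body content",
          "Bold or italicized", "Alt tags or image filenames"]

# Tags whose enclosed text counts toward a given place.
TAG_PLACES = {"h1": "H1 tag", "title": "Title tag", "body": "Body content",
              "b": "Bold or italicized", "strong": "Bold or italicized",
              "i": "Bold or italicized", "em": "Bold or italicized"}

def normalize(text):
    """Lowercase and treat hyphens, underscores and slashes as spaces."""
    for char in "-_/":
        text = text.replace(char, " ")
    return " ".join(text.lower().split())

class PlacementParser(HTMLParser):
    """Collects the text found in each place of interest."""

    def __init__(self):
        super().__init__()
        self.text = {place: "" for place in PLACES}
        self.depth = {tag: 0 for tag in TAG_PLACES}  # how many of each tag are open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.depth:
            self.depth[tag] += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.text["Meta description"] += " " + (attrs.get("content") or "")
        elif tag == "img":
            self.text["Alt tags or image filenames"] += " %s %s" % (
                attrs.get("alt") or "", attrs.get("src") or "")

    def handle_endtag(self, tag):
        if self.depth.get(tag, 0) > 0:
            self.depth[tag] -= 1

    def handle_data(self, data):
        # Text belongs to every tracked tag that is currently open.
        for tag, place in TAG_PLACES.items():
            if self.depth[tag] > 0:
                self.text[place] += " " + data

def keyword_placements(url, html, keyword):
    """Return the places (in list order) where the keyword appears."""
    parser = PlacementParser()
    parser.feed(html)
    parser.close()
    parser.text["URL"] = url
    needle = normalize(keyword)
    return [place for place in PLACES if needle in normalize(parser.text[place])]
```

Calling keyword_placements(url, html, "red shoes") gives you a quick checklist of where the phrase already appears and where it is still missing.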

It’s easy to find the pages indexed in Google for your website.  Simply go to the search bar and type the following query:

site:www.yoursite.com

In the upper left corner of the results page you should see “About [some number] results”.  You should go through these results and identify which of your pages the search engine does not have listed.  A page may be missing because of a duplicate content issue, or because the search engine does not see it as important enough.  If you have pages missing from the index, make sure they have pages linking to them, even if they are internal links for now.  You should also make sure none of the pages lead to a 404 error.

The same query should work in most major search engines.

There are several types of inbound links a website can get, each with a different value associated with them.  Here we will explore a few of them:

Contextual

A contextual inbound link is usually a one-way link that appears to be the most natural, and therefore has the most value.  This is a link pointing to you within the body text of another website, and links to you with descriptive text.  This could be a blog post, news article or other.

Sidebar

A sidebar link is one that appears in the sidebar of a page, and is usually replicated across many or all pages of the site.  Most modern blogs have a sidebar, and it is usually filled with links and resources to other sites.  Some of these can provide value, particularly a descriptive text link as opposed to an image.

Footer

A footer link is one that appears on every page, way down at the bottom.  It usually contains credits to the companies that design, build and host a website.  These are considered to be lower in value than a contextual or sidebar link.

Reciprocal

Reciprocal links are links that two websites exchange with each other, usually on a “Links” page or similar.  These have been heavily devalued over the years, as search engines started to see a trend of massive link building between sites that were not at all related.  Some reciprocal links are good if they come from authoritative sources, the content is related, and they do not make up the majority of the links pointing to the site.  Read our other blog post on why reciprocal links are bad.

Duplicate content is one of the most common issues that can keep websites from being found in search engines, and most site owners wouldn’t even know it was happening.

There are a few types of duplicate content. The problem with each of these scenarios is that either a) the search engine can not figure out which page to rank (they do NOT want multiple sites on the first page of results linking to the exact same content), so they usually rank none of them; or b) the search engines think you are spamming them. Either way, you don’t want to have these plaguing your results, and they are fixable.

Canonical URL
The first is the lesser of the infractions. Most websites will have a “www” version and a non-www version. Technically, the search engines see these as two different websites; i.e. www.yoursite.com and yoursite.com. The version on your preferred domain is called the “canonical URL”, and the other version should redirect to the preferred version using a 301 permanent redirect. You can also identify the canonical URL to the search engine by using a “rel=canonical” link tag in the head section of the page.
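As a rough illustration of the www versus non-www redirect, the Python sketch below works out where a request should be sent. The function name and the default preferred host are placeholders of ours, and for simplicity it ignores port numbers; your web server's own redirect rules are the usual place to do this.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_redirect(url, preferred_host="www.yoursite.com"):
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    # Work out the bare domain so both the www and non-www forms are recognized.
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if parts.hostname in (bare, "www." + bare) and parts.hostname != preferred_host:
        return urlunsplit((parts.scheme, preferred_host, parts.path,
                           parts.query, parts.fragment))
    return None
```

So a request for http://yoursite.com/page would be sent to http://www.yoursite.com/page, while a request already on the preferred host is left alone.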

Multiple Domains/Countries
Some businesses go with the strategy of having multiple domain names to compete in the search results. For example, they may have “www.redshoes.com” and “www.maroonshoes.com”, which is fine – except that both sites are exactly the same. Word for word, page for page, the content is exactly the same. Instead, choose the domain with the most authority (site age, higher Google PageRank, more inbound links, etc.) and redirect the lower authority one to the higher authority one.

Other companies with locations around the world will have their website replicated on multiple country extensions, such as .com, .co.uk, and .ca. This can be a good strategy, but do NOT copy the content verbatim from one site to the other. To do this properly you will need to first repurpose the content to fit that country (such as using “colour” on a UK site), as well as getting links from sites or directories that originate in that country. Hosting the site in that country will help too. Doing all of this will help you rank higher locally in that region.

Update 6/29/10 – new information suggests that Google is NOT penalizing corporate sites that have similar, or perhaps even the same content on multiple country-specific domains. The search engines realize these are not intentionally duplicate content sites; however, the country version should still have language appropriate for that region of the world.

Marketing/Affiliate pages
Sometimes we see websites with multiple instances of the homepage for affiliate marketing or tracking purposes. They may have something like:

www.site.com/
www.site.com/index.php/aff/1234
www.site.com/index.php/aff/5678

All of these pages are exactly the same, but only one of them is the preferred version: the canonical URL. All others should have a NOINDEX meta tag to prevent the search engine from indexing the page and then getting confused later.

Think you have a duplicate content problem? Ask us and we can check it out for you.

Reciprocal links used to be a good way to build links to your site. You would share links with another website on a “links” page or similar. Now, reciprocal links are worth much less due to link spamming. SEOs would amass large amounts of links quickly by creating link farms, which are hundreds or thousands of webpages controlled by them. The majority of the links would be from “spammy” looking sites that had nothing to do with the target site. Search engines quickly caught on and put far less value on these types of links. This is why you hear “having a handful of quality links is better than having tons of unrelated links”.

SEOs answered the change by creating “one way reciprocal” links, where they would triangulate links from various sites so they would not appear to all be linked to one another. In other words, site A would link to site B, but instead of site B linking back to site A, site B would link to site C, and site C would link back to site A. They’ve even gone so far as to spread the hosting of these sites far and wide to further hide their footprint, making it a complex scheme.

None of these are good ideas and we never recommend them. The best way to build links to your site is slowly and organically. This is why it takes so long.

Alternatively, it is OK to link to partners, clients and the authoritative sites if the content is related.

If you have uncovered pages in Google’s index for your site that should be removed, there is a simple two-step process that you will need to follow.

Just deleting the page from your website will NOT remove the page from the index. In fact, this can cause worse problems.

Step 1: place the NOINDEX meta tag on each page you want to delete from the index. Place this tag in the head section of the page, near your other meta tags. If Google crawls your site often, they may pick up the change, but it could take weeks before they actually remove it from all of their servers.

If you need to remove the page immediately, go to step 2.

Step 2: log into your Google Webmaster Tools account. Go to the Crawl menu and select the “remove a page from the index” link. Simply add the page you wish to delete. They will check for the NOINDEX meta tag, so do not do this step until you have completed step 1. This manual request should speed up removal to days instead of weeks.
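Before you submit a removal request, it is worth confirming that the tag from step 1 is really on the page. The Python sketch below does that check with the standard library's HTML parser; the class and function names are ours, and it only looks for a meta tag named "robots" whose content includes "noindex".

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Records the directives of any <meta name="robots"> tag on the page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # The content attribute is a comma-separated list, e.g. "noindex, follow".
            self.directives.update(part.strip().lower() for part in content.split(","))

def has_noindex(html):
    """Return True if the page carries a NOINDEX robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives
```

Run has_noindex over the HTML of each page you plan to remove; only the ones that come back True are ready for step 2.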

Simply deleting the page from your website will not solve the problem. Google will still have this page indexed, and if a person finds it in a search and attempts to click on it, they will get a “404 – page not found” error. Not only is this poor usability, but Google will not readily rank sites riddled with lots of 404 errors.

Using robots.txt
You can also use your robots.txt file to tell the search engine what to crawl, and what not to crawl. This is especially useful if you have entire directories of pages that you need to handle. Keep in mind that a crawler blocked by robots.txt cannot read a NOINDEX meta tag on the blocked page, so avoid relying on both for the same URL.
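Here is a small Python sketch of what directory-level rules look like and how to test them before you publish the file. The two directory names are invented for the example; the parser is the one that ships with Python's standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks two whole directories for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /promotions/
Disallow: /landing-pages/
"""

def is_crawlable(path, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Return True if the given rules allow user_agent to crawl path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

With these rules, is_crawlable("/promotions/spring.html") comes back False and is_crawlable("/about/") comes back True, which is a handy sanity check that you have not blocked more than you meant to.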