duplicate content

This section contains lots of information about duplicate content, why duplicate content is important in search engine optimization, and how to fix some of the issues you might be having.

When you’re setting up your WordPress site for SEO, or search engine optimization, there are a few plugins you will want to install.  One of them is going to be either the All in One SEO Pack, or the Platinum SEO Pack.  For quite some time, we were installing the All in One SEO Pack, but then found several advantages to the Platinum SEO Pack that made us switch over.

Both plugins are relatively easy to install – the usual click of a button with most WordPress plugins.  However, both require some SEO knowledge before you start clicking on all of the endless settings.  Let’s go through some of those:

For the All in One SEO Pack, you start by configuring the tags you want on your homepage.  These become the de facto tags throughout the site unless you modify them on your individual pages and posts, or set the option to “rewrite titles,” which is generally set to %post_title% | %blog_title%.  All of this is nearly the same for the Platinum SEO Pack, except for one thing: the option to “automatically do 301 redirects for permalink changes”.  This is a HUGE deal because many site owners change page names (i.e. URLs) and fail to realize they have created a bunch of duplicate content by having multiple versions of the same page indexed in Google.

Both the All in One SEO Pack and the Platinum SEO Pack have the canonical URL option, which allows you to tell the search engine what your preferred version of the URL should be.  While rel=canonical works well, you still want to “noindex” a lot of the duplicate pages WordPress will produce from a single post.  For example, if you create a single post with one tag that is only assigned to this post, and in one category that is only assigned to this post, you will inadvertently create several duplicate pages:

www.site.com/blog/the-example-post/

www.site.com/blog/the-tag-you-used-for-the-post/

www.site.com/blog/the-category-you-used-for-the-post/

plus the comment and RSS feed pages that could also spawn off of this one post.  To thwart this, the Platinum SEO Pack offers many more noindexing controls, which will be useful to the 95% of websites that don’t post often, or don’t categorize and tag properly.  For instance, I can easily noindex categories (unless you have an SEO strategy for these pages), date-based archives, tags, comment pages, RSS feed pages, search results pages, sub pages and author archives.  The All in One SEO Pack only lets me noindex tags, archives and categories.

Also, have you ever wondered where Google is getting the meta description or title tag for your website when it’s not one you’ve written, or when you’ve since changed it?  Sometimes Google pulls those from the ODP (Open Directory Project, or DMOZ), or from the Yahoo Directory.  In the Platinum SEO Pack, I can choose to add a “noodp” or “noydir” meta tag, which tells Google not to get my meta data from there, and to use what’s on the site instead.  I can’t do this in the All in One SEO Pack.  (Note, however, that Google may still override you and create its own meta description if it doesn’t like yours.)

The rest of the options are generally the same, but keep in mind you can’t have both of these plugins installed because they will not work properly.  You will have to uninstall one of them.

WordPress, out of the box, unfortunately is not that SEO friendly.  With the help of some plugins and proper configuration, you can make it probably the most SEO-friendly “CMS” out there.  So we wanted to point out a couple very common issues in WordPress that could wreck your prospects of SEO domination.

This is a lengthy, instructional post outlining typical SEO problems and solutions with sites that have gone through several re-designs. OK, now that I’ve got your attention, let’s get down to business.  If your site is more than a few years old, chances are it’s been through a fair share of changes.  You’ve probably had three different designs, multiple programmers or web shops working on it.  You probably went from PHP to .NET, and back to PHP again, and you added a blog from WordPress and migrated all of the static content there too.  You added a boatload of pages, and got rid of just as many after marketing went through their nineteenth change in messaging.  Sound familiar?  Right – that’s why your site is…well….not showing up in Google as frequently as it should be.

I was browsing through the SEO Expert group on LinkedIn yesterday, and came across this excellent infographic posted by Jim Rudnick of CanuckSEO.com.  The graphic actually is credited to Elliance.com, and shows the steps of the process of how a search engine will determine the original source of the content.  The original source of the graphic can be found here.  You can also read about why Google cares about duplicate content.

How a Search Engine Determines Duplicate Content

It’s easy to find the pages indexed in Google for your website.  Simply go to the search bar and type the following query:

site:www.yoursite.com

In the upper left corner of the results page you should see “About [some number] results”.  You should go through these results and identify which of your pages the search engine does not have listed.  A missing page may be due to a duplicate content issue, or the search engine may not see it as important enough.  If you have pages missing from the index, make sure other pages link to them, even if they are only internal links for now.  You should also make sure none of the pages lead to a 404 error.

The same query should work in most major search engines.

Duplicate content is one of the most common issues that can keep websites from being found in search engines, and most site owners wouldn’t even know it was happening.

There are a few types of duplicate content. The problem with each of these scenarios is that either a) the search engine cannot figure out which page to rank (they do NOT want multiple sites on the first page of results linking to the exact same content), so they usually rank none of them; or b) the search engines think you are spamming them. Either way, you don’t want to have these plaguing your results, and they are fixable.

Canonical URL
The first is the lesser of the infractions. Most websites will have a “www” version and a non-www version. Technically, the search engines see these as two different websites; i.e. www.yoursite.com and yoursite.com. The version on your preferred domain is called the “canonical URL”, and the other version should redirect to the preferred version using a 301 permanent redirect. You can also identify the canonical URL to the search engine by using a “rel=canonical” tag in the head section of the page.
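As a rough illustration of the idea above, here is a minimal Python sketch that maps both the www and non-www versions of a URL to one preferred host and builds the matching rel=canonical tag. The host name www.yoursite.com and the function names are placeholders chosen for this example, not part of any plugin or library.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the www version is the preferred (canonical) host.
PREFERRED_HOST = "www.yoursite.com"


def canonical_url(url, preferred_host=PREFERRED_HOST):
    """Return the URL rewritten onto the preferred host.

    Only the www / non-www variants of the preferred host are rewritten;
    any other host is left untouched.
    """
    parts = urlsplit(url)
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    host = parts.netloc.lower()
    if host in (bare, "www." + bare):
        host = preferred_host
    # Treat an empty path as "/" and drop any #fragment.
    return urlunsplit((parts.scheme, host, parts.path or "/", parts.query, ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag for the page's head section."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)
```

For example, canonical_link_tag("http://yoursite.com/blog/") produces a tag pointing at http://www.yoursite.com/blog/. The tag is only a hint to the search engine; the 301 redirect described above is still what sends visitors and links to the preferred version.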

Multiple Domains/Countries
Some businesses go with the strategy of having multiple domain names to compete in the search results. For example, they may have “www.redshoes.com” and “www.maroonshoes.com”, which is fine – except that both sites are exactly the same. Word for word, page for page, the content is exactly the same. Instead, choose the domain with the most authority (site age, higher Google PageRank, more inbound links, etc.) and redirect the lower authority one to the higher authority one.

Other companies with locations around the world will have their website replicated on multiple country extensions, such as .com, .co.uk, and .ca. This can be a good strategy, but do NOT copy the content verbatim from one site to the other. To do this properly you will need to first repurpose the content to fit that country (such as using “colour” on a UK site), as well as getting links from sites or directories that originate in that country. Hosting the site in that country will help too. Doing all of this will help you rank higher locally in that region.

Update 6/29/10 – new information suggests that Google is NOT penalizing corporate sites that have similar, or perhaps even the same content on multiple country-specific domains. The search engines realize these are not intentionally duplicate content sites; however, the country version should still have language appropriate for that region of the world.

Marketing/Affiliate pages
Sometimes we see websites with multiple instances of the homepage for affiliate marketing or tracking purposes. They may have something like:

www.site.com/
www.site.com/index.php/aff/1234
www.site.com/index.php/aff/5678

All of these pages are exactly the same, but only one of them is the preferred version: the canonical URL. All the others should have a NOINDEX meta tag to prevent the search engine from indexing them and getting confused later.
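One way to picture this rule is a small Python sketch that decides, per requested path, whether the page template should print a noindex tag. The /index.php/aff/ prefix comes from the example URLs above; the function names are made up for illustration, and your own affiliate URLs may follow a different pattern.

```python
# Assumption for this sketch: affiliate copies of the homepage all live
# under this path prefix, as in the example URLs above.
AFFILIATE_PREFIX = "/index.php/aff/"


def is_affiliate_copy(path):
    """True when the path is an affiliate/tracking copy of the homepage."""
    return path.startswith(AFFILIATE_PREFIX)


def robots_meta(path):
    """Return a noindex meta tag for affiliate copies, or "" for the canonical page."""
    if is_affiliate_copy(path):
        return '<meta name="robots" content="noindex" />'
    return ""
```

Here robots_meta("/index.php/aff/1234") returns the noindex tag, while robots_meta("/") returns an empty string, so only the canonical homepage stays eligible for indexing.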

Think you have a duplicate content problem? Ask us and we can check it out for you.

In today’s blog post we will discuss how to implement a proper 301 redirect on Apache or Windows servers from one domain to another. Redirects are technical, and we see a lot of sites where 301 redirects are not implemented properly. You might want to do 301 redirects for a number of reasons: redirecting the non-www version to www or vice versa, changing your domain, or moving a file within the same domain.  This is also a great post on ways you can fix your 404 error pages.

Before we get into the technical details, it is important to understand why a 301 redirect from the non-www to the www version of your site (or vice versa) matters. First, having two versions of your site can create duplicate content, which may result in your website being penalized by search engines. Second, and most importantly, when you acquire links it’s always much better to have them all pointing at one version of the site rather than split between two, which dilutes the importance search engines assign to your domain.

A 301 redirect is the preferred way of handling duplicate content. Other ways include using the “rel=canonical” tag (don’t use it cross-domain; Yahoo/Bing still don’t recognize it), blocking files in robots.txt, and the meta noindex tag.

Let’s dive into the technical details:

Implementing 301 redirects for an Apache server:

Step 1: To implement a 301 redirect, the file we need to work with is the .htaccess file. To access the file, connect to your server via FTP and look in the document root.

Step 2: If you can’t see it, enable viewing of hidden files, since the .htaccess file is hidden. If there is still no .htaccess file present, create one with a simple text editor.

Step 3: Insert this code in the file:

Code example from non www to www:

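# Turn on mod_rewrite (only needed once per .htaccess file)
RewriteEngine On

# If the requested host is anything other than www.example.com...
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# ...send a permanent (301) redirect to the same path on www.example.com
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]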

Obviously you will need to replace ‘example’ with your own domain name.

Also make sure the Rewrite Engine is turned on; you only need to turn it on once.

Step 4: Save and Test it!
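If it helps to predict what the rule should do before you upload it, the logic can be imitated in a few lines of Python. This is only a simplified model of the RewriteCond/RewriteRule pair shown above, not how Apache works internally, and the function name redirect_for is invented for this sketch.

```python
import re

# Mirrors the RewriteCond pattern; re.IGNORECASE plays the role of [NC].
PREFERRED_HOST = re.compile(r"^www\.example\.com$", re.IGNORECASE)


def redirect_for(host, path, query=""):
    """Return (301, location) if the rule would redirect, else None.

    `host` stands in for %{HTTP_HOST} and `path` for %{REQUEST_URI}.
    mod_rewrite carries the original query string over by default when the
    substitution has none of its own, which is modelled here.
    """
    if PREFERRED_HOST.match(host):
        return None  # already on the preferred host, no redirect
    location = "http://www.example.com" + path
    if query:
        location += "?" + query
    return (301, location)
```

With this model, redirect_for("example.com", "/about/") gives a 301 to http://www.example.com/about/, while a request that already uses www.example.com returns None. Note that the negated condition redirects every other host name pointed at the site, not just the bare domain.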

Implementing 301 redirects for a Windows server:

When setting up a site in IIS, the normal process is to create one account for the site and add both the www and non-www versions of the domain name to the host headers for the account. This creates a canonicalization issue, because the site will then be available at both the www and non-www URLs.

Step 1: Get access to the Windows Server navigation Panel. Navigate your way to the Internet Services Manager (Programs — Administrative Tools — Internet Services Manager).

Step 2: Create 2 accounts for the site within IIS: one with the www version of the domain in the host header and one with the non-www version of the domain. All of the site files can be placed in the preferred version and a single page in the other.

Step 3: Right click on the single page you want to redirect FROM and choose Properties. The Properties box will now appear.

Step 4: Change the redirect option to “A redirection to a URL” and type in the new URL in the box provided.

Step 5: Be sure to check the box marked “A permanent redirection for this resource”. If you leave this box unchecked, you will create a 302 (temporary) redirect, which is not permanent or beneficial from an SEO standpoint in this situation.

Step 6: Test it!
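For the testing step, one option is a short Python script that requests the old URL without following the redirect and reports the status code and Location header, so you can tell a 301 from a 302. This is a sketch using only the standard library; the function names are illustrative, and the URL you pass in is whatever address you are redirecting FROM.

```python
import http.client
from urllib.parse import urlsplit


def check_redirect(url, timeout=10):
    """Send a HEAD request and return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(status):
    """Translate a status code into the terms used in this post."""
    if status == 301:
        return "permanent redirect"
    if status == 302:
        return "temporary redirect"
    return "not a 301 or 302 redirect"


# Example usage (replace the URL with the address you redirected FROM):
# status, location = check_redirect("http://example.com/")
# print(status, describe(status), location)
```

A result of 301 with a Location header on your preferred domain is what you are looking for; a 302 usually means the permanent redirection box from Step 5 was left unchecked.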

Update July 6, 2011 – Doing a www redirect for FrontPage

As I see some of the comments below pertaining to FrontPage, it was a matter of time before I had to do one for this God-forsaken MS product myself.  Here’s how I did it after some trial and error:

1.  First, you have to identify whether you are running Linux or Windows.  This works for Linux.  Apparently, there is an option called FollowSymlinks which needs to be turned on, as well as mod_rewrite, so call your host provider for that one.

2.  FP uses several .htaccess files – one in the main directory structure, and 3 other .htaccess files called “super files”.  You will find these other .htaccess files here:

/_vti_bin/.htaccess

/_vti_bin/_vti_aut/.htaccess

/_vti_bin/_vti_adm/.htaccess

3.  Make sure this is at the top of all 4 .htaccess files: “Options +FollowSymlinks” underneath “# -FrontPage-”

4.  Underneath this, add your 301 redirect command:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]

Here, I did a 301 from non-www to the www, because for SEO purposes, most people have more inbound links pointing to the www version.

That’s it – this should work!