404 errors

A series of technical posts about 404 errors and how to fix them for better SEO performance.

Ever clicked a page in a search result listing and gotten a “404 – Page Not Found” error? It probably hasn’t happened to you often, since the search engines do a fairly good job of not ranking pages that return 404 errors, or even sites that only show “coming soon” pages.

There are a couple of common ways you, as a site owner, can inadvertently generate these types of pages, and you want to make sure they are not indexed by the search engines.

The first way is probably the most common – you changed the URL and forgot to redirect the old one to the new one. So you might have changed a page from “/relevance-of-404-errors/” to “/importance-of-404-errors/”. The problem is that if you don’t permanently redirect the old URL, it can stay visible in the search results and send searchers to that “404 – Page Not Found” error. Whoops.
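On the server side, the fix is a lookup. If a requested path is in a list of moved URLs, answer with a 301 and the new location. If it is neither a current page nor a moved one, answer with a genuine 404. The Python sketch below shows only that decision, using the two example URLs from this post. EXISTING_PAGES, REDIRECTS and resolve() are hypothetical names. On a real site this mapping usually lives in the web server configuration, for example an Apache .htaccess file.

```python
from typing import Optional, Set, Tuple

# Hypothetical site data: pages that exist now, and retired URLs mapped to their replacements.
EXISTING_PAGES: Set[str] = {"/", "/importance-of-404-errors/"}
REDIRECTS = {"/relevance-of-404-errors/": "/importance-of-404-errors/"}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Decide the HTTP status (and Location header, if any) for a requested path."""
    if path in EXISTING_PAGES:
        return 200, None
    if path in REDIRECTS:
        # Permanent redirect: search engines should swap the old URL for the new one.
        return 301, REDIRECTS[path]
    # Neither current nor moved: report a real 404 instead of serving a page with 200.
    return 404, None


print(resolve("/relevance-of-404-errors/"))  # (301, '/importance-of-404-errors/')
```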

The second way is when you simply remove pages from your website, not realizing the pages are still indexed in Google or other search engines. This is common with special promotional pages for marketing, or landing pages you might be temporarily using for paid search efforts.

The ideal 404 response:

Suppose /abc.html, /pqr.html and /xyz.html are pages that don’t exist on your site.


There are two components to this:

1. Search engine component: For SEO, and to avoid the problems 404 errors can cause in search engines (which we will discuss below), make sure that when a page that doesn’t exist is requested, the web server returns a ‘404 Not Found’ status code in the response header.

2. Usability component: The browser should preferably render a custom 404 page. From a user’s perspective, a page that doesn’t exist should still offer a way back to the main page without hitting the back button.
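Both components can be seen in a minimal Python WSGI sketch (standard library only). PAGES, NOT_FOUND_PAGE and app are hypothetical names. The point is that the custom page is sent together with a 404 status line, not a 200.

```python
# Hypothetical site content: path -> HTML body.
PAGES = {"/": "<h1>Home</h1>"}

# Custom 404 page (the usability component): explains the problem and links back home.
NOT_FOUND_PAGE = (
    "<h1>404 - Page Not Found</h1>"
    '<p>Sorry, that page does not exist. <a href="/">Go to the home page</a>.</p>'
)


def app(environ, start_response):
    """Minimal WSGI application: known paths get 200, everything else a real 404."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        status, body = "200 OK", PAGES[path]
    else:
        # The search engine component: the custom page goes out with a 404 status, not 200.
        status, body = "404 Not Found", NOT_FOUND_PAGE
    data = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ])
    return [data]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() serves it at http://localhost:8000/. A request for /abc.html should then return the custom page with a 404 status.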

If your domain doesn’t handle number 1, you risk running into duplicate content issues. The reason: if the server doesn’t return a “404 Not Found”, you are giving search engines a green light to index the page. And since the same page is displayed whenever anyone requests a URL that doesn’t exist on your domain (theoretically infinite variations are possible), that same page can be indexed under many non-existent URLs. This is a duplicate content issue, and the search engine could put a small red flag on your site. That is something you definitely want to avoid.

The 404 myth:

The most common case is when someone thinks they have a valid 404 because they have a custom 404 page, but their server is not returning a ‘404 Not Found’. This is misleading. In this scenario the server answers /abc.html, /pqr.html and /xyz.html with ‘200 OK’, all showing the same 404 page. That gives the search engine a green light to index the 404 page (which we don’t want) under all three URLs: a potential duplicate content issue.

How to check for 404s:

Run an analysis of your site (it takes 30 seconds) on our Free Website Analyzer; it identifies 404 errors among other SEO factors.

There is a really useful Firefox plugin called ‘Live HTTP Headers’ that lets you check the status code in the response header to see whether it’s a ‘404 Not Found’.
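If you would rather script the check than use a browser add-on, the Python sketch below (standard library only) requests a random URL that should not exist on your site and returns the status code. probe_soft_404 is a hypothetical helper name. The random file name is an assumption too: any URL that cannot exist will work. A 404 is the result you want. A 200 is the 404 myth described above.

```python
import urllib.error
import urllib.request
import uuid


def probe_soft_404(base_url: str, timeout: float = 10.0) -> int:
    """Request a random, almost certainly non-existent page and return the HTTP status.

    404 means missing pages are reported correctly; 200 means the server says
    "OK" for a page that does not exist.
    """
    # Hypothetical test path: a random file name that should not exist on any site.
    missing_url = f"{base_url.rstrip('/')}/{uuid.uuid4().hex}.html"
    try:
        with urllib.request.urlopen(missing_url, timeout=timeout) as response:
            # urlopen follows redirects, so a redirect to a 200 page also lands here.
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the status code is on the exception.
        return err.code
```

For example, call probe_soft_404("http://www.example.com") with your own domain in place of example.com.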

In today’s blog post we will discuss how to implement a proper 301 redirect on Apache or Windows servers from one domain to another. Redirects are technical, and we see a lot of sites where 301 redirects are not implemented properly. You might want a 301 redirect for a number of reasons: redirecting non-www to www (or vice versa), or changing your domain or a file within the same domain. This is also a great post on ways you can fix your 404 error pages.

Before we get into the technical details, it is important to understand why a 301 redirect from the non-www to the www version of your site (or vice versa) matters. First, having two versions of your site can create duplicate content, which may result in your website being penalized by search engines. Second, and most importantly, when you acquire links it’s much better to have them all pointing at one version of the site than split between two versions, which dilutes your domain’s importance in the search engines.

301 redirects are the preferred way of handling duplicate content. Other ways include the rel="canonical" tag (don’t use it across domains; Yahoo and Bing still don’t recognize it), blocking files in robots.txt, and the meta noindex tag.

Let’s dive into the technical details:

Implementing 301 redirects for an Apache server:

Step 1: To implement a 301 redirect on Apache, the file we need to work with is .htaccess. To access it, connect to your server over FTP and look in the document root.

Step 2: If you can’t see it, enable viewing of hidden files, since the .htaccess file is hidden. If there is still no .htaccess file present, create one with a plain text editor.

Step 3: Insert this code in the file:

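Code example, redirecting non-www to www:

# Turn on mod_rewrite for this directory
RewriteEngine On
# If the requested host is anything other than www.example.com (case-insensitive)...
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# ...send a permanent (301) redirect to the same path on www.example.com
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]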

Obviously you will need to replace ‘example’ with your own domain name.

Also make sure the rewrite engine is turned on with the RewriteEngine On line. You only need it once per .htaccess file.

Step 4: Save and Test it!
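To see exactly what the two directives decide, here is a rough Python equivalent. It is only an illustration, and Apache’s mod_rewrite does the real work. www_redirect is a hypothetical name, and example.com stands in for your domain. Note that %{REQUEST_URI} carries only the path. mod_rewrite re-appends the original query string by default, so the sketch takes the query separately.

```python
import re
from typing import Optional

# Mirrors: RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
WWW_HOST = re.compile(r"^www\.example\.com$", re.IGNORECASE)


def www_redirect(host: str, request_uri: str, query: str = "") -> Optional[str]:
    """Return the Location a 301 would carry, or None if the request is already canonical."""
    if WWW_HOST.match(host):
        # The condition is negated (!), so the www host itself is never redirected.
        return None
    # RewriteRule .? matches every path; %{REQUEST_URI} supplies the original path.
    target = "http://www.example.com" + request_uri
    # mod_rewrite passes the original query string through unless told otherwise.
    if query:
        target += "?" + query
    return target
```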

Implementing 301 redirects for a Windows server:

When setting up a site in IIS, the normal process is to create one account for the site and add both the www and non-www versions of the domain name to the host headers for that account. This creates a canonicalization issue: the site is then available at both the www and non-www URLs.

Step 1: Log in to the Windows server and open the Internet Services Manager (Programs > Administrative Tools > Internet Services Manager).

Step 2: Create two accounts for the site within IIS, one with the www version of the domain in the host header and one with the non-www version. Place all of the site files in the preferred version and a single page in the other.

Step 3: Right-click the single page you want to redirect FROM and choose Properties. The Properties dialog box will appear.

Step 4: Change the redirect option to “A redirection to a URL” and type in the new URL in the box provided.

Step 5: Be sure to check the box marked “A permanent redirection for this resource”. If you leave this box unchecked, you will create a 302 (temporary) redirect, which is not beneficial from an SEO standpoint in this situation.

Step 6: Test it!
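When testing, it helps to look at the raw response rather than the browser, because the browser follows redirects silently. The Python sketch below uses http.client, which does not follow redirects. It reports whether a URL answers with a permanent (301/308) or temporary (302/307) redirect. check_redirect is a hypothetical helper name.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit

PERMANENT = {301, 308}  # Moved Permanently, Permanent Redirect
TEMPORARY = {302, 307}  # Found, Temporary Redirect


def check_redirect(url: str, timeout: float = 10.0) -> Tuple[int, str, Optional[str]]:
    """Fetch url once, without following redirects; return (status, kind, Location)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        response = conn.getresponse()
        if response.status in PERMANENT:
            kind = "permanent"
        elif response.status in TEMPORARY:
            kind = "temporary"
        else:
            kind = "other"
        return response.status, kind, response.getheader("Location")
    finally:
        conn.close()
```

Once the redirect is set up, check_redirect("http://example.com/") (with your own domain) should return 301 and a Location on the www host. A 302 usually means the permanent-redirection box was left unchecked.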

Update July 6, 2011 – Doing a www redirect for FrontPage

Seeing some of the comments below about FrontPage, it was only a matter of time before I had to do one for this God-forsaken MS product myself. Here’s how I did it after some trial and error:

1.  First, you have to identify whether you are running Linux or Windows. This works for Linux. Apparently, the FollowSymLinks option needs to be turned on, as well as mod_rewrite, so call your hosting provider for that one.

2.  FP uses several .htaccess files: one in the main directory structure, and three other .htaccess files called “super files”. You will find these other .htaccess files here:

/_vti_bin/.htaccess

/_vti_bin/_vti_aut/.htaccess

/_vti_bin/_vti_adm/.htaccess

3.  Make sure “Options +FollowSymlinks” is at the top of all four .htaccess files, underneath “# -FrontPage-”.

4.  Underneath this, add your 301 redirect command:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]

Here, I did a 301 from non-www to www because, for SEO purposes, most people have more inbound links pointing to the www version.
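The FrontPage rule differs from the earlier Apache example in two ways. First, its condition matches only the bare domain, so other subdomains are left alone. Second, it captures the path with ^(.*)$ and re-inserts it as $1. The Python sketch below mirrors that logic for illustration only. frontpage_redirect is a hypothetical name, and yoursite.com is the placeholder from the rule. The sketch assumes the rule sits in the root .htaccess, where Apache strips the leading slash from the path before matching. It leaves out query strings, which mod_rewrite passes through by default.

```python
import re
from typing import Optional

# Mirrors: RewriteCond %{HTTP_HOST} ^yoursite\.com$ [NC]
BARE_HOST = re.compile(r"^yoursite\.com$", re.IGNORECASE)
# Mirrors the pattern of: RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]
PATH_PATTERN = re.compile(r"^(.*)$")


def frontpage_redirect(host: str, path: str) -> Optional[str]:
    """Return the 301 target the rule would produce, or None when the condition fails."""
    if not BARE_HOST.match(host):
        # www.yoursite.com, or any other subdomain, is not redirected by this rule.
        return None
    # In the root .htaccess (per-directory context) the leading "/" is stripped before matching.
    relative = path[1:] if path.startswith("/") else path
    match = PATH_PATTERN.match(relative)
    return "http://www.yoursite.com/" + match.group(1)
```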

That’s it – this should work!