Let us say that you have the domain name mywonderfulsite.com. If you set up a web site at that domain, what do you suppose its URL is?
Is it-- http://mywonderfulsite.com --or is it-- http://www.mywonderfulsite.com ?
The chances are 99 out of a hundred that your host will have set things up so that a browser--or a search-engine robot--calling either of those addresses will end up on your site's front page. (If that is not the case--if either form fails to reach your front page, and you should test this--then ask your host to change things so that both forms work.)
But that is your host doing you a favor--and a dubious one, in SEO terms. To effect that apparent interchangeability, your host has effectively "mapped" one URL to the other. And here is the problem (you should have this tattooed to the inside of your eyelids):
To a search engine, those are two different pages.
And the fact that one is "mapped" to the other doesn't necessarily (and probably doesn't at all) change that.
And the same is true for all the pages of your site:
To a search engine,
http://mywonderfulsite.com/dirname/pagename.html --is not the same page as-- http://www.mywonderfulsite.com/dirname/pagename.html
Now it doesn't really matter which form you think of as the "correct" one for your site: some people like it always with the www, because many people wrongly think that's how a web address has to start and there's no point confusing those folks, while others like it always without the www, because it looks leaner and cleaner. That's 100% your choice.
But it is a choice you need to make and to enforce.
Why? Because when other sites link to you, their links are "counted" by search engines as a link to one of these page-URL forms or the other--as the linker chose--but not to both. If, say, one-quarter of your backlinks come as links to the "deprecated" form, that's 25% of your hard-sought, hard-won backlink strength squandered. In a theoretical worst case, the net value of all your backlinks as a whole could be cut right in half.
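To make that arithmetic concrete, here is a minimal Python sketch that tallies a list of backlink URLs by host form and reports what fraction point at the non-preferred form. The function name and the backlink list are hypothetical; in practice the list would come from whatever backlink report you use. It also treats every link as equal in weight, which real search engines certainly do not.

```python
from urllib.parse import urlsplit


def deprecated_share(backlinks, preferred_host):
    """Return the fraction of backlinks whose host is not preferred_host.

    urlsplit(...).hostname is already lower-cased, so host names are
    compared case-insensitively, as DNS names are.
    """
    if not backlinks:
        return 0.0
    preferred = preferred_host.lower()
    wrong = sum(
        1 for url in backlinks
        if (urlsplit(url).hostname or "") != preferred
    )
    return wrong / len(backlinks)


# Hypothetical backlinks: three use the www form, one does not.
links = [
    "http://www.mywonderfulsite.com/",
    "http://www.mywonderfulsite.com/dirname/pagename.html",
    "http://WWW.MyWonderfulSite.com/about.html",
    "http://mywonderfulsite.com/dirname/pagename.html",
]
print(deprecated_share(links, "www.mywonderfulsite.com"))  # 0.25
```

With the www form preferred, one link in four (25%) lands on the "wrong" URL; had the non-www form been chosen instead, the same list would give 0.75.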
How do you "enforce" your choice of preferred URL form? As to how other people will use it, in links and otherwise, you cannot. You can have no control, even by pleadings and emails and how you list yourself, over how other people will refer to your pages, and--critical to us for SEO purposes--how they will link to your pages. No matter what you do, it is a virtual certainty that--whichever form you have adopted as your "preferred" form--at least some other sites will backlink to you with the "wrong" form of URL. And that will split up, and thus water down, your incoming-links strength, which will in turn have a negative effect on both your page rank and your SERPs.
All is not lost; in fact, nothing whatever need be lost. You just need to deal with the possible variations where you can control things: on your server.
This kind of control is easiest on a server running the Apache server software, which the majority (but by no means all) of hosts run. If, in particular, your site is hosted on Micro$oft's IIS, you will have to work with your host to find out how to do "redirection", which is what I'm about to explain. So any "how-to" that follows pertains to you only if you are hosted on an Apache-powered server.
The Hypertext Transfer Protocol ("HTTP") standards define a set series of numeric status codes by which servers and "clients" (browsers or searchbots) communicate with each other about requesting and delivering files. (You have probably encountered the 404 Not Found code many times.) The tool we need is the 301 Moved Permanently code. When a server sends that code back to a client that has made a request, it also sends a URL (in a "Location" header); the client is expected, by the HTTP standard, to interpret that as meaning "the document (page) that used to be at the address you used has permanently changed its address to this", followed by the new URL.
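For the curious, here is roughly what such a 301 response looks like on the wire, with a minimal Python sketch that pulls out the status code and the new address. It is an illustration only, not a general HTTP parser: it assumes a response head with CRLF line endings, and the URL is the article's hypothetical site.

```python
def parse_redirect(raw_response):
    """Return (status_code, location) from a raw HTTP/1.x response head.

    location is None if the response carries no Location header.
    Assumes CRLF line endings.
    """
    head = raw_response.split("\r\n\r\n", 1)[0]   # drop any body
    status_line, *header_lines = head.split("\r\n")
    # Status line, e.g. "HTTP/1.1 301 Moved Permanently"
    status_code = int(status_line.split(" ", 2)[1])
    location = None
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":   # header names ignore case
            location = value.strip()
    return status_code, location


response = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.mywonderfulsite.com/dirname/pagename.html\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
print(parse_redirect(response))
# (301, 'http://www.mywonderfulsite.com/dirname/pagename.html')
```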
Browsers all respond to that code: they silently re-request the page from the new URL, and may keep a record of the change in their cache. Search engines also take note of the change, and from then on consider a link to the "old" address (in our case, the "wrong-form" address) to be in all ways equivalent to a link to the new one; the "mana" (PR) from the ill-addressed link is passed on to its intended destination.
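The client's side of the exchange can be sketched in a few lines of Python. Here a hypothetical dictionary stands in for the server, mapping each URL to the status code and Location it would return; the function re-requests from the new address until it gets an ordinary response. The hop limit is a design choice mirroring the fact that real clients also cap the number of redirects they will follow, so a misconfigured redirect loop ends in an error rather than spinning forever.

```python
def follow_redirects(url, responses, max_hops=5):
    """Follow 301 redirects through a simulated server.

    responses maps each URL to (status_code, location); location is
    None for an ordinary 200 response.  Returns the URL that finally
    served the page; raises RuntimeError if reaching it would take
    more than max_hops redirects.
    """
    for _ in range(max_hops + 1):
        status_code, location = responses[url]
        if status_code != 301:
            return url          # page found: stop here
        url = location          # re-request from the new address
    raise RuntimeError("too many redirects (possible redirect loop)")


# Hypothetical site: the bare-domain URL redirects to the www form.
site = {
    "http://mywonderfulsite.com/dirname/pagename.html":
        (301, "http://www.mywonderfulsite.com/dirname/pagename.html"),
    "http://www.mywonderfulsite.com/dirname/pagename.html": (200, None),
}
print(follow_redirects("http://mywonderfulsite.com/dirname/pagename.html", site))
# http://www.mywonderfulsite.com/dirname/pagename.html
```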
There are two important caveats:
1. It can take a while for search engines to update their records based on receiving 301 Redirects; how long will vary with the search engine and, doubtless, with how often your site gets crawled by their robots--which in turn usually depends on your site's "importance" to that engine.
2. It has recently come to light that the Yahoo search engine has, as a matter of policy, decided not to follow 301 Redirects, and may even penalize them. This is managerial stupidity of such a colossal magnitude that trying to describe it beggars the tongue. Whether that will last, or whether Yahoo will take some grown-ups into its management, remains to be seen. Fortunately for SEO purposes, Yahoo has, for right now anyway, all the significance of a fruit fly to the getting of site traffic from search engines.
What we need to do, then, is find a way to make our host server respond to requests for any of our pages that are in the "deprecated" site-URL form with a 301-Redirect response that points to our "preferred" (or "canonical") form for the URL. Fortunately, on Apache-powered servers, that is easy to do.
Apache-powered servers normally allow users, even those with "virtual domains" (meaning you are one of many sites hosted on the same physical server), access to a special control file named .htaccess--yes, that's right, a filespec that is essentially an extension without a name (such files are common in the *nix world, and are collectively called "dot files").
Your .htaccess file allows you to accomplish many wondrous things for your sites; there are useful .htaccess tutorials available, and the Apache documentation on .htaccess files is the authoritative reference. But for right now, we'll focus solely on using it to implement domain-name redirects.
Server software has many options: settings that the host who controls the server makes in one or more configuration files, and that control exactly how the server handles many things and situations. In the Apache package, there is provision for the host to "delegate" some of those settings to individual users, who can set them locally for their own directories. Which exact settings a given host allows users to control locally will vary from server to server, but virtually all hosts allow users to control "redirection".
The Apache system uses a cascade of files, all named .htaccess, to determine how the allowed functions are applied to a given web page in your site. I say a "cascade" because it is just that: a prioritized chain, in which each lower level can override the higher levels (unless forbidden by them). Apache applies .htaccess files directory by directory, from the top of the server's file system right down through your site's root, and on to all your individual subdirectories. At every directory level, Apache looks for an .htaccess file and, if it finds one, adds its contents after what it has already acquired from any .htaccess files in directories above that one (so that directives in a lower-level file are read after those in higher-level ones, which is how they can override them).
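To see which files that cascade involves, here is a minimal Python sketch that lists, top level first, every .htaccess location Apache would look at for a given page. It borrows the hypothetical server path /usr/home/yourid/public_html used later in this article. It only lists candidate locations; whether Apache actually reads each one depends on what the host's configuration permits.

```python
from pathlib import PurePosixPath


def htaccess_cascade(file_path):
    """List candidate .htaccess paths for a file, top directory first."""
    directory = PurePosixPath(file_path).parent
    # .parents runs from nearest to farthest, so reverse it to start at "/".
    levels = list(directory.parents)[::-1] + [directory]
    return [str(level / ".htaccess") for level in levels]


for path in htaccess_cascade("/usr/home/yourid/public_html/dirname/pagename.html"):
    print(path)
# /.htaccess
# /usr/.htaccess
# /usr/home/.htaccess
# /usr/home/yourid/.htaccess
# /usr/home/yourid/public_html/.htaccess
# /usr/home/yourid/public_html/dirname/.htaccess
```

Later entries in the list are read later, which is why a file in a subdirectory can override one in your site's root.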
For redirection, the only .htaccess trick we will discuss here, you want the relevant directives in the .htaccess file in your site's root directory (which we can define as the directory in which your site's front page--typically index.html or something much like it--resides).
Even sticking to sheer redirection, there are numerous clever variations available to meet various particular needs, such as a renamed directory, or a general change in page-file extensions from .html to .shtml; but here, we will stick to the narrow issue of making URLs with and without a leading www. point to the same pages.
You want to be careful when dealing with your .htaccess file: it is powerful, and you don't want to screw it up in any way. It is even more than usually imperative when dealing with this file to keep generational backups, in case you have to back out some change.
Note! If you use the Microsoft product FrontPage in its "publish to server" mode, you will not be able to use your .htaccess for redirection! To avoid that major loss of functionality, just "publish" to your local directories, then upload the results via FTP.
In general, you download the existing .htaccess file in your root directory, make a thoughtfully named generational backup, edit-modify the actual file, then upload the modified file back to your root directory. If you are absolutely, positively 100% sure that you do not at present have an .htaccess (and that your server is running Apache software), you just create a new .htaccess file and upload that.
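If you like to script the backup step, here is a minimal Python sketch that copies your downloaded .htaccess to a timestamped sibling file before you edit it. The function name and the naming scheme (.htaccess.YYYYMMDD-HHMMSS.bak) are only assumptions for illustration; any scheme that keeps older generations distinct and easy to sort will do.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_htaccess(path=".htaccess", now=None):
    """Copy path to a timestamped sibling and return the new file's path.

    The naming scheme used here is just one example.
    """
    source = Path(path)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = source.with_name(f"{source.name}.{stamp}.bak")
    shutil.copy2(source, target)   # copy2 also preserves the file's timestamps
    return target
```

Run against the local copy, `backup_htaccess(".htaccess")` leaves the original in place and returns something like `.htaccess.20100102-030405.bak`; only the edited original gets uploaded.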
When you examine your current .htaccess file, you need to see first whether any redirection is already going on. (If you did 100% of the design of your web site, of course you'll already know, but that is not always the case.) Look in the file for a line containing this:
RewriteEngine On
If there is such a line, you're already doing some redirecting; if there is not, you are not. If you are already doing redirection, you need to insert just the core redirect lines (the RewriteCond and RewriteRule pair) from the appropriate block below within your existing redirect-directives block; if you are not already doing redirection, you need to insert the entire appropriate redirect block below (you can just add it to the end of the .htaccess file).
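If you would rather check than eyeball, here is a minimal Python sketch that reports whether a downloaded .htaccess turns the rewrite engine on. Apache directive names and their On/Off arguments are case-insensitive, so it accepts "RewriteEngine on" as well; lines beginning with # are comments and do not count. It is only a text check: it does not know whether the line sits inside an IfModule block or is switched off again later in the file.

```python
import re

# Optional leading whitespace, the directive, then "on" (any case).
_ENGINE_ON = re.compile(r"^\s*RewriteEngine\s+on\b", re.IGNORECASE)


def rewrite_engine_on(htaccess_text):
    """Return True if any non-comment line turns the rewrite engine on."""
    return any(_ENGINE_ON.match(line) for line in htaccess_text.splitlines())


print(rewrite_engine_on("Options -Indexes\nRewriteEngine On\n"))  # True
print(rewrite_engine_on("# RewriteEngine On\n"))                  # False
```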
Make sure you use the correct block of the two about to be presented. You need to choose, right now, whether you want your "preferred" URL form to be with or without a www. leader--and be certain of your choice, because you sure as shootin' don't want to be changing it to the other way round at some later time!
I will first explain what to do if you are not already doing some redirection in your .htaccess file, then cover the other case.
If you want all URL calls of the form--
http://mywonderfulsite.com/directory-name/file-name.html
--to be permanently redirected to:
http://www.mywonderfulsite.com/directory-name/file-name.html
then your .htaccess file needs to contain the block below:
# =============================================================
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# -----------------------------------------------------------
RewriteCond %{HTTP_HOST} !^www\.mywonderfulsite\.com [NC]
RewriteRule ^(.*) http://www.mywonderfulsite.com/$1 [L,R=301]
# -----------------------------------------------------------
</IfModule>
# =============================================================
If you want all URL calls of the form--
http://www.mywonderfulsite.com/directory-name/file-name.html
--to be permanently redirected to:
http://mywonderfulsite.com/directory-name/file-name.html
then your .htaccess file needs to contain this block:
# =============================================================
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# -----------------------------------------------------------
RewriteCond %{HTTP_HOST} !^mywonderfulsite\.com [NC]
RewriteRule ^(.*) http://mywonderfulsite.com/$1 [L,R=301]
# -----------------------------------------------------------
</IfModule>
# =============================================================
So this doesn't seem entirely like black magic, let's look at what's going on in those blocks. First off, all lines beginning with a pound sign ("hash mark") # are comment lines, inserted only for clarity, and you can change them or leave them out altogether if you prefer.
The paired tags in angle brackets (< >) are conditional: they test whether the so-called mod_rewrite Apache module--the thing that enables redirection--is available; if it is, the directives between that opening/closing pair will be executed (and they will simply be skipped in the wildly unlikely case that mod_rewrite is not available on your Apache-powered server).
The rewrite "engine", even when available, is not "on" by default: you need to explicitly turn it on, and that's what the first directive, RewriteEngine On, does.
To do redirects, the engine needs to know what your "base" directory is. From a URL standpoint, which is the standpoint we occupy here, that's your site's root directory, designated by a simple / slash; but your site's root directory is not the same thing as the actual directory name on the server, which is probably something like /usr/home/yourid/public_html, so you need to explicitly tell the rewrite engine that your root--not the server's root--is the "base" for all URL rewriting, and that's what the second directive, the RewriteBase /, does.
Now we're ready to do actual redirecting. The next two lines act together: the first establishes a test condition (hence the RewriteCond) that, if met, will trigger the action in the second line (hence that line's RewriteRule). In the condition line, %{HTTP_HOST} is an Apache server variable that contains the host name the client asked for (browsers send it in the request's Host header), which is the part of the URL after the protocol-identifying http:// up to (but not including) the first directory / (if any).
In both blocks, the "condition" tested by that line is whether the host is not equal to the wanted form: the leading exclamation point (aka "screamer"), the ! mark, acts as a negator of the pattern that follows (a mod_rewrite convention, rather than part of regular-expression syntax proper).
(The \ backslashes are "escape" characters. In regular expressions a period has a special meaning, "any single character", so the leading escape character says "no, treat this period as an actual, literal period.") The caret ^ character signifies "start of text", so the test matches only a host name that begins with the specified text, not one that merely contains it somewhere (that is, www.mywonderfulsite.com cannot be mistaken for mywonderfulsite.com just because it contains it).
The bit in brackets, the [NC], is a flag meaning "no case": that is, the test will ignore case, so that a call to www.MyWonderfulSite.com and a call to www.mywonderfulsite.com will be treated the same. Thus, in the block intended to redirect all calls to the URL form with a leading www., a call for www.MyWonderfulSite.com will not trigger a redirect, even though it is not character-for-character identical to www.mywonderfulsite.com .
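Because mod_rewrite uses Perl-compatible regular expressions, Python's re module behaves the same way for patterns this simple, which makes it a handy place to experiment. The minimal sketch below mimics the condition line: the ! becomes "no match", and [NC] becomes re.IGNORECASE. The function name is hypothetical; the patterns are the ones from the two blocks above.

```python
import re


def needs_redirect(host, preferred_pattern):
    """Mimic: RewriteCond %{HTTP_HOST} !preferred_pattern [NC]

    The leading ! means the condition holds when the pattern does NOT
    match; the [NC] flag corresponds to re.IGNORECASE.
    """
    return re.search(preferred_pattern, host, re.IGNORECASE) is None


WWW_FORM = r"^www\.mywonderfulsite\.com"
BARE_FORM = r"^mywonderfulsite\.com"

print(needs_redirect("mywonderfulsite.com", WWW_FORM))          # True
print(needs_redirect("www.MyWonderfulSite.com", WWW_FORM))      # False ([NC])
print(needs_redirect("www.mywonderfulsite.com", BARE_FORM))     # True
# Without the caret, the www host "contains" the bare name, so it
# would wrongly be treated as already correct:
print(needs_redirect("www.mywonderfulsite.com", r"mywonderfulsite\.com"))  # False
```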
If the condition is met--that is, if the host is not (disregarding case) the preferred host form--then the Rule is applied. In the statement of the Rule, so-called "regular expressions" are again used; if you're not well familiar with those (they're a powerful "wildcarding" mini-language), google up some information. The form of a Rule is: input, then transformed output--this becomes that.
The parentheses in the input (this) part tell Apache to capture whatever matches inside them into a numbered variable that can be used in building the output (that) part of the Rule: up to nine such variables can be captured in a given input, but we need only the one, the (.*) in ^(.*), which here--by the rules of regular expressions--captures anything and everything that comes after the host part of the URL.
The output (that) part of the Rule--http://mywonderfulsite.com/$1 [L,R=301]--has two parts: the actual that (http://mywonderfulsite.com/$1) and a set of "flags" (the [L,R=301]) that add some ancillary actions to the actual redirect. Once we know that the code bit $1 stands for the text captured in the this part of the Rule, all is clear: we prefix the wanted host form, http://mywonderfulsite.com/, to whatever came after the root / slash.
The "flags" here tell the rewrite engine to stop processing any further rules once this Rule has been applied (complex redirection can involve several cascaded tests and rules)--that's the L ("last") part--and, critical to us, to issue a 301 Redirect to the caller. And that's the job done.
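Putting the condition and the Rule together, here is a minimal Python sketch of what either block does for a single request. It is a simulation under simplifying assumptions, not how Apache is implemented: the function name and its separate host and path inputs are hypothetical, it ignores query strings (which mod_rewrite passes along unchanged by default), and it models [L] only implicitly, since there are no further rules to skip.

```python
import re


def apply_rule(host, path, preferred_host):
    """Simulate the redirect block for one request.

    Returns (301, new_url) if the host is not the preferred form,
    otherwise (200, None), meaning the page is served as-is.
    """
    # RewriteCond %{HTTP_HOST} !^preferred\.host [NC]
    condition = "^" + re.escape(preferred_host)
    if re.search(condition, host, re.IGNORECASE):
        return 200, None
    # In a root .htaccess, mod_rewrite matches the path without its
    # leading "/", so ^(.*) captures e.g. "dirname/pagename.html".
    captured = re.match(r"^(.*)", path.lstrip("/")).group(1)
    # RewriteRule ^(.*) http://preferred.host/$1 [L,R=301]
    return 301, "http://" + preferred_host + "/" + captured


print(apply_rule("mywonderfulsite.com", "/dirname/pagename.html",
                 "www.mywonderfulsite.com"))
# (301, 'http://www.mywonderfulsite.com/dirname/pagename.html')
```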
All content copyright ©2004 - 2010 by The Owlcroft Company