Canonical URLs: Telling Search Engines Which to Use
Any given page on a website can usually be reached through more than one URL. While this is extremely useful in certain situations, such as when you want to view your page before the domain propagates, it can sometimes cause undesired effects, such as search engines seeing the same page as several duplicates.
Luckily, most search engines, such as Google and Bing, understand that a page can have multiple working URLs, so they usually determine correctly which pages are duplicates and which URL is most likely the canonical URL. The canonical URL is the preferred URL for a page.
Examples of URLs for a Single Page
Depending on your setup, it may be possible to visit the same page using any of the following example URLs:
- primarydomain.com
- www.primarydomain.com
- 10.0.0.2/~username/
- primarydomain.com/index.php
- www.primarydomain.com/index.php
- 10.0.0.2/~username/index.php
If the domain is an addon domain, it may be possible to view the same page through these hypothetical URLs:
- addondomain.com
- www.addondomain.com
- primarydomain.com/addondomain.com
- addondomain.primarydomain.com
- 10.0.0.2/~username/addondomain.com
- addondomain.com/index.php
- www.addondomain.com/index.php
- primarydomain.com/addondomain.com/index.php
- addondomain.primarydomain.com/index.php
- 10.0.0.2/~username/addondomain.com/index.php
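To make the idea of "many URLs, one page" concrete, here is a minimal Python sketch that maps some of the variants listed above to a single preferred form. The function name to_canonical, the choice of www.primarydomain.com as the preferred host, the http scheme, and the list of index file names are all assumptions made for illustration; your own setup may differ.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical settings for illustration; adjust them for your own site.
PREFERRED_HOST = "www.primarydomain.com"
ALIAS_HOSTS = {"primarydomain.com", "www.primarydomain.com"}
INDEX_FILES = ("index.php", "index.html")


def to_canonical(url: str) -> str:
    """Map a known variant URL to the preferred form; leave unknown hosts alone."""
    # Add a scheme when it is missing so urlsplit can find the host.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALIAS_HOSTS:
        # Not a host this sketch knows how to map (e.g. the temporary IP URL).
        return url
    path = parts.path or "/"
    # Treat /index.php and / as the same page.
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    return urlunsplit((parts.scheme, PREFERRED_HOST, path, parts.query, ""))


print(to_canonical("primarydomain.com/index.php"))
```

The sketch deliberately leaves the temporary IP-based URL unchanged, because mapping it would require knowing which account directory belongs to which domain.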
How Search Engines Guess the Canonical (Preferred) URL
First, note that even though all of these URLs point to the same file, search engines and your visitors will never encounter most of them and will not even know they exist. For example, the temporary URL your host gave you, with your IP address and username in it, will be known only to you unless you tell someone else about it.
Search engines know about a URL only because someone told them about it, usually by using it in a link on a web page somewhere.
Search engines usually find URLs in a few ways:
- They found a link to your page on a web page they already knew existed.
- They found the link / URL in a site map and/or RSS feed.
- The link / URL was submitted to them directly, usually via their website.
- Someone visited your page while using their browser toolbar.
Once they find out about the page, they compare it with other pages that appear to be identical or almost exact matches to spot duplicates. If they spot a duplicate page, they then try to figure out which URL should be the canonical URL.
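Search engines do not publish how they compare pages, so the following is only a rough Python sketch of the general idea of spotting identical or near-identical content. It uses the standard library's difflib rather than any real search engine technique, and the function names and the 0.9 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def looks_duplicate(page_a: str, page_b: str, threshold: float = 0.9) -> bool:
    """Return True when two page texts are identical or nearly so."""
    ratio = SequenceMatcher(None, normalize(page_a), normalize(page_b)).ratio()
    # The threshold is an arbitrary illustrative cut-off, not a known value.
    return ratio >= threshold


print(looks_duplicate("Welcome to  my site", "welcome to my site"))
```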
Although they keep their exact algorithms secret, there are some things that they are known to check:
- Which URL everybody seems to be using when linking to you.
- Which URL is used in your site map and RSS feeds.
- Whether you specified a canonical URL in the head of your pages, using a link element with rel="canonical".
- For Google, whether you specified a canonical URL in Google Webmaster Tools.
- Whether the URL redirects to another URL.
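One of the signals above, a canonical URL declared in the page itself, is commonly written as a link element with rel="canonical" in the page head. As a hedged illustration, this Python sketch reads that value back out of a page's HTML using the standard library's html.parser. The class and function names are hypothetical, and the sample markup uses the example domain from this article.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only link elements matter, and only the first canonical one is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


sample = (
    '<html><head>'
    '<link rel="canonical" href="http://www.primarydomain.com/">'
    '</head><body></body></html>'
)
print(find_canonical(sample))
```

Running a check like this against each of your own URL variants is one way to confirm that they all declare the same preferred URL.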