Canonical URLs & Preventing Duplicate Content

A canonical link is an HTML element that helps webmasters prevent issues with duplicate content. The canonical URL tells search engines which version of a URL to index. One page may be associated with many different URLs. A search engine will attempt to identify the canonical, or authoritative URL for each page. Unlike duplicate content, canonical URL issues happen only within a site and not between separate sites.

In 2009, Google, Yahoo and Bing announced support for the canonical link element, which can be used to prevent a loss of search engine ranking due to duplicate site pages. Google stated that the canonical link element is not considered to be a directive, but a hint that the web crawler will “honor strongly”.

While the canonical link element has its benefits, Matt Cutts, leader of Google’s webspam team, has said that the search engine prefers the use of 301 redirects. Cutts explained that Google prefers redirects because its spiders can choose to ignore a canonical link element if they judge it more beneficial to do so.

There are many different forms of duplicate content, but the major cause is multiple URLs that point to the same page. This happens for many reasons. An ecommerce site may allow various options for sorting a page, such as by lowest price or highest rating, and the marketing department might want tracking codes added to URLs for analytics. A site of 100 pages could end up with 10 URLs for each page, creating 1,000 URLs for the search engine to sort through.

This causes problems because:

• Less of the site may get crawled. Search engine crawlers use a limited amount of bandwidth on each site. If the crawler is only able to crawl 100 pages of your site in a single visit, you want those pages to be unique pages, rather than 10 pages being crawled 10 times.
• Each page may not get full link credit. If a page has 10 URLs that point to it, then other sites can link to it 10 different ways. Ten links spread across 10 URLs give the page less value than it would have if all 10 links pointed to a single URL.
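
To see how sorting options and tracking codes multiply URLs, the following minimal Python sketch groups URL variants by the page they display. The parameter names (sort, utm_source and so on) and the shop.example.com addresses are assumptions made for illustration; a real site would substitute its own list of parameters that do not change the page content.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that only sort or track; a real site
# would list the parameters it actually uses.
NON_CONTENT_PARAMS = {"sort", "utm_source", "utm_medium", "utm_campaign"}


def strip_non_content_params(url):
    """Return the URL without sorting/tracking parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CONTENT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_variants(urls):
    """Map each stripped URL to the list of variants that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_non_content_params(url)].append(url)
    return dict(groups)


# Illustrative URLs only.
variants = [
    "https://shop.example.com/widgets",
    "https://shop.example.com/widgets?sort=price",
    "https://shop.example.com/widgets?sort=rating",
    "https://shop.example.com/widgets?utm_source=newsletter",
    "https://shop.example.com/widgets?color=blue&sort=price",
]
groups = group_variants(variants)
for page, urls in groups.items():
    print(page, "is reachable through", len(urls), "URL(s)")
```

Any group with more than one URL is a candidate for a canonical tag. Note that this sketch treats the color parameter as changing the content, so that variant lands in its own group; whether that is right depends on the site.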

Using a new canonical tag
Specify the canonical version using a tag in the head section of the page as follows:
<link rel="canonical" href="

You can use the tag only on pages within a single site, including its subdomains and subfolders. You can use either relative or absolute links, but search engines recommend absolute links.

This tag will operate in a similar way to a 301 redirect for all URLs that display the page with the tag. Links to all URLs will be consolidated to the one specified as canonical. Search engines will consider this URL the one to crawl and index.
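
Since absolute links are recommended, a page template could resolve whatever canonical address it is given against the page's own URL before writing the tag. The sketch below is one possible way to do that in Python; the function name canonical_link_tag and the shop.example.com URLs are hypothetical and chosen only for illustration.

```python
from html import escape
from urllib.parse import urljoin


def canonical_link_tag(page_url, canonical_href):
    """Build a canonical link element with an absolute href.

    A relative canonical_href is resolved against page_url; an absolute
    one is passed through unchanged.
    """
    absolute = urljoin(page_url, canonical_href)
    # Escape &, <, > and quotes so the URL is safe inside the attribute.
    return '<link rel="canonical" href="{}">'.format(escape(absolute, quote=True))


# Illustrative URLs only: a sorted listing that names the unsorted page
# as its canonical version.
tag = canonical_link_tag("https://shop.example.com/widgets?sort=price", "/widgets")
print(tag)
```

The resulting string belongs in the head section of every URL variant that displays the page.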

Best practices for a canonical URL
Search engines are more likely to honor the tag if the URLs follow some best practices, including:

• The content rendered for each URL is very similar or identical
• The canonical URL is the shortest version
• The URL uses easy-to-understand parameter patterns, as in the case of ? and %

When asked whether spammers could abuse this process, Matt Cutts of Google said that the same safeguards that prevent abuse by other methods (such as redirects) are in place here as well, and that Google reserves the right to take action against sites that use the tag to manipulate search engines and violate search engine guidelines.

This tag will only work with very similar or identical content, so you can’t use it to send all of the link value from the less important pages of your site to the more important ones.

If there is a conflict between tags (for example, two pages point to each other as canonical, the URL specified as canonical redirects to a non-canonical version, or the page specified as canonical doesn’t exist), search engines will handle them as they do any other pages and will determine which URL they think is the best canonical version.
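
A site owner can catch some of these conflicts before a search engine does. The following Python sketch follows the canonical targets declared by a set of pages and flags two of the cases above: pages that point to each other and targets that don't exist. It assumes the declared tags have already been collected into a dictionary; the function name, the status labels and the sample paths are all hypothetical, and the redirect case is left out because it would require live HTTP requests.

```python
def resolve_canonical(url, declared):
    """Follow declared canonical targets starting from url.

    declared maps each known page URL to the canonical URL its tag names.
    Returns (url_reached, status), where status is "ok" for a page that
    names itself, "loop" when the chain returns to a page already
    visited, or "missing" when a target is not a known page.
    """
    seen = {url}
    current = url
    while True:
        if current not in declared:
            return current, "missing"
        target = declared[current]
        if target == current:
            return current, "ok"
        if target in seen:
            return target, "loop"
        seen.add(target)
        current = target


# Illustrative data only.
declared = {
    "/a": "/b",
    "/b": "/a",                          # /a and /b point to each other
    "/widgets?sort=price": "/widgets",
    "/widgets": "/widgets",              # names itself as canonical
    "/old": "/gone",                     # /gone is not a known page
}
for page in ("/a", "/widgets?sort=price", "/old"):
    print(page, resolve_canonical(page, declared))
```

This only reports the conflict. How a search engine actually settles one is its own decision, as described above.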

The canonical tag won’t completely solve duplicate content issues on the web, but it does make things a lot easier, especially for ecommerce sites. Site owners need all the help they can get to stay ahead of the pack in search rankings.

