Many webmasters and bloggers are vigilant about reducing or eliminating duplicate content on their blogs and web sites. Everyone seems concerned about assigning a post to multiple categories, their non-www domain name being seen as a duplicate of the www version, and so on.
In fact, the impetus behind this post comes from a comment on my previous post about fixing up someone's "mistake" of using their TypePad subdomain instead of their own domain for their blog.
People are so vigilant that they might think I'm trying to be controversial (or that I'm just stupid) when I say to just chill out over the subject. Before my inbox gets flamed with mail, let me explain where I'm coming from.
Google is the king of search because they focus better on the authority of sites and relevancy of content with regard to specific search terms. Some might disagree with this statement, but go do the same search on multiple search engines and compare results. In my opinion, Google wins the relevancy contest consistently.
Take this point and apply it to a site that springs up overnight with hundreds of pages of content that is nearly identical to existing content on other sites. That new site generally is not going to rank well. As I've mentioned in yet another previous post about making money with blogs, I tried going down this path a long while back, and it just doesn't work. Overtly spammy sites reliant on duplicated content generally don't rank well or at all in Google.
What Google tries to do is figure out the one best source for a given piece of content and do their best to make sure the site producing the original content gets their props for doing so in the form of increased "page rank" and SERP performance. It is my belief that Google's focus on duplicate content relies on inter-domain content duplication much more than intra-domain duplication.
When you create a post in your TypePad blog, you're going to get duplicate content pretty much no matter what you do. The post will appear on the main index page as well as the permalink page and category page(s), not to mention the RSS and ATOM feeds.
If your site is on the up-and-up and you're not relying on any black hat SEO hacks, don't worry about it. Google and the other search engines will figure it all out, and the single best source for your content will find its way into the SERPs. Now, that doesn't mean you should start associating posts with 5 different categories. Pick one category if you can, or two if you feel that readers would benefit from seeing the post in both.
It also doesn't hurt to link to your own content in your posts where appropriate, as I've done here. This will help the search engines find a path back to your original content if and when your posts are syndicated to other sites.
Now, let's address the inter-domain issues of duplicate content.
First, I believe non-www domain vs. www subdomain content falls in line with the previous points - Google and the other search engines will figure out the best source. If all your internal links point to the www subdomain, you can be pretty sure that those pages will be determined to be the original, best source of content. If you use Google Webmaster Tools, you can also explicitly tell Google which one is preferred. But by setting up your domain correctly to begin with, you can eliminate any doubt about possible duplicate content with a 301 redirect from the non-www domain to the www subdomain. Unfortunately, if you register your domain with Yahoo or Network Solutions, this 301 redirect capability is non-existent. GoDaddy, on the other hand, does provide this functionality.
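For anyone curious what that 301 redirect boils down to, here is a minimal Python sketch of the decision a server makes. It is only an illustration, not how any particular registrar or host implements it: the function name canonical_redirect and the domain example.com are placeholders I made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, bare_domain="example.com"):
    """Return (301, target_url) if url is on the bare (non-www) domain.

    Returns None when no redirect is needed. Both the function name and
    the default bare_domain are hypothetical, chosen for illustration.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Only the bare domain is redirected; www and any other
    # subdomain are left alone.
    if host != bare_domain:
        return None

    # Rebuild the host as the www subdomain, keeping an explicit port if any.
    netloc = "www." + host
    if parts.port:
        netloc += f":{parts.port}"

    # Keep the path and query string so every old URL maps to its exact
    # counterpart instead of dumping visitors on the home page.
    target = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    return 301, target


if __name__ == "__main__":
    print(canonical_redirect("http://example.com/2007/05/post.html"))
    print(canonical_redirect("http://www.example.com/2007/05/post.html"))
```

The status code is the part that matters: a 301 tells the search engines the move is permanent, so the www URL is the one that should be indexed.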
For those who have used their TypePad subdomain and want to switch to their own domain name, I think I outlined the best solution for making the transition as quickly and painlessly as possible without generating inter-domain duplicate content. Note: I did not mention it in that previous post, but your TypePad domain's URLs can also be removed from the Yahoo index using Yahoo!'s Site Explorer. Honestly, I haven't looked into MSN (or Live, or whatever they're calling it now). I simply focus on Google because they are by far the dominant search engine today.
The one piece of functionality that is missing from the exercise of moving from your TypePad domain to your own domain is the ability to create 301 redirects from one to the other. If TypePad could offer this one simple feature, it would make everyone's experience with their system so much better.