Duplicate Content and Your TypePad Domain
April 12, 2008
This article is reprinted from the TypePad Hacks Weblog. The original article can be found online:
http://www.typepadhacks.org/2008/04/duplicate-conte.html
© 2008, John T Unger
There are many webmasters and bloggers who are vigilant about reducing or eliminating duplicate content on their blogs and web sites. Everyone seems concerned about assigning a post to multiple categories, their non-www domain name being seen as a duplicate of the www version, and so on.
In fact, the impetus behind this post comes from a comment on my previous post about fixing someone's "mistake" of using their TypePad subdomain instead of their own domain for their blog.
People are so vigilant that they might think I'm trying to be controversial (or that I'm just stupid) when I say to just chill out over the subject. Before my inbox gets flamed with mail, let me explain where I'm coming from.
Google is the king of search because they focus better on the authority of sites and relevancy of content with regard to specific search terms. Some might disagree with this statement, but go do the same search on multiple search engines and compare results. In my opinion, Google wins the relevancy contest consistently.
Take this point and apply it to a site that springs up overnight with hundreds of pages of content that is nearly identical to existing content on other sites. That new site generally is not going to rank well. As I've mentioned in yet another previous post about making money with blogs, I tried going down this path a long while back, and it just doesn't work. Overtly spammy sites reliant on duplicated content generally don't rank well or at all in Google.
What Google tries to do is figure out the one best source for a given piece of content and do its best to make sure the site producing the original content gets its props for doing so in the form of increased "page rank" and SERP performance. It is my belief that Google's focus on duplicate content centers on inter-domain content duplication much more than intra-domain duplication.
When you create a post in your TypePad blog, you're going to get duplicate content pretty much no matter what you do. The post will appear on the main index page as well as the permalink page and category page(s), not to mention the RSS and ATOM feeds.
If your site is on the up-and-up and you're not relying on any black hat SEO hacks, don't worry about it. Google and the other search engines will figure it all out, and the single best source for your content will find its way into the SERPs. Now that doesn't mean you should start associating posts with 5 different categories. Pick one category if you can, or two if you feel that readers would derive benefit from the multiple category post.
It also doesn't hurt to link to your own content in your posts where appropriate, as I've done here. This will help the search engines find a path back to your original content if and when your posts are syndicated to other sites.
Now, let's address the inter-domain issues of duplicate content.
First, I believe non-www domain vs. www subdomain content falls in line with the previous points - Google and the other search engines will figure out the best source. If all your internal links point to the www subdomain, you can be pretty sure that those pages will be determined to be the original, best source of content. If you use Google Webmaster Tools, you can also explicitly tell Google which one is preferred. But by setting up your domain correctly to begin with, you can eliminate any doubt about possible duplicate content with a 301 redirect from the non-www to the www subdomain. Unfortunately, if you register your domain with Yahoo or Network Solutions, this 301 redirect capability is non-existent. GoDaddy, on the other hand, does provide this functionality.
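To make the non-www to www redirect concrete, here is a minimal sketch in Python of the decision a server or registrar forwarding service would make. It is only an illustration, not how GoDaddy or any other registrar actually implements it; the domain example.com and the function name www_redirect are placeholders I made up.

```python
from urllib.parse import urlsplit, urlunsplit


def www_redirect(url, bare_domain="example.com"):
    """Return (301, target_url) when url is on the bare (non-www) domain.

    Returns None when no redirect is needed. bare_domain is a placeholder;
    any port in the original URL is dropped in this simplified sketch.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != bare_domain:
        # Already on www (or on some other host), so leave it alone.
        return None
    target = urlunsplit((
        parts.scheme,
        "www." + bare_domain,
        parts.path or "/",   # an empty path becomes the site root
        parts.query,
        parts.fragment,
    ))
    # 301 marks the move as permanent, which is what tells search engines
    # to treat the www URL as the one true source.
    return 301, target
```

The point of returning 301 rather than 302 is that a permanent redirect signals the www URL is the preferred one, which is the signal the paragraph above relies on.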
For those who have used their TypePad subdomain and want to switch to their own domain name, I think I outlined the best solution for making the transition as quickly and painlessly as possible without generating inter-domain duplicate content. Note: I did not mention it in that previous post, but your TypePad domain's URLs can also be removed from the Yahoo index using Yahoo!'s Site Explorer. Honestly, I haven't looked into MSN (or Live, or whatever they're calling it now). I simply focus on Google because they are by far the dominant search engine today.
The one piece of functionality that is missing from the exercise of moving from your TypePad domain to your own domain is the ability to create 301 redirects from one to the other. If TypePad could offer this one simple feature, it would make everyone's experience with their system so much better.





Pearl says:
Hi John. I want to thank you for your valuable contribution to the net community here for people like me who have just started "serious" blogging. In fact, I wrote to you before asking about a drop-down horizontal menu and your tips saved my day. Other than the technical info here, which saves me a lot of research time, I learned from you that it's important to give back to the community. I mean, you set up typepadhacks and share so generously about your experience of blogging and the truth about internet marketing. I myself had spent a good fortune to learn from the so-called gurus here in my home country. However, I didn't gain much, and the knowledge I have now is mostly based on my own reading. Your site is a sincere one which is not flooded with ads and banners.
Thank you so much! May God bless you!
Pearl
Singapore
Posted: Apr 12, 2008 7:53:46 PM
Dave Weiss says:
I have to correct myself in this post. You CAN redirect your non-www domain to the www via Yahoo!
They just don't make it obvious for you, nor do they point out whether the redirect is a 301 redirect. I'll do some screen shots later this week and post them. I still prefer GoDaddy.com, as they present your options for this in a way that makes more sense to me.
Posted: Apr 15, 2008 9:10:39 AM
best seo guide says:
Hi Dave, I just came across your blog and it's just what I have been looking for. I'm working on a site's blog that uses TypePad as its CMS. The format of the URL is: http://blog.site.com. This blog is presenting several issues related to duplicate content:
-Multiple post tagging (several categories)
-For some reason, when doing the site:blog.site.com command in Google I get results with both http and https, so every URL on the site can be displayed the http or https way. I don't know where this can be solved.
I have not worked much with TypePad, only with the WordPress CMS. If you can shed some light on the subject I would really appreciate it. Thanks in advance.
Gus.
Posted: Apr 18, 2008 8:31:53 PM
Dave Weiss says:
Huh.
Both http and https? That seems strange, indeed.
I'll have to think about this a bit and come up with potential reasons why this might be happening. An advanced template could be altered to produce https URLs, but I don't know why one would do that, and that seems like an unlikely basis for what you're seeing.
I don't see how any of the search engines would index https URLs if they did not, at one time or another, appear on the site. Even if there were inbound https links, I don't see how that would happen if all the internal links on the site were http.
If anyone else has any ideas on this, I'd like to see them.
I wouldn't be overly concerned about multiple categories, unless they went wild with it. One, sometimes two categories for a post shouldn't be a big deal. Consistently three, four, five categories, well..
I've mentioned before that the greatest piece of missing functionality I'd like to see in TypePad (from an SEO perspective) is the ability to 301 redirect. Certainly, there are ways this can be implemented without opening up server side vulnerabilities and malformed .htaccess entries.
For instance, a form could be created that only allows redirection from permalink to permalink and from category folder to category folder, or from permalink to the root or category folder to the root.
All the redirect information could be stored in the database (and validated to enforce these rules). Then, the .htaccess file could be published just like any template, style sheet, etc. Seems to me that would be pretty much bullet proof.
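As a rough sketch of that idea, the validation could look something like the Python below. Everything in it is an assumption made for illustration: the URL patterns for permalinks and category folders, the base URL www.example.com, and the function names are invented here and do not describe how TypePad works internally.

```python
import re

# Hypothetical URL shapes, assumed for illustration only.
PERMALINK = re.compile(r"^/\d{4}/\d{2}/[a-z0-9_-]+\.html$")
CATEGORY = re.compile(r"^/[a-z0-9_-]+/$")
ROOT = "/"

# The only source/destination pairings the form would accept.
ALLOWED = {
    ("permalink", "permalink"),
    ("category", "category"),
    ("permalink", "root"),
    ("category", "root"),
}


def kind(path):
    """Classify a path as 'root', 'permalink', 'category', or None."""
    if path == ROOT:
        return "root"
    if PERMALINK.match(path):
        return "permalink"
    if CATEGORY.match(path):
        return "category"
    return None


def is_allowed_redirect(src, dst):
    """True only for the redirect combinations listed in ALLOWED."""
    return src != dst and (kind(src), kind(dst)) in ALLOWED


def htaccess_lines(rules, base="http://www.example.com"):
    """Render validated (src, dst) pairs as Apache Redirect directives.

    Pairs that fail validation are silently skipped, so nothing
    malformed ever reaches the published file.
    """
    return [
        f"Redirect 301 {src} {base}{dst}"
        for src, dst in rules
        if is_allowed_redirect(src, dst)
    ]
```

Because every path has to match one of a few strict patterns before it is written out, a user could never inject arbitrary directives into the published .htaccess file.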
Posted: Apr 18, 2008 9:31:12 PM
Terry says:
Hi David, thanks for all your helpful advice on this subject, it really has been the bane of my life for some time now. We are having this duplicate content issue and have been for over a year now. Even though our site has thousands of inbound links (to the mapped domain), the TypePad URL still stubbornly appears in Google very often and sometimes for weeks at a time. For instance, if you were to search for "the fashion police" you would see the TypePad domain appearing at the moment. Although by the time you read this and search for it that may not be the case! It really has been swapping back and forth for the last year and Google for some unknown reason just hasn't been able to set up a seniority for the content correctly. The main URL gets hundreds of thousands of page views per month, it's in Webmaster Tools, and it has a sitemap. In fact, I think we have done just about everything possible to tell Google that IT IS the URL we want it to show seniority to but unfortunately, no candy. :( Do you have any suggestions on the best way for us to proceed?
Posted: Aug 3, 2008 8:22:32 AM
Dave Weiss says:
Did you register your account's TypePad subdomain in Webmaster Tools, too?
Try that, then put a robots.txt file in the root of your account that keeps Google from indexing the files and folders that hang off the root. Register it in Webmaster Tools. Webmaster Tools has a robots.txt testing feature that will verify what content makes it through your robots.txt filter and what gets caught. Verify that content which includes the TypePad domain gets blocked by the robots.txt file.
Then, create another robots.txt file that is in the directory your blog is in. Register this one in Webmaster Tools, too. Except do the opposite for this one - allow the search engines to access any files and sub folders off the blog's root folder.
I think my logic here is correct and will work - I have done this with a few TypePad hacks clients, and we seem to be having good luck. It makes sense - block all search engine access to anything using your TypePad domain, but allow access to anything that uses your own domain name.
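For anyone who wants to sanity-check the two files before uploading them, something like the following Python sketch could be used alongside the Webmaster Tools tester. It uses the standard library's urllib.robotparser; the two robots.txt bodies, the host names example.typepad.com and www.example.com, and the helper name allows are all assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of the file served on the TypePad subdomain:
# block every crawler from everything.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Assumed contents of the file served on your own domain:
# an empty Disallow line permits everything.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def allows(robots_txt, url, agent="Googlebot"):
    """Report whether the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical URLs for the same post on both host names.
    print(allows(BLOCK_ALL, "http://example.typepad.com/blog/2008/04/post.html"))
    print(allows(ALLOW_ALL, "http://www.example.com/2008/04/post.html"))
```

If the first check comes back False and the second True, the files express the intent described above: nothing reachable through the TypePad domain, everything reachable through your own.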
Of course, monitor your work and double check your results with the site: command in Google.
Good luck!
Posted: Aug 4, 2008 2:39:32 PM
Terry says:
Thanks for the detailed explanation David, I'll give that a go as I haven't tried that yet :)
Posted: Aug 5, 2008 7:18:07 PM
Dave Weiss says:
@Terry:
One other point I forgot to mention - once you get a robots.txt file registered in Google Webmaster Tools, you can then do a Google Index Removal request to have the TypePad subdomain pages removed from Google altogether. I think there is similar functionality in Yahoo Site Explorer, too, but I have not personally tried it yet.
This should leave only your own domain's pages in Google's index, and should keep the TypePad subdomain pages out.
Posted: Aug 6, 2008 1:40:49 PM