This question comes up constantly, and no matter how many times the record is set straight, people still think that there is a duplicate content penalty. For those with short attention spans, here’s the fact: there is no such thing as a duplicate content penalty. Period. Now, that is not to say that duplicate content can’t hurt you. It’s just that it’s not a penalty.

Two Kinds of Duplicate Content

Actually, there are more than two kinds, but I’ll break those out in a moment. In short, there is duplicate content within a website and there is duplicate content located on another website somewhere else. The first kind is generally not a problem at all. The second kind, however, could be problematic for two different reasons. Here’s how it all breaks down:

Duplicate Content on a Website

Duplicate content on a particular website is not really a problem. Google’s engineers fully understand that it’s possible that a website could duplicate content in the natural course of creating their site. This is especially true for shopping related websites.
For example, if you were to go to the camera department at Amazon.com, you might see a large number of cameras. However, if you were to visit the electronics department, those same cameras may very well appear for the simple reason that they fit into both categories.
In this case, Google’s algorithm generally picks one version of the page to index and ignores the others. You can tell Google (and other search engines) which version you prefer by adding a canonical link tag, <link rel="canonical" href="...">, to the head of each duplicate page, pointing at the URL that you want to have indexed. Google treats this as a strong hint rather than a command, but in most cases it will index the page you name and treat the similar pages as copies of it.
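If you want to check which canonical URL a page actually declares, a few lines of Python will do it. This is only a minimal sketch using the standard library; the sample page and the example.com URL are made up for illustration, and it does not fetch anything from the web.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, so split before comparing
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonical(html):
    """Return the first canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None


# Hypothetical duplicate category page that points at the preferred URL.
page = """
<html><head>
<title>Cameras</title>
<link rel="canonical" href="https://www.example.com/cameras/">
</head><body>...</body></html>
"""
print(find_canonical(page))
```

If the function returns None for a page that appears under more than one URL, that page is probably leaving the choice of which version to index entirely up to the search engine.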

WWW or URL.COM

Another way that some people end up with duplicate content is that both the bare domain (e.g. example.com) and the www version (e.g. www.example.com) work equally well and display the same page. Google’s web crawler can then index two URLs, even though only one page exists. The solution is really simple. In cPanel (or wherever you manage the website), you set up a permanent (301) redirect from example.com to www.example.com, or vice versa.
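The rule behind that redirect can be sketched in a few lines of Python. This is an illustration of the logic only, not how cPanel implements it; the function names preferred_url and needs_redirect are my own, and the sketch ignores any username or password embedded in the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_url(url, use_www=True):
    """Rewrite a URL to the preferred host form (www or bare domain)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is returned lowercased
    bare = host[4:] if host.startswith("www.") else host
    new_host = "www." + bare if use_www else bare
    if parts.port:
        new_host = f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))


def needs_redirect(url, use_www=True):
    """True when the URL is not already in the preferred form."""
    return preferred_url(url, use_www) != url


print(preferred_url("http://example.com/cameras?page=2"))
print(needs_redirect("https://www.example.com/"))
```

Whichever form you pick, the point is to pick one and send every request for the other form to it, so that only one set of URLs ever gets indexed.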

Duplicate Content on Other Websites

Now we get into an interesting question, which is potentially problematic. Google’s engineers don’t want to index spam. This means that when they see a website which includes duplicated content from another site, they will not index said site. They will just ignore it. However, if there appears to be a malicious pattern of scraping content from another site, they may remove the site completely.

Duplicate Sites

Another thing that Google’s engineers work against is duplicate websites, which are basically the same exact website under a different URL. So for example, if you were to take mymoneywebsite.com and make a copy of it as greatmoneywebsite.com with the intention of trying to game the rankings, Google’s engineers could remove both sites from their index in an effort to prevent you from doing so.

What You Can Do if Your Site is Copied

I just did some research on this very subject when I stumbled accidentally onto a site which had copied a number of my blog posts from my personal finance blog. The first thing to know is that it is a royal pain in the neck to deal with this and it’s a bit like playing whack a mole. You may be able to get a site shut down, but they’ll just reopen under a new name.
That said, I believe that it’s worth doing what you can to go after these scum, who steal content from someone else rather than doing the hard work of writing their own material. There are a number of options that you can try in order to deal with this:

Contact an Attorney

If you have an attorney on retainer, contact him or her and ask for a formal letter demanding takedown of your content under the DMCA. Generally, this will do the job quite effectively and you won’t have to do anything other than pay the bill. However, if you don’t have an attorney on retainer, there are things you can do yourself to keep the copied content from hurting you in Google’s results.

Contact the Web Host

Use Whois to look up which company hosts the offending domain. Then, do a Google search for that company’s name together with the words “DMCA violation.” Most reputable web hosts and domain hosts have a procedure that they require you to follow in order to file a copyright violation complaint.
There is typically a specific e-mail address to send complaints to along with a request for the URLs that are violating your copyright and the URLs where your original content can be seen. You’ll be required to sign off officially that you are the copyright holder and that you are demanding take down of your content.
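Since every complaint asks for the same two lists of URLs, it helps to keep them paired up. Here is a small sketch of how you might format them for pasting into an e-mail; the function name and the example URLs are made up, and the output is just a URL list, not legal wording.

```python
def format_url_pairs(pairs):
    """Build a plain-text list of (infringing URL, original URL) pairs."""
    lines = []
    for number, (infringing, original) in enumerate(pairs, start=1):
        lines.append(f"{number}. Infringing copy: {infringing}")
        lines.append(f"   Original post: {original}")
    return "\n".join(lines)


# Hypothetical URLs for illustration only.
pairs = [
    ("http://scraper.example/stolen-post-1", "http://myblog.example/post-1"),
    ("http://scraper.example/stolen-post-2", "http://myblog.example/post-2"),
]
print(format_url_pairs(pairs))
```

Keeping the pairs in one place also saves time when the same scraper reopens under a new name and you have to file all over again.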

Contact Facebook

With so many sites running Facebook pages today, many scrapers will also take your stolen content and publish it to Facebook (the site I mentioned earlier did this). Facebook also requires that you list the specific pages that violate your copyright along with information on where they can see the original material. You can use this form to contact Facebook about copyright violations.

Contact Google

Finally, Google has a section on its webmaster page where you can report sites that violate copyright. You can use this form to ask Google to remove a website that has duplicated your content. You’ll need to provide the original URL as well as the URLs which infringe on your copyrights.

Bottom Line

There’s no easy way to deal with the problem of scrapers, but if you are determined, you can at least get some of them removed from the web. For all other issues with duplicate content, there is no real problem with Google – it’s just that they’ll only show one page on your site for the search results.