More than almost anything else, I absolutely despise scrapers. These are evil jerks who can’t be bothered to create their own content and so they instead steal content from honest bloggers who take the time to write original material. However, there are a number of ways you can combat scrapers. Here’s what you need to know.

Scrapers, Not Sourcing

I just want to be clear that there are two different concepts which some people tend to confuse. One is writing; the other is research. Using someone else’s research is perfectly legal. This means that you can, for example, write that the debt ceiling deal was finally passed by the House of Representatives this past Monday even though you didn’t happen to be there to see it. You read about it in the New York Times or wherever, and that’s sufficient.
That’s a pretty obvious example, but the bottom line is that taking ideas from others is 100% legal and is not going to be a problem unless you happen to steal a patented idea. I wanted to make this clear because when I talk about scrapers, I mean people who steal your content word for word, or nearly word for word, and publish it as if it were their own. I don’t mean people who merely take ideas from elsewhere.

Link Liberally

The first thing you can do to combat scrapers is to link liberally to your own pages throughout each post. While this won’t stop the scrapers from stealing your content, in most cases it will at least earn you backlinks. That’s because most scrapers don’t strip out the links before they post. They simply do a complete copy and paste and leave your links intact.
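
One practical wrinkle, which is my own addition rather than something covered above: a relative link such as /about resolves against whatever site the page ends up on, so it only earns you a backlink if it is written as a full URL. Here is a minimal Python sketch that lists the relative links in a post so you can fix them before publishing. The function name relative_links and the example.com addresses are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a fragment of HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def relative_links(post_html):
    """Return hrefs with neither a scheme nor a host (hypothetical helper).

    These are the links that would point at the scraper's own site
    once the post is copied there.
    """
    collector = LinkCollector()
    collector.feed(post_html)
    collector.close()
    found = []
    for href in collector.hrefs:
        parts = urlparse(href)
        if not parts.scheme and not parts.netloc:
            found.append(href)
    return found


if __name__ == "__main__":
    post = '<p>See <a href="/about">about</a> and <a href="https://example.com/tips">tips</a>.</p>'
    for href in relative_links(post):
        # urljoin shows what the link should look like as a full URL.
        print(href, "->", urljoin("https://example.com/", href))
```

Anything the script prints is a link worth rewriting as a full address.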

Use Embed Anything

Another thing you can do to beat the scrapers is to use Embed Anything. This is a program and WordPress plugin which attaches a link back to your website whenever someone copies and pastes your text. The advantage of Embed Anything is that it ensures that you’ll get links even if they take just one paragraph of text from you and mash it together with others’ content. Again, it only works when they don’t take the time to strip off the link, but most people who scrape are too lazy to figure out how to do this.

Truncate Your RSS Feed

There is something to be said for providing a full RSS feed, since this gives your readers a quality experience when they read the feed. However, if you want to stop scrapers, you may want to offer just a paragraph of each post, since many scrapers steal content by simply copying directly from your RSS feed.
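
Your blogging software may already have a setting for this (WordPress, as far as I know, has one for feeds under Settings > Reading). If you build your feed yourself, the idea can be sketched in a few lines of Python. This is only an illustration: the function name feed_summary and the 300-character limit are my own assumptions, and the regular expressions are a rough approach that will not cope with every kind of HTML.

```python
import re
from html import unescape


def feed_summary(post_html, max_chars=300):
    """Return plain text from the first <p> of a post, cut to max_chars.

    Hypothetical helper: falls back to the whole post if no <p> is found.
    """
    match = re.search(r"<p\b[^>]*>(.*?)</p>", post_html,
                      flags=re.IGNORECASE | re.DOTALL)
    first = match.group(1) if match else post_html
    # Drop any remaining tags, decode entities, and collapse whitespace.
    text = unescape(re.sub(r"<[^>]+>", "", first))
    text = " ".join(text.split())
    if len(text) <= max_chars:
        return text
    # Cut at the last space before the limit so no word is split.
    return text[:max_chars].rsplit(" ", 1)[0] + "..."


if __name__ == "__main__":
    post = "<p>First <b>paragraph</b> of the post.</p><p>The rest stays on my site.</p>"
    print(feed_summary(post))
```

Note that this sketch strips links along with the other tags, so you would want to add a link back to the full post alongside the summary in each feed item.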

Include a Copyright Notice

Finally, include a strong copyright notice on each page that makes clear you will pursue people who steal your content. You can use Copyscape to look for theft and then go after these people by filing a complaint with Google as well as with their web host. It can be a pain to do, but the truth is that the notice alone will at least deter some scrapers from taking your content.