The title above is a little bit misleading. Web scraping is generally thought of as a vile process engaged in by unscrupulous people who want to profit from someone else’s hard work. However, it can also be helpful to your website and business, depending on how and when it’s employed. Here’s what you need to know.
Just What Is Web Scraping Anyway?
Put simply, web scraping involves taking content from another website and republishing it on your own website. This can be done through a variety of means, some of them more underhanded than others:
You could do a manual cut and paste, taking the content from another website and pasting it up on your own website with your own name on it, offering no credit to the website where the material originally came from.
You could also grab someone’s RSS feed and publish it on your website in its entirety, thus at least giving credit to the originating website, though not giving your readers much reason to go and visit it (after all, if I can see all the content here, why go there?).
Finally, you could simply embed material (generally videos or graphics) from another website on your own site, sometimes giving credit and other times not doing so.
Is It Legal?
Now, is any of this legal, you may be asking? The answer is: it depends. For example, embedding a video from YouTube, the mildest form of web scraping described above, is perfectly legal and even an accepted practice. When you embed one of their videos on your own site, the video still runs from their servers and features their ads, but you also benefit by showing a video to your visitors.
Publishing an RSS feed gets stickier, since you are then taking content from someone else in its entirety. On the other hand, they made the RSS feed publicly available, so maybe they want it syndicated. If you want to make sure you’re on solid legal ground, you’d need to request permission from the owner of the content to republish an RSS feed.
Obviously, taking content from someone else and republishing it as your own with no credit to the original owner is generally going to be looked at as a very big no-no.
I personally will contact the web hosts of companies that do that with my content, threatening to sue and demanding the site be taken down if my content isn’t removed. The one exception, of course, is when you buy PLR (private label rights) articles, which you are licensed to republish. In that case you simply won’t have very high SERP rankings, since 200 other sites have the same stuff up.
By the way, for the record, if you want to republish any content I write here, you need to ask the owner of QuantumSEO Labs, Yasir Khan, for permission, and you should contact me as well. If it’s at a site I own personally, such as my personal finance blog, you just have to ask me.
When To Engage in Web Scraping
Google tends to frown on web scrapers since it doesn’t want to serve up duplicate content to its users. Therefore, basing most or all of your website on someone else’s content is likely to get you delisted, even if you have their permission (exception to the rule: aggregator services such as Yahoo!’s “My Yahoo!” service).
However, occasional web scraping, when you know you’re on solid legal ground (for example, when you post a YouTube video with a comment about it, or you put up an occasional article from an RSS feed with permission of the owner), can be helpful in filling out the content of your website so that your visitors gain a richer experience.
After all, why recreate what someone else did so well and what they are willing to syndicate? Just make sure to add your own comments as well in order to avoid Google’s duplicate content filters and make sure to keep your web scraping to a minimum.
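If you like to see that advice in practice, here is a minimal sketch in Python of one way to do it: take a feed you have permission to use, keep only a short excerpt of each item, link back to the original, and attach your own comment. Everything in it is an assumption made for illustration: the sample feed, the example.com URL, the build_excerpts name, and the 40-character excerpt limit. It uses only the standard library and is not a recipe for satisfying any particular search engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, inlined so the sketch runs without network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First Post</title>
      <link>https://example.com/first-post</link>
      <description>This is the full text of the first post, and it keeps going for a while.</description>
    </item>
  </channel>
</rss>"""


def build_excerpts(feed_xml, commentary, max_chars=40):
    """Return one dict per feed item: short excerpt, link back, and your own comment."""
    root = ET.fromstring(feed_xml)
    source = root.findtext("channel/title", default="")
    posts = []
    for item in root.iterfind("channel/item"):
        body = item.findtext("description", default="")
        # Keep only the opening of the item rather than republishing it whole.
        excerpt = body[:max_chars].rstrip()
        if len(body) > max_chars:
            excerpt += "..."
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),  # credit back to the original
            "source": source,
            "excerpt": excerpt,
            "commentary": commentary,  # your own added content
        })
    return posts


posts = build_excerpts(SAMPLE_FEED, "Here is why I think this post is worth your time.")
for post in posts:
    print(f"{post['title']} (via {post['source']}): {post['excerpt']}")
    print(f"Read the original: {post['link']}")
    print(post["commentary"])
```

The point of the design is that the reader still has a reason to click through to the original site, and each entry on your page carries something the source does not: your own commentary.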