Scrapers infiltrate the deep web | Federated Search Blog

This tweet pointed me to a Wall Street Journal article: ‘Scrapers’ Dig Deep for Data on Web.

At 1 a.m. on May 7, the website PatientsLikeMe.com noticed suspicious activity on its “Mood” discussion board. There, people exchange highly personal stories about their emotional disorders, ranging from bipolar disease to a desire to cut themselves.

It was a break-in. A new member of the site, using sophisticated software, was “scraping,” or copying, every single message off PatientsLikeMe’s private online forums.

There’s a huge and growing market for deep Web information for marketing, competitive intelligence, background checking and other purposes. The deep Web isn’t just about finding scholarly documents in scientific, technical, or business journals. Private information on web forums may not be as private as we would like.

The market for personal data about Internet users is booming, and in the vanguard is the practice of “scraping.” Firms offer to harvest online conversations and collect personal details from social-networking sites, résumé sites and online forums where people might discuss their lives.

If you thought that you were safe because you don’t use your real name in online forums, think again:

New York-based PeekYou LLC has applied for a patent for a method that, among other things, matches people’s real names to the pseudonyms they use on blogs, Twitter and other social networks. PeekYou’s people-search website offers records of about 250 million people, primarily in the U.S. and Canada.

Read the whole WSJ article here.


This entry was posted on Sunday, October 17th, 2010 at 9:32 am and is filed under viewpoints.

2 Responses so far to "Scrapers infiltrate the deep web"

  1. Jonathan Rochkind
    October 18th, 2010 at 3:42 pm  

    What makes the ‘deep web’ ‘deep’ exactly? Especially if it’s being spidered, it doesn’t sound so ‘deep’ anymore. The only reasonable definition I can think of for ‘deep web’ is those parts of the web that are NOT spidered.

    I don’t really think ‘deep web’ is a useful term anymore, if it ever was. What about the term do you find useful, and what do you think it means exactly?

  2. Sol
    October 22nd, 2010 at 9:52 am  

    I think of the deep web as the content that’s behind a web form. Yes, for a number of reasons, a significant amount of deep web traffic exists in the surface web. But, plenty of deep web content has not been ‘surfaced’ so there’s value in recognizing that lots of content isn’t available via the spiders.
