In 2004, Abe and I wrote a three-part article series for New Idea Engineering, a software and services vendor specializing in enterprise search. The articles provide a very general introduction to the deep web. Although this is a blog about federated search, deep web searching is closely related: deep web content is often federated, or aggregated, and the content that is federated is often deep web content.

These are the articles:

  1. Mining the Deep Web
    This is a good introduction to what the deep web is, how it differs from the so-called surface web, and how Google and deep web search engines each acquire their content.

  2. Challenges of the Deep Web Explorers
    In this article we discuss the pros and cons of harvesting deep web content versus searching it in real time.

  3. Beyond Information Clutter
    This article introduces the problem of ranking search engine results by relevance, and one way that Deep Web Technologies deals with it. We invite discussion of other approaches.

These articles are intended for readers who are completely new to concepts such as the deep web, the surface web, crawlers, and harvesting.

This entry was posted on Wednesday, December 5th, 2007 at 4:56 am and is filed under basics.