In 2004, Abe and I wrote a three-part article series for New Idea Engineering, a software and service vendor specializing in enterprise search. The articles provide a very general introduction to the deep web. Although this is a blog about federated search, deep web searching is closely related: deep web content is often federated, or aggregated, and the content that is federated is often deep web content.
These are the articles:
Mining the Deep Web
This is a good introduction to what the deep web is, how it differs from the so-called surface web, how Google acquires its content, and how deep web search engines acquire theirs.
Challenges of the Deep Web Explorers
In this article we discuss the pros and cons of harvesting deep web content versus searching it in real time.
Beyond Information Clutter
This article introduces the issue of relevance ranking of search engine results, and one way that Deep Web Technologies deals with the problem. We invite discussion of other approaches.
These articles are intended for readers who are completely new to concepts such as the deep web, the surface web, crawlers, and harvesting.