There are two approaches to gathering content for search on the Internet. The best-known approach is crawling and indexing: the search engine starts with a list of known web pages, extracts the text from those pages, and follows their links to find new pages and new text to extract. All of this text is indexed so that relevant documents can be found and retrieved quickly.
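
As a rough illustration of the crawl-and-index loop, here is a minimal Python sketch. It runs against a made-up in-memory "web" rather than real pages, so `TOY_WEB`, the `a.example` URLs, and the helper names `crawl_and_index` and `search` are all assumptions for the example, not how any particular search engine works.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/home": ("federated search overview",
                       ["a.example/crawl", "a.example/deep"]),
    "a.example/crawl": ("crawling and indexing web pages",
                        ["a.example/home"]),
    "a.example/deep": ("deep web search via forms", []),
}


def crawl_and_index(seeds, fetch):
    """Breadth-first crawl from the seed URLs, building an inverted index."""
    index = defaultdict(set)   # word -> set of URLs containing it
    seen = set(seeds)          # URLs already queued, to avoid loops
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:       # unknown or unreachable page
            continue
        text, links = page
        for word in text.lower().split():
            index[word].add(url)
        for link in links:     # follow links to discover new pages
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = crawl_and_index(["a.example/home"], TOY_WEB.get)
print(sorted(search(index, "web pages")))
```

The point of the sketch is that all the work happens before the query arrives: the search itself is only a lookup in the prebuilt index.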

The second approach is to perform live searches of content that lives in databases behind websites. This content is typically accessed by filling out web forms, in much the same way that people fill out forms when they search databases. Searching via forms, which is largely what federated search does, is also known as deep web searching.
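
To make the form-based approach concrete, here is a small Python sketch of what "filling out a form" amounts to for a program: encoding the field values into a request and letting the site query its database at search time. The endpoint `https://db.example/search`, the field names `q` and `max`, and the stand-in `fake_database_site` function are assumptions for illustration; no network request is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical records that exist only in the site's database,
# not on any crawlable page.
RECORDS = [
    "Federated search in digital libraries",
    "Crawling strategies for large sites",
    "Deep web search through forms",
]


def build_form_query(action_url, fields):
    """Encode form fields the way a browser would for a GET form."""
    return f"{action_url}?{urlencode(fields)}"


def fake_database_site(url):
    """Stand-in for the remote site: parse the query, search the records."""
    params = parse_qs(urlsplit(url).query)
    term = params.get("q", [""])[0].lower()
    limit = int(params.get("max", ["10"])[0])
    hits = [r for r in RECORDS if term and term in r.lower()]
    return hits[:limit]


url = build_form_query("https://db.example/search",
                       {"q": "web search", "max": 5})
results = fake_database_site(url)
print(url)
print(results)
```

Unlike the crawling approach, nothing is indexed ahead of time here: each query goes to the source, and the source decides what to return.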

This article compares the pros and cons of the two approaches in five areas and illustrates why both methods of accessing content are necessary. The comparison shows that arguments about which approach is better are fallacious. More importantly, we’ll conclude that for many users federated search, which accesses content from different sources in different ways and merges the results together, provides the best of both worlds.
