Abe and I were talking the other day about the relationship between the deep web and federated search technology. I argued that federated search engines are typically used to search deep web sources and that, therefore, “deep web” and “federated search” would commonly come up in the same conversation. In other words, I claimed the two terms would be used interchangeably even though they have different meanings. Abe said I was wrong. I then pointed out that the company he founded is named Deep Web Technologies and not Federated Search Technologies, even though the company is very focused on federated search. Abe explained that “federated search” wasn’t a common term when he founded Deep Web Technologies in 2002. Fair enough.

For those of you new to the industry, federated search is the technology that allows a user to search multiple content databases (also known as content sources) at the same time. Federated search commonly, but not always, merges results from the different sources and sometimes returns the documents in a relevance-ranked order. Federated search applications also have to remove duplicates from results, and individual applications have different ways of filtering, sorting, displaying, and managing search results.
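For readers who think in code, here is a minimal sketch of that flow: query several sources at once, merge the results, drop duplicates, and rank what is left. Everything in it is made up for illustration. The two in-memory sources, the function names, the title-based duplicate check, and the term-count score are all assumptions, not how any particular vendor's product works.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "sources". A real connector would fill out a
# source's search form or call an interface the content owner provides.
SOURCES = {
    "source_a": [
        {"title": "Deep Web Crawling", "url": "http://a.example/1"},
        {"title": "Federated Search Basics", "url": "http://a.example/2"},
    ],
    "source_b": [
        {"title": "Federated Search Basics", "url": "http://b.example/9"},
        {"title": "Search Relevance Ranking", "url": "http://b.example/7"},
    ],
}


def search_source(name, query):
    """Search one source; score = how often the query terms occur in the title."""
    terms = query.lower().split()
    hits = []
    for doc in SOURCES[name]:
        title = doc["title"].lower()
        score = sum(title.count(term) for term in terms)
        if score > 0:
            hits.append({**doc, "source": name, "score": score})
    return hits


def federated_search(query):
    """Query every source at the same time, then merge, de-duplicate, and rank."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda name: search_source(name, query), SOURCES))

    # Merge the per-source result lists into one list.
    merged = [hit for hits in result_lists for hit in hits]

    # Remove duplicates. Assumption: two hits with the same title are the same document.
    seen = set()
    unique = []
    for hit in merged:
        key = hit["title"].lower()
        if key not in seen:
            seen.add(key)
            unique.append(hit)

    # Relevance-rank: highest score first (the sort is stable, so ties keep merge order).
    return sorted(unique, key=lambda hit: hit["score"], reverse=True)


if __name__ == "__main__":
    for hit in federated_search("federated search"):
        print(hit["score"], hit["title"], hit["source"])
```

Real applications differ most in the two steps this sketch simplifies: deciding when two records from different sources are the same document, and scoring results from sources that rank in incompatible ways.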

The deep web refers to all of the websites whose content can’t be reached by web crawlers, like Google’s, that follow links to discover content in what’s known as the surface web. Deep web content lives in databases behind search forms. It takes specialized applications, i.e., federated search applications, to fill out those forms or to access content via special interfaces provided by the content owners.

So, what does deep web have to do with federated search? My claim is that what’s interesting about federated search is that it allows access to deep web content. Sure, you can build federated search applications that search web sites that have been crawled and indexed if the indexes are made available for searching. But, to me, federated search is about finding scientific, technical, business, and other kinds of documents that have been assessed by some organization for quality. I believe that federated search has its greatest value when it searches deep web content sources. So, I don’t see a great value in discussing federated search outside of the context of the deep web.

My belief and opinion aside, I wondered how frequently federated search vendors used the two terms. So, I conducted a little experiment with Google. I picked a number of vendors that I consider to be federated search vendors and I searched Google twice for each vendor, first with the quoted phrase “federated search” and, a second time, with the phrase “deep web.”

Here are the results. The number at the end of each line is the number of results Google reported:

“ex libris” “federated search” 3640
“ex libris” “deep web” 683

groxis “federated search” 924
groxis “deep web” 743

museglobal “federated search” 3350
museglobal “deep web” 348

proquest “federated search” 8240
proquest “deep web” 2440

“serials solutions” “federated search” 3810
“serials solutions” “deep web” 295

webfeat “federated search” 8230
webfeat “deep web” 282

What did I learn? Well, in almost every case, references to vendors are much more likely to include the phrase “federated search” than the phrase “deep web.” Groxis was the one exception; both phrases were heavily associated with Groxis. Note that WebFeat had the lowest ratio of “deep web” references to “federated search” references.
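If you want to check the ratios yourself, here is a small sketch that computes them from the counts listed above. The counts are copied from my Google results; the variable names are just for illustration.

```python
# (federated search hits, deep web hits) per vendor, copied from the results above.
counts = {
    "ex libris": (3640, 683),
    "groxis": (924, 743),
    "museglobal": (3350, 348),
    "proquest": (8240, 2440),
    "serials solutions": (3810, 295),
    "webfeat": (8230, 282),
}

# Ratio of "deep web" references to "federated search" references.
ratios = {vendor: deep / fed for vendor, (fed, deep) in counts.items()}

# Print vendors from the lowest ratio to the highest.
for vendor, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{vendor:18s} {ratio:.2f}")
```

Sorted this way, WebFeat comes first and Groxis last, which matches what I described above.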

One other search I performed was to see how often vendor web pages referenced both “federated search” and “deep web” in the same page. Here’s that data:

“ex libris” “federated search” “deep web” 190
groxis “federated search” “deep web” 92
museglobal “federated search” “deep web” 310
proquest “federated search” “deep web” 260
“serials solutions” “federated search” “deep web” 162
webfeat “federated search” “deep web” 232

What about Deep Web Technologies? I don’t think that googling for “deep web technologies” AND “deep web” would have much meaning. So, I performed these two searches:

“deep web technologies” “federated search” 2600
“deep web technologies” 8570

Nearly a third of the references to “deep web technologies” include references to “federated search.” And, no, the references are not all in this blog!

My conclusion? Don’t bet with Abe on this kind of question. He knows the language of the industry much better than I do.



This entry was posted on Tuesday, September 30th, 2008 at 8:48 am and is filed under viewpoints.

4 Responses so far to "Federated search and the deep web"

  1. Peter Noerr
    September 30th, 2008 at 3:12 pm

    Since you will undoubtedly be talking to Abe in the near future, perhaps you could ask him for his, and your, opinion on what federated search is useful for other than for (or instead of?) searching the deep web.

  2. Sol
    October 5th, 2008 at 9:57 pm  

    Peter - Neither Abe nor I know what federated search is useful for other than searching the deep web.

  3. Matthew Theobald
    October 6th, 2008 at 6:16 am  

    Through ISEN, also founded in 2002, the deep or a large part of it would be brought to the surface.

  4. Sol
    October 6th, 2008 at 9:52 pm  

    Matthew - I’d be very interested in learning more about ISEN. Perhaps you could share more information than what is at isen.org? If you are interested and able to share more then let’s have an offline discussion about how to make that happen.

