Abe and I were talking the other day about the relationship between the deep web and federated search technology. I argued that federated search engines are typically used to search deep web sources and that, therefore, “deep web” and “federated search” would commonly come up in the same conversation. In other words, I claimed the two terms would be used interchangeably even though they have different meanings. Abe told me I was wrong. I then pointed out that the company he founded is named Deep Web Technologies and not Federated Search Technologies, even though Deep Web is very focused on federated search. Abe explained that “federated search” wasn’t a common term when he founded Deep Web in 2002. Fair enough.
For those of you new to the industry, federated search is the technology that allows a user to search multiple content databases (also known as content sources) at the same time. Federated search commonly, but not always, merges results from the different sources and sometimes returns the documents in relevance-ranked order. Federated search applications also have to remove duplicates from results, and individual applications have different ways of filtering, sorting, displaying, and managing search results.
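To make that description concrete, here is a minimal Python sketch of the idea: send one query to several sources at the same time, merge the hits, drop duplicates, and rank what's left. Everything in it is an assumption for illustration. The two in-memory sources, the function names `search_source` and `federated_search`, the use of the URL as the duplicate key, and the crude term-count score are all made up, and no vendor's product necessarily works this way.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "content sources". A real connector would submit
# the query to each source's search form or API instead.
SOURCES = {
    "source_a": [
        {"title": "Deep Web Search Basics", "url": "http://example.org/1"},
        {"title": "Crawling the Surface Web", "url": "http://example.org/2"},
    ],
    "source_b": [
        {"title": "Deep Web Search Basics", "url": "http://example.org/1"},
        {"title": "Federated Search and the Deep Web", "url": "http://example.org/3"},
    ],
}


def search_source(name, query):
    """Return the documents in one source whose title contains a query term."""
    terms = query.lower().split()
    hits = []
    for doc in SOURCES[name]:
        words = doc["title"].lower().split()
        # Crude relevance score: how many times the query terms occur in the title.
        score = sum(words.count(term) for term in terms)
        if score > 0:
            hits.append({**doc, "source": name, "score": score})
    return hits


def federated_search(query):
    """Query every source concurrently, then merge, de-duplicate, and rank."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda name: search_source(name, query), SOURCES))

    # De-duplicate on URL (an assumed key), keeping the best-scoring copy.
    merged = {}
    for hits in result_lists:
        for hit in hits:
            kept = merged.get(hit["url"])
            if kept is None or hit["score"] > kept["score"]:
                merged[hit["url"]] = hit

    # Rank by score, highest first; break ties alphabetically by title.
    return sorted(merged.values(), key=lambda hit: (-hit["score"], hit["title"]))


if __name__ == "__main__":
    for hit in federated_search("deep web"):
        print(hit["score"], hit["title"], "(" + hit["source"] + ")")
```

The document that appears in both sources comes back only once. Real applications have a much harder time of it, since sources rarely agree on a shared identifier or on what a relevance score means.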
The deep web refers to all of the websites whose content can’t be reached by web crawlers, like Google’s, that follow links to discover content in what’s known as the surface web. Deep web content lives in databases behind search forms. It takes specialized applications, i.e., federated search applications, to fill out those forms or to access the content via special interfaces provided by the content owners.
So, what does deep web have to do with federated search? My claim is that what’s interesting about federated search is that it allows access to deep web content. Sure, you can build federated search applications that search web sites that have been crawled and indexed if the indexes are made available for searching. But, to me, federated search is about finding scientific, technical, business, and other kinds of documents that have been assessed by some organization for quality. I believe that federated search has its greatest value when it searches deep web content sources. So, I don’t see a great value in discussing federated search outside of the context of the deep web.
My belief and opinion aside, I wondered how frequently federated search vendors used the two terms. So, I conducted a little experiment with Google. I picked a number of vendors that I consider to be federated search vendors and I searched Google twice for each vendor, first with the quoted phrase “federated search” and, a second time, with the phrase “deep web.”
Here are the results. The number at the end of each line is the number of results Google reported:
“ex libris” “federated search” 3640
“ex libris” “deep web” 683
groxis “federated search” 924
groxis “deep web” 743
museglobal “federated search” 3350
museglobal “deep web” 348
proquest “federated search” 8240
proquest “deep web” 2440
“serials solutions” “federated search” 3810
“serials solutions” “deep web” 295
webfeat “federated search” 8230
webfeat “deep web” 282
What did I learn? Well, in almost every case, references to a vendor are much more likely to include the phrase “federated search” than the phrase “deep web.” Groxis was the one exception; both phrases were heavily associated with Groxis. Note that WebFeat had the lowest ratio of “deep web” references to “federated search” references.
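For anyone who wants to check the arithmetic, here is a small Python sketch that computes the ratio of “deep web” results to “federated search” results for each vendor, using the counts listed above. The names `counts` and `deep_web_ratio` are just labels I made up for this illustration.

```python
# Google result counts from the list above:
# vendor -> ("federated search" count, "deep web" count)
counts = {
    "ex libris": (3640, 683),
    "groxis": (924, 743),
    "museglobal": (3350, 348),
    "proquest": (8240, 2440),
    "serials solutions": (3810, 295),
    "webfeat": (8230, 282),
}


def deep_web_ratio(federated, deep):
    """Ratio of "deep web" results to "federated search" results."""
    return deep / federated


ratios = {vendor: deep_web_ratio(f, d) for vendor, (f, d) in counts.items()}

# Print vendors from the lowest ratio to the highest.
for vendor, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{vendor:18s} {ratio:.2f}")
```

The ratios run from about 0.03 for WebFeat up to about 0.80 for Groxis.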
One other search I performed was to see how often pages that mention a vendor reference both “federated search” and “deep web” on the same page. Here’s that data:
“ex libris” “federated search” “deep web” 190
groxis “federated search” “deep web” 92
museglobal “federated search” “deep web” 310
proquest “federated search” “deep web” 260
“serials solutions” “federated search” “deep web” 162
webfeat “federated search” “deep web” 232
What about Deep Web Technologies? I don’t think that googling for “deep web technologies” AND “deep web” would have much meaning. So, I performed these two searches:
“deep web technologies” “federated search” 2600
“deep web technologies” 8570
Nearly a third of the references to “deep web technologies” include references to “federated search.” And, no, the references are not all in this blog!
My conclusion? Don’t bet with Abe on this kind of question. He knows the language of the industry much better than I do.