2008 September | Federated Search Blog

Archive for September, 2008


Abe and I were having a conversation the other day about the relationship between the deep web and the technology of federated search. I argued that federated search engines are typically used to search deep web sources and that, therefore, “deep web” and “federated search” are terms that would commonly be thrown around in the same conversation. In other words, I claimed the two terms would be used interchangeably even though they have different meanings. Abe made the point that I was wrong. I then pointed out that the company he founded was named Deep Web Technologies and not Federated Search Technologies even though Deep Web is very focused on federated search. Abe explained that “federated search” wasn’t a common term when he founded Deep Web in 2002. Fair enough.

For those of you new to the industry, federated search is the technology that allows a user to search multiple content databases (also known as content sources) at the same time. Federated search commonly, but not always, merges results from the different sources and sometimes returns the documents in a relevance-ranked order. Federated search applications also have to remove duplicates from results, and individual applications have different ways of filtering, sorting, displaying, and managing search results.
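To make that description concrete, here is a minimal Python sketch of the steps just listed: query several sources at once, merge the results, remove duplicates, and rank by relevance. Everything in it is an assumption made for illustration. The in-memory `SOURCES` dictionary, the `search_source` and `federated_search` functions, the example.org URLs, and the term-count score are hypothetical stand-ins, not how any particular product works.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "content sources". A real source would be a remote
# database reached over the network; these records are made up for illustration.
SOURCES = {
    "source_a": [
        {"title": "Deep web crawling", "url": "http://example.org/1"},
        {"title": "Federated search basics", "url": "http://example.org/2"},
    ],
    "source_b": [
        {"title": "Federated search basics", "url": "http://example.org/2"},
        {"title": "Federated search and the deep web", "url": "http://example.org/3"},
    ],
}


def search_source(name, query):
    """Search one source; the score is a naive count of query terms in the title."""
    terms = query.lower().split()
    hits = []
    for doc in SOURCES[name]:
        title = doc["title"].lower()
        score = sum(title.count(term) for term in terms)
        if score > 0:
            hits.append({**doc, "source": name, "score": score})
    return hits


def federated_search(query):
    """Query all sources at the same time, then merge, de-duplicate, and rank."""
    names = sorted(SOURCES)
    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(lambda name: search_source(name, query), names))

    # De-duplicate on URL, keeping the highest-scoring copy of each document.
    merged = {}
    for hits in per_source:
        for hit in hits:
            best = merged.get(hit["url"])
            if best is None or hit["score"] > best["score"]:
                merged[hit["url"]] = hit

    # Relevance-ranked order: highest score first, URL as a stable tie-breaker.
    return sorted(merged.values(), key=lambda h: (-h["score"], h["url"]))


if __name__ == "__main__":
    for hit in federated_search("federated search deep web"):
        print(hit["score"], hit["source"], hit["title"])
```

Real sources rarely return comparable scores, so a production system would need a more careful way to rank merged results than the shared term count used here.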



The Lone Wolf Librarian posted an article earlier this month challenging recently-turned-ten Google on its claim that search was 90% solved. That article got my attention. It is an excellent read all by itself but what really caught my eye was the reference to an LA Times blog interview with Marissa Mayer, Google’s vice president of search products and user experience. When asked what the next ten years holds for Google, Ms. Mayer makes this statement:

I think there will be a continued focus on innovation, particularly in search. Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.

“There is a lot to go in the remaining 10%” is a huge understatement. While no one knows for sure, the deep web, where the non-Googleable database content lives, is commonly considered to be several hundred times larger than the surface web. So, even if Google has solved the whole surface web problem, they have only solved a fraction of the whole search problem. This is good news for those who make a living selling federated search solutions.



[ Note: Two very large prime numbers were discovered recently, one last month and the other early this month. These primes, known as Mersenne primes, were discovered via a “divide and conquer” approach, which lends support to the distributed search approach in fields as distant from prime hunting as federated search :) ]

Jonathan Rochkind and Stephan Schmid left comments in response to my article on federating large numbers of sources. I’d like to respond to a part of Jonathan’s comment as well as to a piece of Stephan’s comment.



Knowledgespeak, an online service dedicated to the STM (science, technology, and medical) publishing industry, recently conducted an interview with MuseGlobal President Kristina Bivins. This interview is a nice complement to the luminary series interview I conducted with MuseGlobal co-founder Kate Noerr.

The interview covers a number of subjects:

  1. A number of components of MuseGlobal’s plug and play content integration technology

  2. Two new products MuseGlobal released this year: News Hound and Blog Hound

  3. Partnership with Adhere Solutions to extend functionality of Google’s Search Appliance to include federated search and more

  4. The Muse Content Machine

  5. Biggest challenges for publishers and how MuseGlobal is positioned to meet the challenges

  6. Growth of the company

Readers of this blog may also enjoy a number of the Knowledgespeak interviews; many are with publishers who provide content that is federated.


This week the Science.gov Alliance released Science.gov 5.0. The release got a good amount of press from a couple of press releases (from the US Department of Energy and Deep Web Technologies) and from a number of bloggers, including Valerie Allen (Product Manager for Science.gov) on OSTI’s own blog and the SLA Government Information Division blog. When I worked for Deep Web, I supported the application and have enjoyed watching it evolve and grow.



Abe and I were recently discussing the federation of large numbers of sources and the question came up: “What would it take for a single application to federate hundreds or even thousands of sources?” The conversation turned to a discussion of an approach that this blog’s sponsor, Deep Web Technologies (DWT), had developed to federate a number of federated search applications. The discussion of this “divide and conquer” approach inspired this article. You can read more about the ideas discussed here in two of Abe’s presentations:

I should note that DWT’s approach is not the norm and that large source scalability is not something that many customers need to be concerned with today. But I do believe that we’ll be seeing more federated search applications searching greater numbers of sources in the years to come.
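The general idea of federating federated search applications can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DWT’s implementation: the `Source` and `Federator` classes and the `build_two_level` helper are hypothetical names, sources are in-memory lists of titles, and children are queried one after another where a real system would query them concurrently.

```python
class Source:
    """A leaf content source; here just an in-memory list of titles (hypothetical)."""

    def __init__(self, titles):
        self.titles = titles

    def search(self, query):
        q = query.lower()
        return [title for title in self.titles if q in title.lower()]


class Federator:
    """Searches its children and merges their results, dropping duplicates.

    Because a Federator exposes the same search() method as a Source,
    a Federator can itself be a child of another Federator.
    """

    def __init__(self, children):
        self.children = children

    def search(self, query):
        seen = set()
        merged = []
        for child in self.children:  # sequential here, for simplicity
            for title in child.search(query):
                if title not in seen:
                    seen.add(title)
                    merged.append(title)
        return merged


def build_two_level(sources, group_size):
    """Divide sources into groups, federate each group, then federate the groups."""
    groups = [sources[i:i + group_size] for i in range(0, len(sources), group_size)]
    return Federator([Federator(group) for group in groups])


if __name__ == "__main__":
    sources = [Source([f"Report {i} on energy", "Shared energy survey"]) for i in range(6)]
    top = build_two_level(sources, group_size=2)
    print(top.search("energy"))
```

In this arrangement no single federator talks to more than `group_size` children at the lower level, which is the point of dividing the work: each layer handles a small fan-out even when the total number of sources is large.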



We’re six and a half weeks away from the October 31 deadline for the federated search writing contest. From the questions I’ve received, I know there’s interest in the contest. I’d like to see a bunch of early submissions, so I’m going to try to entice those of you who are on the fence about submitting, or are putting it off until the last minute, to start writing.

Blog and contest sponsor Deep Web Technologies will give a $25.00 Amazon.com gift certificate to those of you whose submissions are among the first ten. If you’re outside of the U.S. they’ll send you $25.00 via PayPal. If you’ve already sent me a submission I’ll email you to arrange sending your “early bird” gift. To be eligible for the $25.00 you must submit a serious entry and follow all the rules of the contest in the announcement post.

I know there are visionaries out there. The noted industry experts serving as contest judges look forward to your essays.


[ Editor’s note: Scott Rice, E-Learning Librarian at Appalachian State University, reviews an essay in Christopher Cox’s book about federated search. What do users expect from federated search? What do librarians expect? Read on …

Given the quality of the essays in Mr. Cox’s book plus the severe lack of any books related to federated search, I highly recommend the book. You can purchase a copy of Mr. Cox’s book of essays from the publisher, Taylor & Francis, who donated the review copies, by calling their Customer Service department, Monday-Friday 9 A.M. – 5 P.M. EDT, at (800) 634-7064. ]



[ Editor’s note: A few weeks ago I asked Carl Grant if he’d be willing to write a regular column for this blog. He agreed in principle while expressing the concern that his new responsibilities as President of Ex Libris North America might make it difficult to commit to a schedule. So, I took the pressure off of Carl by inviting him to write when he was able to and not worry about a schedule. Not too long after that conversation I received an email from Carl with the article below.

As usual, Carl doesn’t mince words in this article. He bluntly asks librarians to assert and uphold the value they provide to their patrons by demanding functionality from their federated search vendors that “feature[s] the added value of librarianship.” ]



In an effort to help customers to clarify their needs when considering federated search products and services, I’ve produced a list of over 100 questions to consider when you talk to vendors.

     100 Federated Search Requirements Questions To Ask Vendors

I’ve purposely published the document in Word format, rather than as a PDF file, so that you can edit the list, and copy and paste from the list, to meet your needs.

The checklist is categorized and includes questions pertinent to both self-hosted and vendor-hosted deployments. I will be soliciting input from a number of vendors to fill in any gaps in question or topic coverage.
