The Harvard Library Innovation Laboratory at the Harvard Law School posted a link to a 23-minute podcast interview with Sebastian Hammer. Hammer is the president of Index Data, a company in the information retrieval space, including federated search.
Update 4/3/12: A transcript of the interview is here.
Hammer was interviewed about the challenges of federated search, which he addressed in a very balanced way. The gist of Hammer’s message is that, yes, there are challenges to the technology but they’re not insurmountable. And, without using the word “discovery service,” Hammer did a fine job of explaining that large indexes are an important component of a search solution but they’re not the entire solution, especially in organizations that have highly specialized sources they need access to.
I was delighted to hear Hammer mention the idea of “super nodes” to allow federated search to scale to thousands of sources. Blog sponsor Deep Web Technologies has used this idea, which they call hierarchical federated search for several years. Several of their applications search other applications which can, in turn, search other applications. In 2009, Deep Web Technologies founder and president Abe Lederman delivered a talk and presented a paper at SLA,
Science Research: Journey to Ten Thousand Source, detailing his company’s proven “divide-and-conquer” approach to federating federations of sources.
I was also happy to hear Hammer speak to the importance of hybrid solutions. Federation is appropriate for gaining access to some content and maintaining a local index works for other content. Neither alone is a complete solution. Deep Web Technologies figured this out some years ago. A good example of hybrid search technology is the E-print Network, a product of the U.S. Department of Energy’s Office of Scientific and Technical Information, (OSTI). Deep Web Technologies built the search technology, which combines information about millions of documents crawled from over 30,000 sites, with federated content. I have been involved with the crawl piece of the E-print Network for a number of years and can testify to the power of the right hybrid solution. In 2008 I wrote a three-part series of articles at OSTI’s blog explaining the technology behind the E-print Network. Part One is here.
In conclusion, I highly recommend the podcast for a good reminder that federated search isn’t dead and that it’s an important part of search.