Beyond federated search? | Federated Search Blog



I think it’s safe to say that, given the choice between searching a content source in real time vs. searching it from an index, we’d all opt for searching the index. This assumes, of course, the index is as current as the content that might be federated. I’ll be the first to admit that federated search is a necessary evil. But, necessary it is. I’ve been hearing people talk about life beyond federated search and I just don’t get it. Until every single content provider makes the full-text of all of their documents that can be federated available for harvesting and indexing, federated search isn’t going away.

Serials Solutions’ new Summon Unified Discovery Service is touted as going beyond federated search. The promotional video boasts how there are no connectors, no inconsistent metadata, and no waiting for results to come back. This is all well and good but how do you deal with quality content sources that are not available through the service?

I need to say that I don’t have an objection to Summon. Serials Solutions has done a very impressive job of lining up a number of major publishers to make tons of content available to subscribers. And, just because Serials Solutions is a competitor to blog sponsor Deep Web Technologies, I’m not dissing their service. My one and only complaint is with the message that the service somehow eliminates the need for federated search.

I do think that harvesting and indexing technologies have a very important role in search solutions. In particular, when you have full text of articles you can perform much better relevance ranking than when you’ve got only title, author, and abstract or snippet. But, you can’t (or shouldn’t) ignore content that can only be federated. Hybrid systems make sense to me. You index as much as you possibly can and federate what you can’t.
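To make the hybrid idea concrete, here is a minimal Python sketch of what I have in mind. Everything in it is hypothetical and made up for illustration: the source names, the `search_index`, `search_federated`, and `hybrid_search` functions, and the naive substring matching all stand in for real harvesting, connector, and ranking machinery. It does not describe how any actual product works.

```python
from concurrent.futures import ThreadPoolExecutor


def search_index(index, query):
    """Search harvested content. Full text is available, so we match on it."""
    q = query.lower()
    hits = []
    for source, docs in index.items():
        for doc in docs:
            if q in doc["text"].lower():
                hits.append({"source": source, "title": doc["title"], "via": "index"})
    return hits


def search_federated(connectors, query):
    """Query each live source in parallel and skip any source that fails."""
    if not connectors:
        return []
    with ThreadPoolExecutor(max_workers=len(connectors)) as pool:
        futures = {source: pool.submit(fn, query) for source, fn in connectors.items()}
    # Leaving the "with" block waits for every connector to finish.
    hits = []
    for source, future in futures.items():
        try:
            titles = future.result()
        except Exception:
            continue  # one broken source should not sink the whole search
        hits.extend({"source": source, "title": t, "via": "federated"} for t in titles)
    return hits


def hybrid_search(index, connectors, query):
    """Index what you can; federate only the sources that were not harvested."""
    remote_only = {s: fn for s, fn in connectors.items() if s not in index}
    return search_index(index, query) + search_federated(remote_only, query)


# Hypothetical data: one harvested publisher and two live-only sources.
index = {
    "publisher_a": [
        {"title": "Deep web crawling", "text": "Full text about the deep web."}
    ]
}


def gov_portal(query):
    return [f"Report on {query}"]


def slow_site(query):
    raise TimeoutError("source timed out")


connectors = {"gov_portal": gov_portal, "slow_site": slow_site, "publisher_a": gov_portal}

results = hybrid_search(index, connectors, "deep web")
for hit in results:
    print(hit["via"], hit["source"], hit["title"])
```

Note that `publisher_a` is answered from the index even though a connector exists for it, while the two sources that were never harvested are still searched live. The sketch simply concatenates the two result lists; merging and ranking them sensibly is the hard part and is left out here.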

One of the critical roles that federated search plays is to provide access to the sources of a client’s choosing. I’ve written about the importance of federated search engines being composed of diverse content sources. WorldWideScience.org is an excellent example of a federated search engine that searches diverse sources — specifically global sources from national governments and from organizations that are blessed by their governments. These are quality sources that are providing their research results and other scientific documents to the public for free. How does access to free scholarly content fit into Summon’s business model?

The danger with relying on any one service to provide you with access to its indexed content is that the service’s criteria for source selection may not be yours. That’s why I recommend hybrid solutions to get the most out of indexed content and the freedom of including federated sources of your choosing as well.


This entry was posted on Thursday, March 19th, 2009 at 1:37 pm and is filed under discovery service, viewpoints.

5 Responses so far to "Beyond federated search?"

  1. Jonathan Rochkind
    March 19th, 2009 at 3:03 pm  

    “Until every single content provider makes the full-text of all of their documents that can be federated available for harvesting and indexing”

    EVERY SINGLE content provider does NOT make their content available for federated search in the first place. Of the approximately 800 licensed databases we have listed in our collection, only about 300 are federated search-able. The remainder are largely not there because of lack of functionality on the content provider’s end, not on our fed search vendor’s end.

    So that’s a false comparison.

    If Summon can provide access to about the same amount of content as federated search, including our most important/most used content, it’ll be a contender.

    A hybrid local index/broadcast search system is an obvious idea. But it’s tricky to figure out how to search both classes of content in one search without bringing things down to the lowest common denominator of fed search.

  2. Jonathan Rochkind
    March 19th, 2009 at 3:07 pm  

    PS: And certainly there might be SOME content that is available via broadcast search but not Summon. And vice versa. Sure. For the academic research market, that’s not important: What’s important is that we’re already unable to offer unified search of ALL content, so switching to a different set of “not all” with a much better user experience will be a win, if it’s the right different set, comparable in scope.

    The trick is indeed herding all that indexed metadata from many various sources. Summon’s promise is that the vendor will do that for you. If it can be done reliably at an affordable price, and can encompass a range of content _comparable for our needs_ (not identical) to existing broadcast search solutions… it’ll be a serious contender.

    The proof will be in the pudding of course. From talking to SerSol folks, they know these are the hurdles they need to clear to make it a realistic product for the academic research market. If they didn’t think they had a chance of clearing them, they probably wouldn’t be sending their R&D money down a black hole.

  3. Terry Bucknell
    March 25th, 2009 at 2:45 am  

    It doesn’t matter too much if a particular publisher doesn’t allow Summon to harvest its content because Summon likely includes content from plenty of indexes that cover that content anyway.

    And why wouldn’t publishers want to have their content included in Summon? Publishers quickly realised that by exposing their metadata to Google, usage of their articles shot up overnight. More use means more article purchases, more citations, higher impact. All the things that publishers and their authors want.

    It is in publishers’ best interests to make their content as discoverable as possible, whilst still maintaining access controls so that the actual content can only be viewed by subscribers/purchasers (unless it is OA).

    I’d guess that technically it is much easier for publishers to expose their content to services like Summon than to implement SRU/SRW, Z39.50, etc., as required for good federated searching.

    Our users do not understand why Google can search ‘everything’ from a simple search box whereas our current federated search product only allows them to search up to 12 sites; has a large proportion of sites that aren’t searchable at all; has some sites where the records are returned to the interface and others where you have to link out to see the results; only retrieves a small number of records from each site initially; and has some resources that take so long that they time out.

    Federated search sends out the signal that the library is still old-fashioned and hard to use.

  4. Mike
    April 9th, 2009 at 10:25 pm  

    If I or other Gen Yers can’t find what we are looking for within a minute, we will turn to Google and bypass the valued IP the library contains. Plain and simple. Simple discovery is a must if orgs are to survive rather than go the way of newspapers. Federated search may have its place for the time being but is not the future; unified search across heterogeneous data sources is. Plain and simple.

  5. Sol
    April 10th, 2009 at 4:48 am  

    Mike - I’m not disagreeing with you about what students will do when they don’t find quick results. And, I don’t disagree that harvest/index is great when you can get it. I’m just disturbed by the idea that students and libraries would give up valuable search results from good sources just because whoever they’re getting results from doesn’t include those good sources.
