Yesterday I wrote “Beyond federated search?”, in which I raised a concern about using services that provide indexed content as a way to bypass federated search and its associated challenges.
Jonathan Rochkind left two thoughtful comments which I’d like to respond to.
“Until every single content provider makes the full-text of all of their documents that can be federated available for harvesting and indexing”
EVERY SINGLE content provider does NOT make their content available for federated search in the first place. Of the approximately 800 licensed databases we have listed in our collection, only about 300 are federated search-able. The remainder are largely not there because of lack of functionality on the content provider’s end, not on our fed search vendor’s end.
So that’s a false comparison.
Jonathan, of course you’re right. Not all content can be federated. And, at the same time, not all content is available for harvesting and indexing. In both cases, access to content is controlled by the content provider. My point is that, given that plenty of excellent content isn’t available for harvesting, I don’t see the solution as being to ignore such content. Also, I’m curious to know why 500 of the 800 sources your library uses can’t be federated. While I understand that there are some sources that are very difficult or impossible to build connectors for, I’d be concerned about any federated search vendor that could only build connectors for about 38% (300 of 800) of the sources I put on my list. Can you explain further your statement that “The remainder are largely not there because of lack of functionality on the content provider’s end”?
If Summon can provide access to about the same amount of content as federated search, including our most important/most used content, it’ll be a contender.
This deeply concerns me. Some people go to CNN for their news. Others go to the BBC. Who should decide which news sources are more valuable? I strongly believe it needs to be the library or research organization that is serving its patrons, not the subscription service provider. I argue that, for all the tremendous benefits of harvesting and indexing, it’s not a complete solution. So, why isn’t your library picking its sources?
A hybrid local index/broadcast search system is an obvious idea. But it’s tricky to figure out how to search both classes of content in one search without bringing things down to the lowest common denominator of fed search.
Jonathan, yes, this is a major issue. How do you merge results when one set comes from searching the unified index and another set comes from federated search? It’s not an easy problem, but I think it’s a critical one for the federated search industry to solve. One possibility is to not merge the two sets of results but to put them into separate tabs, as ugly as it might seem. As an aside, I’m interested to know if Summon indexes full-text or, more likely, metadata. The quality of the metadata index is only as good as the metadata itself. If Summon is indeed searching only metadata, then its relevance ranking will be poorer than that of federated search against sources where the underlying search engine is performing a full-text search.
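To make the merging problem a little more concrete, here is a minimal sketch of one generic way to combine two ranked result lists: reciprocal rank fusion, which uses only each record’s position in its list and so sidesteps the fact that a local index and a remote source score relevance on incomparable scales. This is not how Summon or any particular federated search product works; the function name, the record IDs, and the constant `k=60` are all assumptions for illustration, and the sketch assumes the same record carries the same ID in both lists, which is itself a hard deduplication problem in practice.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of record IDs using only rank positions.

    Hypothetical helper for illustration; k is a smoothing constant
    that dampens the advantage of the very top ranks.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, record_id in enumerate(results, start=1):
            # A record found by several sources accumulates score from each.
            scores[record_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by record ID for determinism.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


# Made-up record IDs: one list from a unified index, one from broadcast search.
index_hits = ["doc-a", "doc-b", "doc-c"]
federated_hits = ["doc-b", "doc-d", "doc-a"]

merged = reciprocal_rank_fusion([index_hits, federated_hits])
print(merged)  # ['doc-b', 'doc-a', 'doc-d', 'doc-c']
```

Records that both systems return float to the top, which is often reasonable, but the approach throws away whatever relevance information each system had, so it is a fallback rather than a solution to the ranking problem described above.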
PS: And certainly there might be SOME content that is available via broadcast search but not summon. And vice versa. Sure. For the academic research market, that’s not important: What’s important is we’re already not being able to offer unified search of ALL content, so switching to a different set of “not all” with a much better user experience will be a win, if it’s the right different set comparable in scope.
I don’t agree that “switching to a different set of ‘not all’ with a much better user experience will be a win.” A major value of federated search is that the client gets to select the sources and that, in most cases, if the source has a search page then a connector can be built for it. I don’t think it’s desirable to make an either/or decision. Take the best of both solutions and merge the two together. Not easy but critical, in my opinion.
The trick is indeed herding all that indexed metadata from many various sources. Summon’s promise is that the vendor will do that for you. If it can be done reliably at an affordable price, and can encompass a range of content _comparable for our needs_ (not identical) to existing broadcast search solutions… it’ll be a serious contender.
I do agree with you that services like Summon will become major players in unified search. I do have a concern, though, about competition among content providers. Serials Solutions is a business unit of ProQuest, a content publisher. Summon provides access to content from ProQuest and other publishers. Publishers don’t always play nicely together, so I’d be nervous about being locked into offerings from any given set of publishers, some of whom might go away in the future.
The proof will be in the pudding of course. From talking to SerSol folks, they know these are the hurdles they need to clear to make it a realistic product for the academic research market. If they didn’t think they had a chance of clearing them, they probably wouldn’t be sending their R&D money down a black hole.
I’m looking forward to seeing how the service is received and how the federated search industry engages with and responds to this new offering. It’s worth noting that Summon has an API so an organization (or federated search vendor) can build a hybrid solution with Summon as one component. In fact, I wouldn’t be the least bit surprised if Serials Solutions used their federated search expertise to build their own hybrid product.
Tags: federated search