Mar
Carl Grant just published a thought-provoking piece at his ExLibris blog: Discovering the need for discovery solutions that also support meta/federated searching. Like all of Carl’s articles, this one is worth a careful read.
Carl argues that the limitations of federated search don’t make the technology useless in the face of discovery services. He sees both technologies as having an important role in the library and, in fact, his company, ExLibris, sells both types of technology.
Here’s the gist of Carl’s argument that libraries need both solutions:
My answer would be that most libraries likely need both of these solutions because they ultimately meet different end-user needs. Both are discovery tools, but they meet the needs of end-users in different ways and deliver different capabilities.
For example, an undergraduate needing to assemble a paper quickly might well benefit from a search of a mega-aggregate index that quickly produces several results that can be used interchangeably. However, the student or researcher conducting deep research into a subject will likely want to know not only everything available from known resources, but also from unknown resources. Then the need for meta/federated searching becomes more important because it will very likely broaden the content they can find. Understanding these two divergent set of needs require the library to offer different tools within the common discovery interface.
I’ve long argued that a major problem with discovery services is that they likely won’t include all of the sources a library deems worthy of providing access to. Carl articulated this point better than I could have in terms of the “long tail” of resources, i.e. those resources that are important to people doing very specialized research but not of interest to those conducting broader research. Here’s his quote:
Meta/federated search tools enable libraries to expand access to include more of the library resources as well as other types of resources. For instance, they’ll help address the “long tail” of resources and making them available to end-users. As described in the concept of the “long tail” not all resources are in high enough demand to justify their inclusion in a resource designed to address the masses (the mega-aggregate index in this case), but that doesn’t make them any less important to end-users who would value their content. Finally, we must remember that we’re in a time of rapidly growing number of resources composed of radically different data types. Meta/federated search are likely to greatly increase the probability that libraries will be able to search these resources as well. All of this taken together will help researchers discover for themselves that “serendipity” experience of finding results where they did not expect and providing them with greater value as a result of using the library discovery tools.
As an aside, the article Carl cites about the “long tail” is the best I’ve ever read.
I won’t quote Carl’s entire article. Read it for yourself and see if you still (if you ever did) really believe that discovery services are the Holy Grail of user search.
Tags: federated search
7 Responses so far to "Carl Grant: discovery services don’t make federated search useless"
March 29th, 2010 at 6:28 pm
I have to admit, I still don’t understand why an aggregated index discovery service is less likely to include unknown (by the searcher) resources than a broadcast federated search. Not everything is covered by federated search either!
March 30th, 2010 at 12:12 pm
Rather than choose indexing or federating, or both separately, why not integrate the two technologies? At OSTI we integrate multiple approaches to search. OSTI’s Eprint Network allows searching of millions of documents, crawled and indexed from tens of thousands of hand-picked sites, with simultaneous federated search of several dozen large databases. Users search from a single box and view integrated results in a single results page.
Walt Warnick,
Office of Scientific and Technical Information
Dept of Energy
March 30th, 2010 at 4:32 pm
Walt, I’m curious about your interface for combining locally-indexed/crawled and broadcast/federated-searched in a single results set.
Do you not show any results until the fed search results are back?
Okay, wait, your interface is public, I can check it out.
Looks like you show results as soon as the crawled results are back. Then show “X more results available”, constantly updated with higher X’s as they come in, as well as a popup dialog when all results are in. Clicking on “show more results” _does_ re-sort the total results into a merged list.
I’m curious if you have:
a) Any statistics on how often users click “see more results”.
b) Any usability testing on how they like these “more results”, and if it’s a problem for them that it re-sorts their result list when you include them.
March 30th, 2010 at 7:14 pm
Jonathan, I agree there is no obvious reason why ‘long tail’ sources should be searchable by fed search rather than included in an aggregated index. All I can think of to support this contention (and I don’t necessarily, so I may be arguing for Carl, against my beliefs - not the first time) is the good old “commercial” reason. It is more trouble (cost) to get the full set of records into the aggregated index, and then keep visiting the sources for updates, than it is to create a Connector and have it used when needed. Depending on how “clever” the aggregated index is, there may be a whole lot of exotic re-formatting, character set handling, semantic re-factoring, etc. to be done to have this “weird and wonderful” source included in a meaningful way. For fed search much of this does not need to apply.
March 30th, 2010 at 7:27 pm
Index + Fed: I also see no reason why this should not be effectively the norm. If there is a local aggregated index, then using it as a source for fed searching, like any other, is one solution.
If results are displayed as retrieved (most Fed search systems do that now), then the speed of the system (as measured by first screen of results displayed and usable) is actually slightly slower than the *fastest* source, not slightly slower than the *slowest* source - a big difference.
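To make the "fastest source, not slowest source" point concrete, here is a minimal sketch in Python. The source names and latencies are invented for illustration, and `search_source` is a hypothetical stand-in for a real connector, not any vendor's API. It simply shows that when results are handled as they arrive, the first usable results show up at roughly the latency of the fastest source.

```python
import asyncio
import time

# Hypothetical sources and latencies in seconds, invented for illustration.
SOURCES = {"local_index": 0.01, "database_a": 0.1, "database_b": 0.3}


async def search_source(name: str, delay: float, query: str) -> tuple[str, list[str]]:
    """Stand-in for a real connector: waits, then returns a fake hit."""
    await asyncio.sleep(delay)
    return name, [f"{name}: result for {query!r}"]


async def federated_search(query: str) -> list[tuple[str, float]]:
    """Handle each source's results as soon as that source responds."""
    start = time.monotonic()
    tasks = [
        asyncio.create_task(search_source(name, delay, query))
        for name, delay in SOURCES.items()
    ]
    arrivals = []
    for finished in asyncio.as_completed(tasks):
        name, hits = await finished
        # A real interface would render `hits` here instead of waiting.
        arrivals.append((name, time.monotonic() - start))
    return arrivals


arrivals = asyncio.run(federated_search("long tail"))
for name, elapsed in arrivals:
    print(f"{name} arrived after {elapsed:.2f}s")
```

Waiting for every source before showing anything would instead tie the first screen to the slowest source, `database_b` in this made-up example.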
Showing the results to hand and the growing tally from laggard sources is the way we have operated from day one. We also have a “more” function, and I can say that it is pretty rarely used. I don’t have stats to hand, but the time it is used most (surprise, surprise) seems to be for “exhaustive” searches when the sources are pretty specialised (as shown by a relatively small number of hits), and the user wants to see everything. Then they use it multiple times to get a complete set of results. “More” is one of those things which is nice to have when you need it, and good to keep out of the way when you don’t.
We don’t automatically re-sort results when “more” arrive - exactly for the reason of display stability. I know Deep Web do, and there are certainly lists of results where it is appropriate - ones which are sorted by some criterion of importance (price) rather than of convenience (alpha by title).
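The two ways of handling late-arriving results discussed here can be sketched roughly as follows. This is only an illustration: the `Hit` record and its `score` field are assumptions, and neither function is claimed to be how any particular product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    title: str
    score: float  # hypothetical relevance score; higher is better


def append_late(shown: list[Hit], late: list[Hit]) -> list[Hit]:
    """Keep the displayed order stable; late hits go after what is on screen."""
    return shown + sorted(late, key=lambda h: h.score, reverse=True)


def resort_all(shown: list[Hit], late: list[Hit]) -> list[Hit]:
    """Re-rank everything together; items already on screen may move."""
    return sorted(shown + late, key=lambda h: h.score, reverse=True)


shown = [Hit("Indexed A", 0.9), Hit("Indexed B", 0.6)]
late = [Hit("Federated C", 0.8)]

stable = [h.title for h in append_late(shown, late)]
merged = [h.title for h in resort_all(shown, late)]
print(stable)  # ['Indexed A', 'Indexed B', 'Federated C']
print(merged)  # ['Indexed A', 'Federated C', 'Indexed B']
```

The first approach favours display stability, while the second favours a single ranking, which matters more when the sort key is something like price.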
March 31st, 2010 at 6:07 am
Walt - really, the point of my post was that librarians need a discovery tool that does precisely this – integrates federated search within it. That can be done in a variety of ways. Since federated search is always slower and poses significant challenges with being able to sort/dedup/rank the results the same way as those returned from the discovery interface, we at ExL see this as an opportunity to educate the user before handing them the tool. Thus, we currently have each search type (discovery vs. federated search) on separate tabs within the interface. We’re also looking at an option where the discovery search results are displayed first, because that search delivers results quickly, and the user is then given the option to do a federated search on the same terms by asking them “Find the item needed? If not, would you like to search other databases (this will take additional time)? Y/N” If they type Y, then we invoke the federated search, but the important point for us is that it is all within the same interface. So the user doesn’t have to switch interfaces, results are presented using the same formatting, and they can do the same things with them. But it sets user expectations appropriately and, at the same time, finds more information for them.
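As a rough illustration of the "discovery first, federated on request" flow described above, here is a small Python sketch. All names (`search_with_optional_federation`, `discovery`, `federated`, `ask_user`) are hypothetical and chosen for this example; it is not a description of the actual ExLibris implementation, and it leaves out the sorting, deduplication and ranking challenges mentioned in the comment.

```python
from typing import Callable

PROMPT = (
    "Find the item needed? If not, would you like to search other "
    "databases (this will take additional time)? Y/N"
)


def search_with_optional_federation(
    query: str,
    discovery: Callable[[str], list[str]],
    federated: Callable[[str], list[str]],
    ask_user: Callable[[str], str],
) -> list[str]:
    """Return fast discovery results, adding federated ones only on request."""
    results = list(discovery(query))
    if ask_user(PROMPT).strip().upper() == "Y":
        # The slower federated search runs only after the user opts in.
        results.extend(federated(query))
    return results


# Stand-in search functions, invented for illustration.
def fake_discovery(query: str) -> list[str]:
    return [f"index hit for {query}"]


def fake_federated(query: str) -> list[str]:
    return [f"remote hit for {query}"]


declined = search_with_optional_federation(
    "serendipity", fake_discovery, fake_federated, lambda prompt: "N"
)
accepted = search_with_optional_federation(
    "serendipity", fake_discovery, fake_federated, lambda prompt: "y"
)
print(declined)
print(accepted)
```

Both result sets come back through the same function, which mirrors the point that the user stays in one interface either way.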
March 31st, 2010 at 6:30 am
It’s not obvious to me that it will be more expensive to “regularly harvest into aggregated index” for a given source than to “build and maintain a connector for fed search” for a given source.
But the reverse isn’t obvious to me either! It’s just not clear. Both are kind of expensive, actually! Perhaps now that several aggregated index products are available, along with several fed search products with a wide array of connectors, they can be compared, both for price and for coverage (since NEITHER can possibly cover everything). Would be a great paper for someone to write.