Wally Grotophorst, Digital Programs and Systems librarian at George Mason University and customer of this blog’s sponsor Deep Web Technologies, gave the publicly searchable Summon deployment at Dartmouth College a test drive. (Note that while anyone can search and view result metadata, full-text access to documents is restricted to the Dartmouth community. Grotophorst, however, was able to access full text because his university’s subscriptions overlap with Dartmouth’s.)
I want to highlight a few of Grotophorst’s insights.
Have you heard the old adage “cheap, fast, or good - pick two”? I’ve heard it in the context of building software systems. You can build something inexpensively, quickly, or with high quality, but you can’t get all three at the same time. As an example, if you want to build a major system quickly and you want it to work really well, then it won’t be cheap. Grotophorst applies this idea of tradeoffs to search.
If you’ve ever looked into digital storage solutions, you’ve probably heard that you can achieve any two of these three attributes: speed, reliability or economy. Build a system that’s fast and reliable and it won’t be inexpensive. Develop a reliable but inexpensive solution and you’ll sacrifice performance. … Web-based searching’s not all that different, you have to balance a set of sometimes conflicting attributes.
Consider the notion of “Just-in-Case.” Build an index of content (e.g. Google), the bigger the better, just in case someone wants what you have.
It’s fast but you are sacrificing currency (reliability) of information. You can retrieve an item only if it was collected and indexed prior to your query. If it just appeared on the web, it’s invisible to you.
Now consider “Just-in-time.” Get the user what he or she wants at search time via federated search:
Make a query and the search engine sends out simultaneous real-time requests to other hosts, bringing back content and presenting it. You’re giving up speed to improve reliability (currency) of information.
I like this view of tradeoffs because it gets us past the pissing match of which approach is better. Do you want speed (discovery service) or currency (federated search)? Your answer is different depending on your users and their needs.
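To make the tradeoff concrete, here is a minimal Python sketch of the two approaches. It is an illustration only, not how Summon or any federated search product is actually built. The in-memory `SOURCES` dictionary stands in for remote hosts, and every name in it (`build_index`, `search_index`, `query_source`, `federated_search`) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "remote hosts": source name -> list of document titles.
SOURCES = {
    "journal_a": ["federated search basics", "library discovery layers"],
    "journal_b": ["search index freshness"],
}


def build_index(sources):
    """Just-in-case: harvest everything ahead of time into one local index."""
    index = {}
    for name, titles in sources.items():
        for title in titles:
            for word in title.split():
                index.setdefault(word, set()).add((name, title))
    return index


def search_index(index, term):
    """Fast local lookup; it sees only what existed at harvest time."""
    return sorted(index.get(term, set()))


def query_source(name, titles, term):
    """Stand-in for a real-time request to one remote host."""
    return [(name, title) for title in titles if term in title.split()]


def federated_search(sources, term):
    """Just-in-time: query every source concurrently at search time."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_source, n, t, term) for n, t in sources.items()]
        hits = [hit for f in futures for hit in f.result()]
    return sorted(hits)


index = build_index(SOURCES)                        # harvested once, in advance
SOURCES["journal_b"].append("search at web scale")  # published after the harvest

stale = search_index(index, "search")
fresh = federated_search(SOURCES, "search")
print(len(stale), len(fresh))
```

The prebuilt index answers from a local lookup but misses the title added after the harvest. The federated query finds it, at the cost of waiting on every source. With real hosts that wait is network latency, which this sketch does not model.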
Grotophorst elaborates on what makes Summon compelling and lists some tradeoffs. One big tradeoff caught my eye:
The library moves from an open gateway model to something more closely resembling a walled garden
Here’s an interesting metric to determine “enthusiasm” for Summon.
- If you worry most about helping the user who asks, “find me something useful” then Summon™ is a winner.
- If your job depends on satisfying the user who asks, “find me everything” or “is this absolutely current?” then Summon™ is just a distraction.
Read Grotophorst’s article then have your own experience at the Dartmouth Summon site.
4 Responses so far to "Just-In-Case or Just-in-Time?"
April 24th, 2010 at 1:41 pm
I continue to take issue with your implications that broadcast federated search can find ‘everything’. No federated search solution I’ve seen includes connectors for _every_ licensed and free resource that may contain information of interest. It’s probably not possible in general to ever be sure you’ve found “everything”, but if a user wants to find as much as realistically possible, I’m going to send them directly to native interfaces, not a broadcast search solution.
Of course, this will take an awful lot of time to search every resource individually that might possibly have information of interest — likely more time than is available. So what can you do? Pick the top one or three individual native resources (because even if they have connectors to your broadcast search, the quality of the connector may keep you from finding ‘everything’). Then resort to either broadcast federated search or an aggregated index for the rest (with standard google web search being of course one kind of aggregated index).
Which is optimal, broadcast search or aggregated index? Broadcast search will definitely be slower and clumsier. Will it necessarily always be more comprehensive? I have seen that claim made many times on this blog, but I would like to see some evidence of that, comparing the best, most comprehensive broadcast search tools for a scholarly research field to the best, most comprehensive aggregated index tool for a scholarly research field — and comparing comprehensiveness based on some kind of evidence. Neither an aggregated index nor a broadcast search tool can possibly include “everything” — either will include more the more resources ($$) are put into building it. It is not at all obvious to me that broadcast search _necessarily_ results in more reliably comprehensive results than an aggregated index, per dollar spent on it.
April 24th, 2010 at 9:15 pm
Jonathan - My issue is that I think libraries should be in the business of providing their users with content sources that are important to their patrons. When libraries just provide their patrons with a discovery service they are abdicating their role of paying attention to what sources matter to their users. Carl Grant has written about this a number of times as have I.
Of course I don’t believe that federated search accesses every source. But here’s what I do believe. If libraries figure out which sources are the most relevant and a discovery service doesn’t include those then do federated search of those sources.
“Comprehensive” is all relative to what your needs are. If a researcher needs to stay current with content from a dozen journals then a federated search system that has connectors to all twelve of those sources is comprehensive for that user.
Discovery services give great convenience (Google speed) but may or may not give users the sources they most need. If I were going to do in depth research - beyond a college term paper - I’d put up with the 30 seconds of waiting if I knew the federated search system included sources that my library staff hand picked because they thought they were relevant. And, if I had the time, sure I would search the native interface of every source.
April 26th, 2010 at 6:11 am
“If libraries figure out which sources are the most relevant and a discovery service doesn’t include those then do federated search of those sources.”
I can respect that for sure, but that second clause is an “if” that has not yet been answered, I think.
And there’s another “if” of course — if the federated search can include all resources librarians determine are most relevant (or more than the discovery service).
After all, in my support of our federated search product, there are sometimes resources that librarians determine are most relevant that are not supported by our federated search product either. With enough time and/or money they probably could be — but that’s true of an aggregated index discovery service too, right? Unless the rights holder refuses to make it available through that venue — which is possible for federated search too.
So realistically, it’s possible that some ‘most important’ resources won’t be covered by broadcast federated search, and it’s possible that some ‘most important’ resources won’t be covered by aggregated index too. I’m not comfortable making the assumption that any ‘good’ federated search solution will obviously cover more ‘most important’ resources than a similarly priced ‘good’ aggregated index product. I think we need some investigation to see if this is true of actually existing products.
April 26th, 2010 at 6:22 am
But there’s actually a larger issue you bring up, which deserves some more thought too.
“When libraries just provide their patrons with a discovery service they are abdicating their role of paying attention to what sources matter to their users.”
This is an important point, which I’m sympathetic to, and which deserves some attention.
One thing we need to do is separate practical reality from fantasy though. What level of selection do librarians actually exercise over real-world broadcast federated search in the academic/scholarly research field? Usually we can pick and choose from among resources that _themselves_ aggregate thousands of journal articles, from one or many publishers. We get to pick the ‘resources’, but the resources themselves have already picked for us what they’ll contain — have we abdicated our selection responsibilities to them?
Well, kind of, yes. Is this unfortunate? Quite possibly. Realistically though, we probably don’t have the time to choose baskets of resources on an individual article, or even individual title, level in the increasingly vast field.
Now, interestingly, an aggregated index product could theoretically allow us to choose a subset of records to be searched in a given search too. I know that both vendors I know of offering such products for the academic market have talked about offering such an option, but I don’t know if either one does yet. I’m also not sure if actual customers would choose to use this option if it did exist — federated search forces us to make the choices whether we want to or not (which may be a good thing?), aggregated index, in theory, can offer more options — make the choices, or take the ‘whole universe’ instead.
In fact, in theory, aggregated index could give _more_ flexible choices than broadcast search — it could allow libraries to set up search collections that split platform-vendor provided collections down the middle, including _exactly_ the scholarly titles (or even individual articles) chosen by a librarian, instead of having to take the entire vendor-provided package as a unit as in broadcast federated search. In theory this is possible — but would any library customers choose to make use of it were it available, would they actually have the time/resources to make such selections? I’m not sure.
With or without any kind of “cross search” (broadcast federated or aggregated index discovery), we have already kind of abdicated selection in the electronic realm where we buy resources by the ‘package’ instead of the individual title. There are a variety of reasons this has happened, most related to economics (both our economics and resources, and the economics and business models of our vendors). It may indeed be regrettable, but it’s true whether you’ve got fed search, aggregated index, both, or neither.