The first thing that most people notice when they use a federated search application is that it’s not nearly as fast as Google. We’ve all gotten spoiled. This is not only the information age, it’s the age of quick information; we all want every search to be as fast as a Google search. However, by its very nature, federated search can’t be as fast as Google. Federated search is at the mercy of the sources it federates. If a source is slow to return results to the federated search application, then there’s nothing the federated search application can do. Or is there?
Deep Web Technologies has been displaying incremental results for some time now. The idea is simple: display results in chunks as they are received from the sources being searched. Science.gov, WorldWideScience.org, and Scitopia.org are three applications that display incremental results. While there are challenges to this approach, there are some significant benefits as well. The aim of displaying incremental results is to minimize the time the user has to wait to see some results. In the show-something-quick department, incremental results work well. The major challenge arises when you try to figure out what to do with the rest of the results as they come in.
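To make the "display results in chunks as they are received" idea concrete, here is a minimal sketch in Python. It is not how Deep Web Technologies implements the feature; the source names, the simulated latencies, and the helper functions `query_source` and `incremental_search` are all assumptions made up for illustration.

```python
import asyncio

async def query_source(name, delay, results):
    # Hypothetical stand-in for a real source connector:
    # sleeping simulates the time the source takes to answer.
    await asyncio.sleep(delay)
    return name, results

async def incremental_search(sources):
    """Collect result chunks in the order the sources finish."""
    tasks = [asyncio.create_task(query_source(*s)) for s in sources]
    chunks = []
    for finished in asyncio.as_completed(tasks):
        name, results = await finished
        # A real application would make this chunk available to the user here.
        chunks.append((name, results))
    return chunks

# Made-up sources: (name, simulated latency in seconds, results)
SOURCES = [
    ("slow-source", 0.2, ["slow result"]),
    ("fast-source", 0.01, ["fast result 1", "fast result 2"]),
]

chunks = asyncio.run(incremental_search(SOURCES))
for name, results in chunks:
    print(name, results)
```

Even though the slow source is listed first, the fast source's chunk arrives first, so the user has something to look at almost immediately.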
Before we look at the challenges of incremental results let’s look at the motivation for doing this at all. In an ideal world, all sources return results quickly, users are willing to wait for all results from all sources to return, and there’s no need for incremental results. Some federated search applications do have a majority of sources returning results quickly. The owner of the application can decide to wait a small number of seconds, perhaps five seconds, display the results that have come in during that time, and be done. Other federated search applications search a number of slow sources. In this case, the application would have to wait 25 or 30 seconds, and even then, not all sources would return their results. The application could wait 30 seconds, time out the sources that didn’t return results in 30 seconds, then display the results that did return.
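The wait-then-time-out approach described above might look roughly like the following sketch. The latencies and the timeout are scaled down to fractions of a second so the example runs quickly, and `query_source` and `search_with_timeout` are hypothetical names, not part of any real product.

```python
import asyncio

async def query_source(name, delay):
    # Hypothetical source connector; the sleep simulates source latency.
    await asyncio.sleep(delay)
    return name

async def search_with_timeout(latencies, timeout):
    """Wait up to `timeout` seconds, then give up on sources still running."""
    tasks = {asyncio.create_task(query_source(n, d)): n
             for n, d in latencies.items()}
    done, pending = await asyncio.wait(list(tasks), timeout=timeout)
    for task in pending:
        task.cancel()  # time out the sources that did not answer in time
    await asyncio.gather(*pending, return_exceptions=True)
    returned = sorted(task.result() for task in done)
    timed_out = sorted(tasks[task] for task in pending)
    return returned, timed_out

# Made-up latencies in seconds, scaled down for the example.
LATENCIES = {"fast-a": 0.01, "fast-b": 0.02, "slow-c": 0.5}

returned, timed_out = asyncio.run(search_with_timeout(LATENCIES, timeout=0.1))
print("returned:", returned)
print("timed out:", timed_out)
```

The user sees nothing until the timeout expires, and whatever the slow source would have contributed is lost. That is the cost incremental results try to avoid.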
It is the existence of slow sources that motivates the desire to implement incremental results. Have you ever heard of the Disneyland phenomenon? At Disneyland, you might have to wait an hour in line to get on a ride, but the line is always moving, so the wait doesn’t feel like an hour. Incremental results serve the same purpose: they give the user something to do (i.e., some results to look at) while he’s waiting to get on the ride (i.e., to see all of the results). The choices for handling slow sources are these: show incremental results, make the user wait, or show partial results.
The challenges with incremental results center around usability. How do you update the results list in a way that makes sense to the user? Early in the design of the incremental results feature, Deep Web realized that redisplaying the user’s search results without permission from the user was a very bad idea. I know that when I am viewing a result list I do not want it to change on me unless I click on something to make that happen. So, I’m looking at the first set of results that have come in from the fast sources. I can click on results, view documents, and return to the results list. Some number of seconds later more results are available.
There are two approaches to interacting with the user regarding the new results: (1) provide the user with a button to click on the results page to view additional results, or (2) provide the user with a dialog box, letting him know that there are more results, and asking him to decide whether to add the new results or not. Science.gov, as an example, utilizes both approaches.
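One way to honor the "never change the list without permission" rule is to hold late results in a buffer until the user asks for them. The sketch below assumes a hypothetical `ResultList` class; it is an illustration of the idea, not Science.gov's code, and it simply appends new results rather than ranking them.

```python
class ResultList:
    """Keeps late results out of view until the user asks for them."""

    def __init__(self):
        self.visible = []   # what the user is currently looking at
        self.pending = []   # results that arrived but are not yet shown

    def receive(self, results):
        # Called whenever a source returns; never touches the visible list.
        self.pending.extend(results)

    def pending_count(self):
        # Could drive a "N more results available" button or dialog box.
        return len(self.pending)

    def show_more(self):
        # Called for the first batch, and afterwards only on a user action
        # (clicking the button or accepting the dialog).
        self.visible.extend(self.pending)
        self.pending.clear()
        return self.visible

view = ResultList()
view.receive(["fast 1", "fast 2"])
view.show_more()                # first batch is displayed
view.receive(["slow 1"])        # a slow source answers later
print(view.visible)             # still only the first batch
print(view.pending_count())     # one result waiting for the user's consent
print(view.show_more())         # user clicks: the new result is added
```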
Handling incremental results raises usability issues no matter how the federated search application interacts with the user. Some users don’t want their search experience to be interrupted by a dialog box. Other users, if the dialog box doesn’t appear, won’t realize that there are more results available. Science.gov shows the dialog box at the end of the search, when all results have been received (and perhaps some sources have timed out).
A more challenging consideration is that when newer results are merged in with the early results, reranking of all results will occur. This confuses users. The newer results are not simply added to the top or bottom of the results list as some users would hope or expect. Some of the newer results may be more relevant than the early results and the user would want to know that. Plus, it is likely that many casual users don’t understand what ranking is.
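A toy example shows why the merged list can surprise users. The relevance scores here are invented, and sorting by a single score is only an assumption about how ranking might work; the point is that late results land wherever their scores put them.

```python
def merge_and_rerank(displayed, incoming):
    """Merge two lists of (title, score) pairs and sort by descending score."""
    return sorted(displayed + incoming, key=lambda r: r[1], reverse=True)

# Invented titles and relevance scores, for illustration only.
early = [("Early A", 0.90), ("Early B", 0.60), ("Early C", 0.40)]
late = [("Late X", 0.95), ("Late Y", 0.50)]

merged = merge_and_rerank(early, late)
print([title for title, _ in merged])
# → ['Late X', 'Early A', 'Early B', 'Late Y', 'Early C']
```

One late result jumps to the top and another lands in the middle, so the item the user was about to click may have moved by the time he looks again.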
To illustrate a concern with updating results: one customer complained, and rightfully so, that because their search engine was slower than some of the other sources in the application, initial results were presented from one of the faster sources, even though the customer was the authority on the subject matter for particular queries. Their concern was that users would not know to request more results after the first burst of results and would never see the most relevant results. This concern was addressed by experimenting with the amount of time the application waits before displaying even the first results. The longer that initial delay, the more sources get to participate in the first set of results, but also the longer users have to wait to see any results.
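The initial-delay trade-off can be seen with a few made-up numbers. In this sketch the source names and their typical latencies are assumptions; `first_batch` simply lists which sources would have answered by the time the first results are shown.

```python
def first_batch(latencies, initial_delay):
    """Names of the sources that have answered within the initial delay."""
    return sorted(name for name, secs in latencies.items()
                  if secs <= initial_delay)

# Hypothetical typical response times, in seconds.
LATENCIES = {
    "fast-source": 1.0,
    "medium-source": 4.0,
    "authoritative-source": 8.0,
}

for delay in (2, 5, 10):
    print(f"wait {delay}s -> first batch from {first_batch(LATENCIES, delay)}")
```

With these numbers, the authoritative but slow source only makes the first batch if every user waits ten seconds before seeing anything.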
Usability concerns can be overcome by training, documentation, and education. I suspect that experienced users don’t get confused about what is going on with incremental results. Online tutorials and help pages can go a long way to getting users comfortable with incremental results, if users are willing to take the time to educate themselves.
The message with incremental results and usability is that everywhere you look there is a trade-off. Everything is simpler for the application and for the user if you don’t display incremental results. Yet, users want speed. For all the challenges, I believe incremental results improve the overall user experience.
Do any of you have experience with software that delivers incremental results? Have you used Science.gov or any of the other Deep Web applications that display results incrementally? What’s your impression? Does it work well? How would you improve it?
Tags: Deep Web Technologies, DOE, federated search, OSTI, Science.gov, scitopia.org, worldwidescience.org
3 Responses so far to "Federated search: the challenges of incremental results"
March 28th, 2008 at 11:31 am
Hi,
I run a European metasearch engine called eTools.ch and evaluated the possibility of showing incremental results myself. Because there were too many usability issues (e.g. partly merged and ranked results), I decided to try out a user-selectable maximum timeout: if a data source does not complete within the maximum permitted time (two seconds by default), the results from this source are not shown or are only partly shown. A subsequent query with the same search term will most likely return the results from the source that did not complete, because they were cached in the meantime. That way, I need less than 1.3 seconds on average to deliver the results from 13 data sources. Anyway, responsive data sources are a must for a federated search solution (have a look at the real-time statistics at http://www.etools.ch/searchInfo.do?r#status).
Since the underlying framework is very flexible, I can choose the best approach for each project individually.
Greetings from Switzerland,
Stephan
March 29th, 2008 at 7:38 pm
Relying on the user to do something is to set the user up for failure. I’m trying to remember where I read it, but in a book I was going through recently the author suggested to managers that if a software developer starts a sentence with “If we could only get the user to…” then the manager should send the developer back to the drawing board. (I think it was one of the Clayton Christensen Innovator’s Dilemma books.)
March 23rd, 2009 at 9:35 am
I need to implement incremental results for my application. Can anyone guide me on how to do that in my script?