
Quality of search results is an important criterion when selecting a federated search engine vendor, but it is also an elusive and subjective characteristic to quantify. What is highly relevant to one person is not so to another. Subjectivity aside, though, there are factors that affect the quality of results, and I address those in this article.

In previous blog posts I’ve touched on the quality question. I wrote about the trouble with general search engines, namely that they often don’t provide the results business professionals are looking for. When I reported on the study of undergraduates and federated search at Brigham Young, I devoted several paragraphs to the quality-of-results issue. And I have asked vendors to provide publicly accessible demo applications that search the same databases so that potential customers can compare the quality of search results from different vendors for themselves. While quality of search results is being discussed on the Web, I don’t know what importance those acquiring a federated search solution are giving to quality. I’d be interested to hear from those who have purchased federated search solutions what weight they gave to quality of search results.

Let’s look at the factors that influence quality.

Quality of relevance ranking from the underlying search engine. Garbage in, garbage out. If the underlying search engine gives really poor results to the federated search engine, then results from that source are going to be poor. Poor search results from content providers are rampant. Often overlooked in comparing federated search engines is a discussion of which sources they are searching; it is very difficult to compare federated search products if they are all searching different sources. If you are evaluating a federated search product and you don’t like the quality of the results you are getting, be sure to search the underlying sources directly. The problem may be with the publisher’s search engine and not with the federated search product. To a reasonable extent, a high-quality federated search product can make up for poor results from a small number of sources if it has a large enough set of results from enough relevant sources, but don’t expect magic.

The number of results retrieved from the underlying search engine by the federated search engine. More results are better when it comes to finding the most relevant ones. There are two ways to get more results from a source: ask the source for more results than the default number when you query it, or come back to the source after you’ve gotten some results and ask for more. A particular search engine might rank its own documents poorly, and it might only give you 10 of those poor results on an initial search. The federated search engine might be able to construct the initial query in such a way as to get more results up front. If the source doesn’t allow the federated search engine to retrieve 50 or 100 results at a time, then it might present a “more results” or “next page” button or link, in which case a sophisticated federated search engine will be able to “virtually click the button” and get more results.
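The “ask for more, then keep paging” idea can be sketched in code. This is a minimal, hypothetical connector loop in Python; the source API, the parameter names (`start`, `count`), and the `fake_fetch` stand-in are all my own assumptions for illustration, not any real vendor’s interface:

```python
# Hypothetical sketch: a connector that keeps requesting result pages
# from a source until it has gathered enough records for ranking.
import urllib.parse


def build_query_url(base_url, terms, start, page_size):
    """Construct a paged query URL for a hypothetical source."""
    params = urllib.parse.urlencode(
        {"q": terms, "start": start, "count": page_size}
    )
    return f"{base_url}?{params}"


def gather_results(fetch_page, terms, wanted=50, page_size=10):
    """Follow 'next page' requests until `wanted` results are collected.

    `fetch_page` stands in for the HTTP call that returns one page of
    results (a list of records); an empty page means the source is done.
    """
    results = []
    start = 0
    while len(results) < wanted:
        page = fetch_page(terms, start, page_size)
        if not page:  # source has no more results
            break
        results.extend(page)
        start += page_size
    return results[:wanted]


# Simulated source holding 35 matching records, served 10 at a time.
def fake_fetch(terms, start, page_size):
    corpus = [f"record-{i}" for i in range(35)]
    return corpus[start:start + page_size]


print(len(gather_results(fake_fetch, "federated search")))  # 35
```

The point of the loop is exactly the “virtually click the button” behavior described above: a source that hands back only 10 results at a time still yields all 35 matches to the ranking stage.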

Quality of federated search engine connectors. Read my previous article, What is a Connector, to appreciate the difficult work a connector has to perform. Not all federated search products are created equal and the quality of connectors is paramount to delivering relevant results to the user. Here is an excerpt from that Brigham Young article where I raise the importance of smart connectors:

… a number of factors influence the quality of results from a federated search engine. The one that is most often overlooked is the quality of the connectors that search the databases. I’ve heard stories of federated search users claiming that they actually got better results from a federated search engine searching a particular database than from searching the database directly. How is that possible? It is very possible when connectors are designed to do more effective advanced searches on the user’s behalf than the user would do for himself.

When a user is faced with searching a number of databases directly, it becomes tiring to learn the idiosyncrasies of the advanced search form for each database searched. When he or she searches these same databases from the advanced search page of the federated search engine, the user only needs to learn one search syntax and so is more likely to fill the form out correctly. Then, when the connector performs a search, it can translate wild cards, Boolean expressions, field mappings, phrasing, and other elements of the expression to suit the target database’s search engine.

Basically, a well-crafted connector can often retrieve relevant results when a poorly crafted one can’t. The only way to determine the quality of a connector is to perform multiple basic and advanced searches against the federated search engine with only one source selected, and then perform those same searches against that source at the publisher’s site. If the source has non-intuitive handling of wild cards, phrase quoting, Booleans, or other elements of search syntax or semantics, then the smart connector will get better results.
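To make the translation step concrete, here is a minimal sketch of how a connector might map one normalized query onto two sources with different conventions. Both source profiles, and every operator and field prefix in them, are invented for illustration; real connectors deal with far messier syntax:

```python
# Hypothetical sketch of the translation a connector performs: one
# user-facing syntax is mapped onto each source's own conventions for
# wildcards, Booleans, phrases, and field names.

SOURCE_PROFILES = {
    "source_a": {"and": "AND", "wildcard": "*",
                 "phrase": '"{}"', "title_field": "TI="},
    "source_b": {"and": "&&", "wildcard": "%",
                 "phrase": "({})", "title_field": "title:"},
}


def translate(term_groups, source):
    """Render a parsed query for one source.

    `term_groups` is a list of (field, term, is_phrase) tuples joined
    with AND; '*' in a term marks a truncation wildcard.
    """
    p = SOURCE_PROFILES[source]
    parts = []
    for field, term, is_phrase in term_groups:
        term = term.replace("*", p["wildcard"])
        if is_phrase:
            term = p["phrase"].format(term)
        prefix = p["title_field"] if field == "title" else ""
        parts.append(prefix + term)
    return f" {p['and']} ".join(parts)


query = [("title", "federated search", True), ("any", "librar*", False)]
print(translate(query, "source_a"))  # TI="federated search" AND librar*
print(translate(query, "source_b"))  # title:(federated search) && librar%
```

The user fills out one form; the connector, not the user, absorbs each database’s idiosyncrasies.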

Quality of ranking of the federated search engine. This is obviously a key factor. One approach to relevance ranking is to not do any, i.e., to use the ranking provided by the underlying sources. This is very problematic for two reasons. First, many sources rank poorly. Second, the federated search engine needs somehow to merge relevance rank information across multiple sources. My experience is that the best way to rank results is to discard what the sources tell you and rank each result against the user’s query. And, since not all sources are created equal and some organizations would like to give preference to results from particular sources, I have seen success with ranking approaches that, all other things being equal, weight results from some sources higher than others.
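A minimal sketch of that strategy, re-ranking merged results against the user’s query with per-source weights, might look like the following. The term-overlap score, the source names, and the weights are stand-ins for illustration, not any vendor’s actual algorithm:

```python
# Sketch: ignore source-supplied ranks, score each merged result
# against the user's query terms, and apply a per-source weight.
# Simple term overlap stands in for a real relevance function.

SOURCE_WEIGHTS = {"journal_db": 1.2, "news_db": 0.9}  # assumed preferences


def score(result, query_terms):
    """Fraction of query terms appearing in title + snippet."""
    text = (result["title"] + " " + result["snippet"]).lower()
    hits = sum(1 for t in query_terms if t.lower() in text)
    return hits / len(query_terms)


def rank(results, query):
    terms = query.split()
    return sorted(
        results,
        key=lambda r: score(r, terms) * SOURCE_WEIGHTS.get(r["source"], 1.0),
        reverse=True,
    )


merged = [
    {"source": "news_db", "title": "Search engines", "snippet": "general news"},
    {"source": "journal_db", "title": "Federated search quality",
     "snippet": "ranking merged search results"},
]
print(rank(merged, "federated search quality")[0]["source"])  # journal_db
```

Because every result is scored against the same query, results from different sources become directly comparable, which is exactly the merging problem that source-supplied ranks cannot solve.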

Also worth considering is the ranking algorithm used by the federated search engine. You’re not likely to get many details from any commercial vendor about its proprietary algorithms, but you may get some hints to help you judge the quality of a particular engine. Or, you may just need to perform multiple searches of varying complexity against single sources and against multiple sources, and then try to infer what approach the vendor took to ranking results. Ask yourself basic questions: Is the top result more relevant than the tenth? Why? Do your search terms appear more often in the title or snippet of the more highly ranked results?

Results organization and presentation. Perception matters when it comes to presenting search results, and perception is influenced by presentation. Plus, poorly organized results will be difficult for users to navigate. A federated search engine that displayed only one search result per page would be very awkward to use even if that first result were superbly relevant. (No, I don’t know of any federated search engines that do this.) Federated search engines that provide clustering and faceted search probably lead users to view their results as more relevant. And combining clustering with relevance ranking makes for even more productive research. I discuss clustering in great detail, and touch on faceted search, in my clustering article.
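As a small illustration of faceting, here is a sketch that counts merged results per facet field so an interface could offer drill-down links. The field names and data are invented for the example:

```python
# Sketch: simple faceting over merged results, counting results per
# source and per publication year for drill-down navigation.
from collections import Counter


def facet_counts(results, field):
    """Count results by the value of one facet field."""
    return Counter(r.get(field, "unknown") for r in results)


results = [
    {"source": "journal_db", "year": 2007},
    {"source": "journal_db", "year": 2008},
    {"source": "news_db", "year": 2008},
]
print(facet_counts(results, "source"))  # Counter({'journal_db': 2, 'news_db': 1})
print(facet_counts(results, "year"))
```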

Basically, anything that results in a more enjoyable search experience will lead users to spend more time with a particular federated search product and thus derive value from those highly relevant results, assuming they are easy to find. This is where a pleasant and uncluttered layout, intuitive navigation, and a good amount of Ajax to minimize page refreshes combine with highly relevant search results to create the perfect user experience.

The next time someone asks you to compare a number of federated search products, beyond the look-and-feel, lists of features, price, and other information you’ll be considering, you will now be able to ask the difficult questions about the factors that influence the quality of search results.


This entry was posted on Sunday, February 10th, 2008 at 9:31 pm and is filed under basics. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or TrackBack URI from your own site.
