Quality of search results is an important criterion when selecting a federated search engine vendor, but it is also an elusive and subjective characteristic to quantify. What is highly relevant to one person may not be to another. Subjectivity aside, though, there are factors that affect the quality of results, and I address those in this article.
In previous blog posts I’ve touched on the quality question. I wrote about the trouble with general search engines, namely that they often don’t provide the results business professionals are looking for. When I reported on the study of undergraduates and federated search at Brigham Young, I devoted several paragraphs to the quality-of-results issue. And I have asked vendors to provide publicly accessible demo applications that search the same databases, so that potential customers can compare the quality of search results from different vendors for themselves. While quality of search results is being discussed on the Web, I don’t know how much importance those acquiring a federated search solution are giving to quality. I’d be interested to hear from those who have purchased federated search solutions what weight they gave to quality of search results.
Let’s look at the factors that influence quality.
Quality of relevance ranking from the underlying search engine. Garbage in, garbage out. If the underlying search engine gives really poor results to the federated search engine, then results from that source are going to be poor. Poor search results from content providers are rampant. Often overlooked when comparing federated search engines is what sources they are searching; it is very difficult to compare federated search products if they are all searching different sources. If you are evaluating a federated search product and you don’t like the quality of results you are getting, be sure to search the underlying sources directly. The problem may be with the publisher’s search engine and not with the federated search product. To a reasonable extent, a high-quality federated search product can make up for poor results from a small number of sources if it has a large enough set of results from enough relevant sources, but don’t expect magic.
The number of results retrieved from the underlying search engine by the federated search engine. More results are better when it comes to finding the most relevant ones. There are two ways to get more results from a source: ask the source for more results than the default number when you query it, or come back to the source after you’ve gotten some results and ask for more. A particular search engine might rank its own documents poorly and might only give you 10 of those poor results on an initial search. The federated search engine might be able to construct the initial query in such a way as to get more results up front. If the source doesn’t allow the federated search engine to retrieve 50 or 100 results at a time, it might present a “more results” or “next page” button or link, in which case a sophisticated federated search engine will be able to “virtually click the button” and get more results.
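The second approach, paging through a source’s “next page” links, can be sketched roughly as follows. This is a simplified illustration, not any vendor’s actual code; `fetch_page` is a hypothetical stand-in for a connector’s request to the source.

```python
# Sketch: gather more results than a source's default page size by
# repeatedly requesting the next page ("virtually clicking the button").
# fetch_page(page, size) is a hypothetical connector call that returns
# one page of results from the source, or an empty list when exhausted.

def collect_results(fetch_page, wanted=50, page_size=10):
    """Collect up to `wanted` results by paging through the source."""
    results = []
    page = 0
    while len(results) < wanted:
        batch = fetch_page(page, page_size)  # one "next page" request
        if not batch:                        # source has no more results
            break
        results.extend(batch)
        page += 1
    return results[:wanted]
```

The key point is that the loop stops either when enough results have been collected or when the source runs dry, so a source with few results doesn’t stall the federated search.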
Quality of federated search engine connectors. Read my previous article, What is a Connector, to appreciate the difficult work a connector has to perform. Not all federated search products are created equal and the quality of connectors is paramount to delivering relevant results to the user. Here is an excerpt from that Brigham Young article where I raise the importance of smart connectors:
… a number of factors influence the quality of results from a federated search engine. The one that is most often overlooked is the quality of the connectors that search the databases. I’ve heard stories of federated search users claiming that they actually got better results from a federated search engine searching a particular database than from searching the database directly. How is that possible? It is very possible when connectors are designed to do more effective advanced searches on the user’s behalf than the user would do for himself.
When a user is faced with searching a number of databases directly, it becomes tiring to learn the idiosyncrasies of each database’s advanced search form. When he or she searches those same databases from the advanced search page of the federated search engine, the user only needs to learn one search syntax and so is more likely to fill that form out correctly. Then, when the connector performs a search, it can translate wild cards, Boolean expressions, field mappings, phrasing, and other elements of the expression to suit the target database’s search engine.
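The kind of translation a connector performs can be illustrated with a toy example. Everything here is invented for the sketch: the target database’s conventions (`$` as the wildcard, `.AND.`/`.OR.` as Boolean operators, `TI=`/`AU=` as field prefixes) are hypothetical, and real connectors handle far more cases (phrase quoting, nesting, escaping).

```python
# Illustrative sketch of a connector translating a unified advanced-search
# expression into one target database's native syntax. The target's
# conventions below are invented for this example.

def translate_query(query, field_map=None):
    field_map = field_map or {"title": "TI", "author": "AU"}
    out = query.replace("*", "$")            # wildcard translation
    out = out.replace(" AND ", " .AND. ")    # Boolean operator translation
    out = out.replace(" OR ", " .OR. ")
    for generic, native in field_map.items():  # field-name mapping
        out = out.replace(generic + ":", native + "=")
    return out
```

A user who types `title:comput* AND author:smith` once on the federated search form gets a correctly translated query for each source, instead of having to learn each source’s syntax.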
Basically, a well-crafted connector can often retrieve relevant results where a poorly crafted one can’t. The only way to determine the quality of a connector is to perform multiple basic and advanced searches against the federated search engine with only one source selected, and then perform those same basic and advanced searches against that source at the publisher’s site. If the source has non-intuitive handling of wild cards, phrase quoting, Booleans, or other elements of search syntax or semantics, then the smart connector will get better results.
Quality of ranking of the federated search engine. This is obviously a key factor. One strategy for relevance ranking is to not do any, i.e. to use the ranking provided by the underlying sources. This is very problematic for two reasons. First, many sources rank poorly. Second, the federated search engine needs to somehow deal with merging relevance rank information across multiple sources. My experience is that the best way to rank results is to discard what the sources tell you and rank each result against the user query. And since not all sources are created equal, and since some organizations would like to give preference to search results from one source over another, all else being equal, I have seen success with ranking approaches that weight results from some sources more heavily than others.
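To make the idea concrete, here is a minimal sketch of ranking merged results against the user’s query, ignoring the source-supplied order, with optional per-source weights. The scoring function (counting query-term occurrences in the title and snippet) is deliberately simplistic and purely illustrative; real engines use much richer signals.

```python
# Sketch: discard source-supplied rankings and re-rank each result
# against the user's query, optionally weighting some sources higher.
# Scoring by raw term counts is a toy stand-in for a real algorithm.

def score(result, query_terms, source_weights):
    text = (result["title"] + " " + result.get("snippet", "")).lower()
    hits = sum(text.count(term) for term in query_terms)
    return hits * source_weights.get(result["source"], 1.0)

def rank(results, query, source_weights=None):
    """Return results sorted by relevance to the query, best first."""
    terms = query.lower().split()
    weights = source_weights or {}
    return sorted(results, key=lambda r: score(r, terms, weights), reverse=True)
```

The `source_weights` dictionary is where an organization’s preference for one source over another enters: with equal text scores, a result from a weighted source floats above one from an unweighted source.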
Also worth considering is the ranking algorithm used by the federated search engine. You’re not likely to get many details from any commercial vendor about its proprietary algorithms, but you may get some hints that you can use to help you judge the quality of a particular search engine. Or you may just need to perform multiple searches of varying complexity against single sources and against multiple sources, and then try to infer what approach the vendor took to ranking results. Ask yourself basic questions: Is the top result more relevant than the tenth? Why? Are your search terms included more often in the title or snippet of the more highly ranked results?
Results organization and presentation. Perception matters when it comes to presenting search results, and perception is influenced by presentation. Plus, poorly organized results will be difficult for users to navigate. A federated search engine that displayed only one search result per page would be very awkward to use, even if that first result were superbly relevant. (No, I don’t know of any federated search engines that do this.) Federated search engines that provide clustering and faceted search probably lead users to view their results as more relevant. And combining clustering with relevance ranking makes for even more productive research. I discuss clustering in great detail, and touch on faceted search, in my clustering article.
Basically, anything that results in a more enjoyable search experience will lead users to spend more time with a particular federated search product and thus derive value from those highly relevant results, assuming they are easy to find. This is where a pleasant and uncluttered layout, intuitive navigation, and a good amount of Ajax to minimize page refreshes combine with highly relevant search results to create the perfect user experience.
The next time someone asks you to compare a number of federated search products, beyond the look and feel, feature lists, price, and other information you’ll be considering, you will now be able to ask the difficult questions about the factors that influence the quality of search results.
Tags: federated search