I believe that when organizations, their management, and their users become disappointed or disillusioned with federated search, it is because they have unreasonable expectations of what this technology can do. And vendors don’t do enough to replace these expectations with reasonable ones. The reality is that using a federated search system is an exercise in making compromises. Federated search provides benefits, but it does so at a cost. Those who understand the trade-offs will have a better experience than those who don’t.

Here’s my list, in no particular order, of ten unrealistic expectations and what you should expect instead.

  1. Federated search is simple to deploy and configure, and it requires no maintenance. Federated search is complex software. Except for hosted solutions, expect a system administrator to spend time installing software, upgrades, and patches, not to mention studying system requirements, installation steps, and performance considerations. The software work may be related to the vendor’s product, to the server’s operating system, or to the web server. Problems may also crop up related to storage, user load, or network traffic.

  2. It’s easy to create and maintain my own connectors. Building connectors, assuming the vendor even supports this functionality, is complicated and time-consuming, especially if you want the connector to work well with fielded search and with the quirky syntax of some sources. You’ll have to learn the connector language, which will have a steep learning curve if it supports sophisticated features. You may have to deal with authentication, sessions, cookies, and other surprises. Plus, if the vendor doesn’t provide a tool for this, you’ll need a way to monitor your sources so that you can rewrite connectors when a content provider suddenly changes its search interface, which happens more often than you think. Read What is a connector? to gain an appreciation for the hard work that connectors do.

  3. All federated search vendors can (or want to) build connectors for all sources. Not true. I’ve heard horror stories about customers who purchased federated search systems expecting connectors to be built for all of the sources they cared about, only to find that the vendor couldn’t deliver. Some connectors are notoriously difficult to build. Other connectors don’t get built because the vendor doesn’t have a market for the connector, other than you. If there are sources that are critical to your solution, get a signed document stating that the deal is contingent on those all-important connectors being built. Better yet, have the vendor build the connectors and demonstrate that they work against those sources prior to signing a contract.

  4. All sources should always be available. Sources are the weak link in a federated search system. When a content provider changes its search interface, its query language, or its URL, the connector to that source stops working. Searches from the federated search application return no results from that source. The best that the federated search vendor can do is to actively monitor all connectors and rewrite them as quickly as possible when the source changes require it.

  5. Federated search should be as fast as Google. This is not possible because the whole point of federated search is to search sources in real time. Searches need to be processed by remote search engines, and this does not happen instantaneously. Some sources, in particular, are slow to return their results.

  6. Federated search should be a commodity that’s quick to procure. Federated search systems take a long time to specify, evaluate, and procure. It’s not uncommon for the sales cycle to take a year or longer. Vendors offer different features, and there are many factors to consider, which makes for a long process.

  7. Federated search should be as good as searching the native sources. The librarians are right. If users took the time to learn the query syntaxes and quirky behavior of every source they care about, and if they were willing to perform advanced searches against each source and manually deduplicate the results, they would get more relevant results. But most users are not interested in doing this.

  8. Federated search ranks perfectly. Federated search has very limited information with which to rank results, and federated search systems do the best they can with it. They typically use document title, author, and snippet or abstract. The native sources can rank against the full text of their articles, so they can rank better. Some federated search systems don’t perform ranking at all, or they just return documents in the order ranked by the source.

  9. Federated search deduplicates perfectly. Regardless of what anyone tells you, deduplication is a very difficult problem. When multiple sources return essentially the same document, the federated search engine may or may not be able to identify the two as duplicates of one another. Since the two documents come from different sources, they will have different URLs, so deduplicating by URL is not sufficient. Using the title alone is risky, especially for documents with very general titles. Federated search systems have to use a combination of fields to try to detect matches. It is far from a perfect science.

  10. If one of a vendor’s products works well in your environment, their federated search system will as well. It’s natural to want to buy various pieces of software from one vendor, and some vendors sell several different types of products, to libraries in particular. One of those products, perhaps an ILS or a link resolver, might be very well suited to a customer, but that doesn’t mean that the same vendor’s federated search system will be.
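
To make item 9 a little more concrete, here is a minimal sketch in Python of deduplicating by a combination of fields rather than by URL or title alone. Everything in it is an assumption made for illustration: the `Result` record, the choice of normalized title plus first-author surname plus year as the key, and the crude surname heuristic. It is not how any particular vendor's product works.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Result:
    """One hit returned by a source (hypothetical shape, for illustration)."""
    source: str
    title: str
    authors: Tuple[str, ...]
    year: Optional[int]
    url: str


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def surname(author: str) -> str:
    """Guess a surname from 'Last, First' or 'First Last' (crude heuristic)."""
    if "," in author:
        return normalize(author.split(",")[0])
    parts = normalize(author).split()
    return parts[-1] if parts else ""


def dedup_key(result: Result) -> Tuple[str, str, Optional[int]]:
    """Combine several fields; the URL is ignored because it differs per source."""
    first_author = surname(result.authors[0]) if result.authors else ""
    return (normalize(result.title), first_author, result.year)


def deduplicate(results: List[Result]) -> List[Result]:
    """Keep the first result seen for each key, preserving input order."""
    seen = set()
    unique = []
    for result in results:
        key = dedup_key(result)
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```

With this key, the same article returned by two sources under different URLs and with slightly different punctuation collapses into one result, while two different documents that are both titled "Introduction" stay separate because their authors differ. The sketch still misses plenty of real cases, such as a source that omits the year, abbreviates the title, or lists the authors in a different order, which is why deduplication remains far from a perfect science.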

WebFeat has its own list, The Truth About Federated Searching, which describes five myths about federated search. While that list is five years old, the information still applies today.

What unrealistic expectations would you add to the list?

This entry was posted on Friday, June 6th, 2008 at 8:32 am and is filed under viewpoints.

3 Responses so far to "Federated search: 10 unrealistic expectations"

  1. 1 Bill Dueber
    June 6th, 2008 at 8:54 am  

    You write about speed: “This is not possible because the whole point of federated search is to search sources in real time. Searches need to be processed by remote search engines.”

    This, I think, is a red herring. It’s not that we want to do “realtime” searching — it’s that vendors can’t or won’t release their indexes for inclusion in a larger, faster, single-index search. Most of the issues surrounding federated search would disappear if what we purchased was (a) a regular (weekly?) update to the index in a standard format and (b) access to the articles based on data supplied with the index.

    But the process of searching four or five or eight different vendors (each with their own syntax, which changes over time), waiting for them all to return, then trying to do some sort of intelligent merge — well, it’s no wonder it’s ridiculously slow at times.

  2. 2 Lukas Koster
    June 7th, 2008 at 1:46 am  

    Another unrealistic expectation: a federated search in databases with different native languages will present all relevant records from those databases.

    Another one: a search on author names will find all titles by these authors (not true because of differences in spelling, formats, name variations, languages, etc.)

    One more: searching on specific subjects will result in the same records from different sources (not true because of differences in classification).

    Solutions would be in maintaining virtual global authority files.

  3. 3 Federated search: ten unrealistic expectations & an idea « Infonatives
    June 8th, 2008 at 2:54 am  

    [...] search: ten unrealistic expectations & an idea This article on the most excellent Federated search blog outlines some of the unrealistic expectations that [...]
