[ Editor’s note: This post is from Abe. Yes, Abe and I sometimes talk about federated search over dinner. Sad, isn’t it? ]
Friday night I was having dinner with Sol and somehow the conversation turned to federated search. Specifically, we got into a discussion about content providers who return results that are not very useful to a federated search engine. This is a special problem for federated search engines such as our Explorit engine, which puts significant emphasis on relevance ranking of search results. If a content provider’s search engine returns results that are unranked or poorly ranked, or returns results that provide little information to rank on (e.g. only a short title with no snippet), then that provider’s results will be relegated to the back of the result list and users will never see them.
For this very reason, one of our major customers chose not to include in their federated search a source from a major content publisher who shall remain nameless. We will be working with this publisher to help improve the quality of the search results their source returns so that it can become a useful contributor to our results list.
My conversation with Sol reminded me of a talk I gave at the annual conference of the Society for Scholarly Publishing back in 2006, titled “Leveraging Publisher’s Search Engines to Deliver Relevant Results to Users.” Halfway through my presentation, right before some screen shots, I have a slide titled “Advice for Publishers” containing the following bullets:
- Use good search engines with good relevance ranking
- Return 100 or more results at a time
- Return metadata (author, journal, snippet) as part of the result list
- Provide access to your content through XML Gateway or Web Services
- Speed up search time
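To illustrate the metadata bullet, here is a sketch of the kind of structured result record an XML gateway might return and how a federated engine could pull ranking fields from it. The schema below is invented for illustration; real publisher gateways each define their own formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML result record (invented schema, for illustration only).
record = """
<result>
  <title>Relevance Ranking in Federated Search</title>
  <author>J. Doe</author>
  <journal>Journal of Information Retrieval</journal>
  <snippet>... approaches to relevance ranking across federated sources ...</snippet>
</result>
"""

root = ET.fromstring(record)
# Fields like author, journal, and snippet give the federated engine
# something to rank on beyond a bare title.
fields = {child.tag: child.text for child in root}
print(fields["journal"])  # Journal of Information Retrieval
```

A source that returns only a title gives the engine a single, short piece of text to rank on; every extra field here is another ranking signal.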
When we work closely with the owners of content, as we do with the Science.gov and Scitopia.org portals, issues of how well a source is represented in the first page or two of results become significant, particularly for queries on which the content provider feels its content should rank very well.
I’m reminded of one provider of very technical content whose titles tended to be on the long side. This caused a problem because Explorit weighs search terms appearing in short titles more heavily than the same terms appearing in long titles. I had to explain to the content provider why their content wasn’t ranking as high as expected, and although we could have assigned a boost to the ranking of results returned by this source, we chose not to do that.
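The length-normalization effect described above can be sketched in a few lines. This is not Explorit’s actual formula, just a toy scorer showing why the same matching terms earn a lower score in a long title:

```python
# Toy length-normalized title scorer (hypothetical; not Explorit's algorithm):
# the same number of matched query terms scores higher in a short title
# than in a long one, because the match count is divided by title length.

def title_score(title: str, query_terms: list[str]) -> float:
    """Fraction of the title's words accounted for by query-term matches."""
    words = title.lower().split()
    if not words:
        return 0.0
    matches = sum(1 for term in query_terms if term.lower() in words)
    return matches / len(words)  # longer titles dilute the score

terms = ["laser", "welding"]
short = title_score("Laser Welding Basics", terms)
long_ = title_score(
    "A Comprehensive Survey of Laser Welding Techniques for "
    "Dissimilar Metal Joining in Aerospace Applications",
    terms,
)
assert short > long_  # both titles match both terms, but the short one wins
```

Under a scheme like this, a publisher whose titles run long starts every ranking at a disadvantage, which is exactly what that content provider was seeing.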
Interestingly enough, we also have the opposite problem: sources that return results that are too good compared to the results returned by most of the other sources. For example, a source that returns context-sensitive snippets, i.e. snippets of text containing the search terms, will rank higher than sources that only return the first few sentences of a document or an abstract. A result returned with a context-sensitive snippet is not necessarily more relevant than a result that returns only an abstract or no snippet at all. Explorit just doesn’t know.
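A toy example makes the snippet bias concrete. Assume (hypothetically) a ranker that simply counts query-term occurrences in whatever text a source returns: a context-sensitive snippet is built around the search terms, so it always scores, while an opening-sentences excerpt from an equally relevant document may not mention the terms at all.

```python
# Hypothetical term-counting scorer (not Explorit's actual formula),
# showing why snippet-returning sources look "too good".

def text_score(text: str, query_terms: list[str]) -> int:
    """Count occurrences of the query terms in the returned text."""
    words = text.lower().split()
    return sum(words.count(term.lower()) for term in query_terms)

terms = ["federated", "search"]

# Source A returns a snippet extracted around the query terms.
snippet = "... challenges in federated search, where federated engines merge results ..."
# Source B returns the document's opening sentences, which happen
# not to contain the query terms even though the document is relevant.
excerpt = "This paper surveys distributed information retrieval systems."

assert text_score(snippet, terms) > text_score(excerpt, terms)
```

The engine sees only the returned text, so Source A outranks Source B regardless of which document is actually the better answer.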
It seems like one just can’t win. Some sources return results that are poor and don’t rank well. Some sources return results that are really good and make the rest of the results look bad.
Tags: federated search