<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Is federated search &#8220;ranking impaired?&#8221;</title>
	<atom:link href="http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/feed/" rel="self" type="application/rss+xml" />
	<link>http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/</link>
	<description>Covers topics related to federated search and the deep web</description>
	<pubDate>Wed, 17 Mar 2010 07:18:55 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Why users like federated search (even though they shouldn&#8217;t) &#171; Dana&#8217;s user experience blog</title>
		<link>http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/comment-page-1/#comment-34909</link>
		<dc:creator>Why users like federated search (even though they shouldn&#8217;t) &#171; Dana&#8217;s user experience blog</dc:creator>
		<pubDate>Thu, 05 Nov 2009 06:29:16 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/#comment-34909</guid>
		<description>[...] relevance ranking doesn&#8217;t really work. Because federated search is pulling in material from a range of sources, each of which use [...]</description>
		<content:encoded><![CDATA[<p>[...] relevance ranking doesn&#8217;t really work. Because federated search is pulling in material from a range of sources, each of which use [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave</title>
		<link>http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/comment-page-1/#comment-1543</link>
		<dc:creator>Dave</dc:creator>
		<pubDate>Mon, 05 May 2008 23:01:56 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/#comment-1543</guid>
		<description>In medicine there is a ranking system that can be applied to search results. It is based on quality of evidence. See http://www.cebm.net/index.aspx?o=1025 for more. At the University of Arizona, we adapted this system to search results in an in-house tool called EBM Search. It harnesses the search capabilities available from the targeted databases, and displays results according to publication type (which is tied to evidence quality). The results so far have been successful: we surveyed medical students over two years and they both use it and comment favorably on it. It is the default search tool in their courseware. Your PubMed "cancer" example is a great one - do a search on cancer in EBM Search, and you find systematic reviews of randomized controlled trials, highest in rank, followed by clinical trials, all aggregated and displayed according to a ranking system that makes sense to the clinical user. It seems that federated search is too concerned with finding all things for all groups, emulating Google. I would argue the best approach is to work with specific target groups and learn their information-seeking behaviors, then customize something for them specifically, including a ranking system based on that user group's culture, if possible.</description>
		<content:encoded><![CDATA[<p>In medicine there is a ranking system that can be applied to search results. It is based on quality of evidence. See <a href="http://www.cebm.net/index.aspx?o=1025" rel="nofollow">http://www.cebm.net/index.aspx?o=1025</a> for more. At the University of Arizona, we adapted this system to search results in an in-house tool called EBM Search. It harnesses the search capabilities available from the targeted databases, and displays results according to publication type (which is tied to evidence quality). The results so far have been successful: we surveyed medical students over two years and they both use it and comment favorably on it. It is the default search tool in their courseware. Your PubMed &#8220;cancer&#8221; example is a great one - do a search on cancer in EBM Search, and you find systematic reviews of randomized controlled trials, highest in rank, followed by clinical trials, all aggregated and displayed according to a ranking system that makes sense to the clinical user. It seems that federated search is too concerned with finding all things for all groups, emulating Google. I would argue the best approach is to work with specific target groups and learn their information-seeking behaviors, then customize something for them specifically, including a ranking system based on that user group&#8217;s culture, if possible.</p>
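<p>To make the idea concrete, here is a minimal sketch of ordering merged results by publication type. It is not EBM Search's code: the tier table, the <code>publication_type</code> field, and the sample records are assumptions for illustration, and only the ordering (systematic reviews first, then clinical trials) comes from the description above.</p>

```python
# Hypothetical evidence tiers; lower numbers sort first.
# Only the relative order of these two tiers is taken from the comment above.
EVIDENCE_RANK = {
    "systematic review": 0,
    "clinical trial": 1,
}
UNRANKED = len(EVIDENCE_RANK)  # any other publication type sorts last


def rank_by_evidence(results):
    """Order results by evidence tier, keeping source order within a tier."""
    # sorted() is stable, so ties keep the order the sources returned them in.
    return sorted(
        results,
        key=lambda r: EVIDENCE_RANK.get(r["publication_type"].lower(), UNRANKED),
    )


# Made-up records standing in for results merged from several databases.
merged = [
    {"title": "Case report A", "publication_type": "Case Report"},
    {"title": "Trial B", "publication_type": "Clinical Trial"},
    {"title": "Review C", "publication_type": "Systematic Review"},
]
ranked = rank_by_evidence(merged)
```

<p>Because the sort is stable, whatever ordering each source database applied is preserved inside a tier, so the publication-type ranking sits on top of the sources' own ranking rather than replacing it.</p>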
]]></content:encoded>
	</item>
	<item>
		<title>By: Abe</title>
		<link>http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/comment-page-1/#comment-1047</link>
		<dc:creator>Abe</dc:creator>
		<pubDate>Wed, 26 Mar 2008 16:47:04 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/#comment-1047</guid>
		<description>I’m glad to see that this topic of “relevance ranking” of federated search results has sparked some debate in this blog. It is an area that has been of great interest to me for 5 years now and my company has invested a lot of resources to address this challenge.

First of all, I agree with the comments/observations that federated search operates with incomplete, “regurgitated” information, as Tom points out in an earlier comment.

The problem, as discussed by Sol and the commenters on his post, is two-fold. First, a federated search engine may only bring back and analyze/rank a small subset of the available results from a large information source that has many results for a given query. Note that this is more of a problem with the user’s query than a federated search problem. For example, if a user goes to PubMed (the U.S. Government’s most popular database, which also happens to return its results in chronological order) and searches for “cancer”, the search is not going to bring back very useful results.

A federated search engine, ours included, relies to a great extent on the relevance ranking capabilities of the information sources being queried. We at Deep Web Technologies do a number of things to significantly improve the chances that we’ll bring back and find the most relevant documents that the user is searching for. We ensure that each of the connectors that we create is optimized (supports all search operators of the source, and supports advanced fielded search capabilities) for the information source that is searched and we bring back a larger number of results (at least 100 where possible) from each source being federated.

The second challenge of federated search is ranking the results that have been brought back. How does the federated search engine know that the first result returned by one information source is more relevant than the fifth result returned by another information source? Almost 5 years ago I implemented the first of our relevance ranking algorithms, QuickRank, for an initially skeptical group of my customers. QuickRank, which ranks results based on the occurrence of search terms within the titles and snippets of a result, has proven to work extremely well. No, it doesn’t ensure that the most relevant results are always returned to a user, but much more often than not the best results are found and returned within the first page of results.

Results which might be highly relevant but don’t include the search terms in a title, author or snippet are returned as unranked results. Note that Google suffers from a similar type of problem in that a highly relevant web page, a “gem”, that hasn’t yet been discovered (i.e. doesn’t have many links to it) is not likely to be found by someone searching Google.

Finally, as I discussed in &lt;a href="http://federatedsearchblog.com/2008/02/25/not-all-federated-search-engines-are-created-equal/" rel="nofollow"&gt;one of my earlier posts&lt;/a&gt;, federated search is a great discovery tool for students and researchers who don’t know or don’t want to know where to search for information. Yes, if one could create one very large index of all the information that one might be interested in searching for, and have it indexed by a highly capable search engine, then this option would be preferable to federated search. But since this is not possible, and is going to become less possible as more and more information sources become available, what alternative is there to federated search?</description>
		<content:encoded><![CDATA[<p>I’m glad to see that this topic of “relevance ranking” of federated search results has sparked some debate in this blog. It is an area that has been of great interest to me for 5 years now and my company has invested a lot of resources to address this challenge.</p>
<p>First of all, I agree with the comments/observations that federated search operates with incomplete, “regurgitated” information, as Tom points out in an earlier comment.</p>
<p>The problem, as discussed by Sol and the commenters on his post, is two-fold. First, a federated search engine may only bring back and analyze/rank a small subset of the available results from a large information source that has many results for a given query. Note that this is more of a problem with the user’s query than a federated search problem. For example, if a user goes to PubMed (the U.S. Government’s most popular database, which also happens to return its results in chronological order) and searches for “cancer”, the search is not going to bring back very useful results.</p>
<p>A federated search engine, ours included, relies to a great extent on the relevance ranking capabilities of the information sources being queried. We at Deep Web Technologies do a number of things to significantly improve the chances that we’ll bring back and find the most relevant documents that the user is searching for. We ensure that each of the connectors that we create is optimized (supports all search operators of the source, and supports advanced fielded search capabilities) for the information source that is searched and we bring back a larger number of results (at least 100 where possible) from each source being federated.</p>
<p>The second challenge of federated search is ranking the results that have been brought back. How does the federated search engine know that the first result returned by one information source is more relevant than the fifth result returned by another information source? Almost 5 years ago I implemented the first of our relevance ranking algorithms, QuickRank, for an initially skeptical group of my customers. QuickRank, which ranks results based on the occurrence of search terms within the titles and snippets of a result, has proven to work extremely well. No, it doesn’t ensure that the most relevant results are always returned to a user, but much more often than not the best results are found and returned within the first page of results.</p>
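<p>For readers who want something concrete, ranking by term occurrence can be sketched roughly as follows. This is not QuickRank itself: the function names, the weights, the <code>title</code> and <code>snippet</code> fields, and the sample data are all assumptions made for illustration.</p>

```python
import re


def term_score(query, result, title_weight=2, snippet_weight=1):
    """Count query-term occurrences, weighting title hits above snippet hits.

    The weights are arbitrary assumptions chosen for illustration.
    """
    terms = set(re.findall(r"\w+", query.lower()))
    title_words = re.findall(r"\w+", result.get("title", "").lower())
    snippet_words = re.findall(r"\w+", result.get("snippet", "").lower())
    title_hits = sum(word in terms for word in title_words)
    snippet_hits = sum(word in terms for word in snippet_words)
    return title_weight * title_hits + snippet_weight * snippet_hits


def merge_and_rank(query, results_by_source):
    """Pool the results from every source and sort by score, highest first."""
    pooled = [
        dict(result, source=source)
        for source, results in results_by_source.items()
        for result in results
    ]
    return sorted(pooled, key=lambda r: term_score(query, r), reverse=True)


# Made-up results from two hypothetical sources.
results_by_source = {
    "source_a": [
        {"title": "Cancer screening guidelines", "snippet": "Overview of screening."},
        {"title": "Annual report", "snippet": "No relevant terms here."},
    ],
    "source_b": [
        {"title": "Lung cancer screening", "snippet": "Screening for lung cancer in adults."},
    ],
}
ranked = merge_and_rank("lung cancer screening", results_by_source)
```

<p>A result whose title and snippet contain none of the search terms scores zero and sinks to the bottom of the merged list, which corresponds to the unranked results mentioned in this comment.</p>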
<p>Results which might be highly relevant but don’t include the search terms in a title, author or snippet are returned as unranked results. Note that Google suffers from a similar type of problem in that a highly relevant web page, a “gem”, that hasn’t yet been discovered (i.e. doesn’t have many links to it) is not likely to be found by someone searching Google.</p>
<p>Finally, as I discussed in <a href="http://federatedsearchblog.com/2008/02/25/not-all-federated-search-engines-are-created-equal/" rel="nofollow">one of my earlier posts</a>, federated search is a great discovery tool for students and researchers who don’t know or don’t want to know where to search for information. Yes, if one could create one very large index of all the information that one might be interested in searching for, and have it indexed by a highly capable search engine, then this option would be preferable to federated search. But since this is not possible, and is going to become less possible as more and more information sources become available, what alternative is there to federated search?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom</title>
		<link>http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/comment-page-1/#comment-1029</link>
		<dc:creator>Tom</dc:creator>
		<pubDate>Tue, 25 Mar 2008 18:04:13 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/#comment-1029</guid>
		<description>Since I'm ultimately the culprit here, and Toni clued me into this discussion, perhaps I ought to contribute a few thoughts.  

Both Sol and Peter provide caveats that I agree with, but I will state here, as I did in the presentation (although using different words), that a federated search engine, in the context in which I was talking, is performing whatever it does on regurgitated data from the databases it points to. If this data has already been ranked by the pointed-to database, that is the order in which the federated search engine receives it and (in most cases) displays it to the user. If additional processing is done on that data to provide some different type of relevancy ranking, the federated search engine is potentially operating with incomplete information (part of Sol's point). If the federated search engine is sending the user's search to several databases simultaneously, the outcome is even more problematic because: 
1.  The federated search engine may not be dealing with a complete result set from the database searched (Peter's point); and
2.  Each database may be performing relevancy ranking using different criteria for relevance.  

It is not so much that it is impossible to conceptually design a metasearch engine that scours all of the results from the target databases and other resources and itself applies a relevancy algorithm to (potentially) re-rank the new uber-result set; rather, it is extremely difficult to operationalize the concept, given the constraints mentioned above and the fact that users want an immediate response.

Further, and perhaps more to the point, the audience for this presentation was people working in a library context in which mega-bucks are spent on licensing access to non-crawlable resources, the breadth and depth of published literature. We may be talking about two very different types of federated searching targets: a) those which can be harvested in toto; and b) those which cannot.</description>
		<content:encoded><![CDATA[<p>Since I&#8217;m ultimately the culprit here, and Toni clued me into this discussion, perhaps I ought to contribute a few thoughts.  </p>
<p>Both Sol and Peter provide caveats that I agree with, but I will state here, as I did in the presentation (although using different words), that a federated search engine, in the context in which I was talking, is performing whatever it does on regurgitated data from the databases it points to. If this data has already been ranked by the pointed-to database, that is the order in which the federated search engine receives it and (in most cases) displays it to the user. If additional processing is done on that data to provide some different type of relevancy ranking, the federated search engine is potentially operating with incomplete information (part of Sol&#8217;s point). If the federated search engine is sending the user&#8217;s search to several databases simultaneously, the outcome is even more problematic because:<br />
1.  The federated search engine may not be dealing with a complete result set from the database searched (Peter&#8217;s point); and<br />
2.  Each database may be performing relevancy ranking using different criteria for relevance.  </p>
<p>It is not so much that it is impossible to conceptually design a metasearch engine that scours all of the results from the target databases and other resources and itself applies a relevancy algorithm to (potentially) re-rank the new uber-result set; rather, it is extremely difficult to operationalize the concept, given the constraints mentioned above and the fact that users want an immediate response.</p>
<p>Further, and perhaps more to the point, the audience for this presentation was people working in a library context in which mega-bucks are spent on licensing access to non-crawlable resources, the breadth and depth of published literature. We may be talking about two very different types of federated searching targets: a) those which can be harvested in toto; and b) those which cannot.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Toni</title>
		<link>http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/comment-page-1/#comment-1027</link>
		<dc:creator>Toni</dc:creator>
		<pubDate>Tue, 25 Mar 2008 15:58:42 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/#comment-1027</guid>
		<description>That's an interesting thought, Peter.

The way I understand it, the individual database results may be ranked, but there are no standards to allow federated search engines to re-rank that information.</description>
		<content:encoded><![CDATA[<p>That&#8217;s an interesting thought, Peter.</p>
<p>The way I understand it, the individual database results may be ranked, but there are no standards to allow federated search engines to re-rank that information.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Murray</title>
		<link>http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/comment-page-1/#comment-1014</link>
		<dc:creator>Peter Murray</dc:creator>
		<pubDate>Tue, 25 Mar 2008 00:59:33 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/#comment-1014</guid>
		<description>I believe it is also true that a federated search tool has incomplete knowledge of all of the records from the databases being searched. For instance, if a target database has 10,000 hits that match a search phrase, does the federated search tool get all 10,000 hits? In my experience, some target databases only give you the first 100 or so. If that is the case, then the federated search engine would be relying on the assumption that the 100 most relevant hits were returned first by the target database.

Since the federated search engine can't see all 10,000 hits, the usability of relevance ranking is further impaired.</description>
		<content:encoded><![CDATA[<p>I believe it is also true that a federated search tool has incomplete knowledge of all of the records from the databases being searched. For instance, if a target database has 10,000 hits that match a search phrase, does the federated search tool get all 10,000 hits? In my experience, some target databases only give you the first 100 or so. If that is the case, then the federated search engine would be relying on the assumption that the 100 most relevant hits were returned first by the target database.</p>
<p>Since the federated search engine can&#8217;t see all 10,000 hits, the usability of relevance ranking is further impaired.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Toni</title>
		<link>http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/comment-page-1/#comment-964</link>
		<dc:creator>Toni</dc:creator>
		<pubDate>Sat, 22 Mar 2008 15:04:50 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2008/03/21/is-federated-search-ranking-impaired/#comment-964</guid>
		<description>Pardon me for misinterpreting Mr. Wilson's remarks. He did not use the phrase "by nature". I may have poorly paraphrased his point.</description>
		<content:encoded><![CDATA[<p>Pardon me for misinterpreting Mr. Wilson&#8217;s remarks. He did not use the phrase &#8220;by nature&#8221;. I may have poorly paraphrased his point.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
