<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Beyond federated search? The conversation continues</title>
	<atom:link href="http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/feed/" rel="self" type="application/rss+xml" />
	<link>http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/</link>
	<description>Covers topics related to federated search and the deep web</description>
	<pubDate>Mon, 15 Mar 2010 15:58:51 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Sol</title>
		<link>http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/comment-page-1/#comment-22174</link>
		<dc:creator>Sol</dc:creator>
		<pubDate>Fri, 27 Mar 2009 03:23:40 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/#comment-22174</guid>
		<description>Thank you, everybody, for all the great comments on both posts.

I'm having a dialogue with Serials Solutions and I expect to get a response to publish here.</description>
		<content:encoded><![CDATA[<p>Thank you, everybody, for all the great comments on both posts.</p>
<p>I&#8217;m having a dialogue with Serials Solutions and I expect to get a response to publish here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Rochkind</title>
		<link>http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/comment-page-1/#comment-21928</link>
		<dc:creator>Jonathan Rochkind</dc:creator>
		<pubDate>Mon, 23 Mar 2009 23:03:57 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/#comment-21928</guid>
		<description>Well, the native interfaces are discovery tools too, generally. :)  If I _could_ provide a unified meta-search with as much power and flexibility as the native interfaces, surely I would, but it's not realistic at present.  

So we 'provide' both. You can use native interfaces if you want -- and our meta-search tool tries to give you options to do so, not hide them from you. But many users at many times will prefer meta-search, despite its limitations. They largely prefer it because it's more convenient -- no need to learn a specialty interface, no need to understand what native tool does what.  

If we can make it even more convenient, while ALSO making it work better...  everyone wins. And those native interfaces are still there just as they ever were. That's the promise of Summon, to me. I think they have a good chance of pulling it off. 

Maybe a federatedsearchblog interview with someone from Summon?</description>
		<content:encoded><![CDATA[<p>Well, the native interfaces are discovery tools too, generally. <img src='http://federatedsearchblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  If I _could_ provide a unified meta-search with as much power and flexibility as the native interfaces, surely I would, but it&#8217;s not realistic at present.  </p>
<p>So we &#8216;provide&#8217; both. You can use native interfaces if you want &#8212; and our meta-search tool tries to give you options to do so, not hide them from you. But many users at many times will prefer meta-search, despite its limitations. They largely prefer it because it&#8217;s more convenient &#8212; no need to learn a specialty interface, no need to understand what native tool does what.  </p>
<p>If we can make it even more convenient, while ALSO making it work better&#8230;  everyone wins. And those native interfaces are still there just as they ever were. That&#8217;s the promise of Summon, to me. I think they have a good chance of pulling it off. </p>
<p>Maybe a federatedsearchblog interview with someone from Summon?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul R. Pival</title>
		<link>http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/comment-page-1/#comment-21889</link>
		<dc:creator>Paul R. Pival</dc:creator>
		<pubDate>Mon, 23 Mar 2009 15:27:26 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/#comment-21889</guid>
		<description>Definitely agreeing with Jonathan on this issue.  We're currently federating a little less than 200 of our nearly 800 databases, in part because some don't return good results, and sometimes because they return overwhelming results (newspaper articles).  It seems the biggest kickback we've seen since implementing federated search (SS 360) is that while folks appreciate the ability to cross search, the time it takes to do so seriously turns people off.  From what I've seen on a couple of demos of Summon, the speed issue has completely gone away.  What that suggests to me is that, as with Google, if you run a search and don't like what you get back, the fact that almost no time was invested means you'll tweak and try again until you like what you're seeing in the results.  That just does not happen with federated search, from what I've seen - our users aren't waiting around for the slowest common denominator to return results.

One of my predecessors in my position, now retired, insisted many years ago that the only way "cross database searching" would work is if the content was indexed locally.  I agreed, but was sure that could never happen because of course the publishers and vendors would never play together.  Now that it appears to be a possibility, I'm very eager to see how it plays out.

I agree again with Jonathan on the issue around comprehensive coverage.  Summon (and to my mind all federated search) is a *discovery* tool, not a replacement for the native interfaces and content providers.  The researchers who need to dive deeply into their areas of specialization will still use individual databases, but for the majority of searchers I think the 80% (just an example number) of coverage that Summon would provide would be just fine.  I also think that if Summon does a good job out of the gate, they'll attract more content providers, snowballing their content coverage.</description>
		<content:encoded><![CDATA[<p>Definitely agreeing with Jonathan on this issue.  We&#8217;re currently federating a little less than 200 of our nearly 800 databases, in part because some don&#8217;t return good results, and sometimes because they return overwhelming results (newspaper articles).  It seems the biggest kickback we&#8217;ve seen since implementing federated search (SS 360) is that while folks appreciate the ability to cross search, the time it takes to do so seriously turns people off.  From what I&#8217;ve seen on a couple of demos of Summon, the speed issue has completely gone away.  What that suggests to me is that, as with Google, if you run a search and don&#8217;t like what you get back, the fact that almost no time was invested means you&#8217;ll tweak and try again until you like what you&#8217;re seeing in the results.  That just does not happen with federated search, from what I&#8217;ve seen - our users aren&#8217;t waiting around for the slowest common denominator to return results.</p>
<p>One of my predecessors in my position, now retired, insisted many years ago that the only way &#8220;cross database searching&#8221; would work is if the content was indexed locally.  I agreed, but was sure that could never happen because of course the publishers and vendors would never play together.  Now that it appears to be a possibility, I&#8217;m very eager to see how it plays out.</p>
<p>I agree again with Jonathan on the issue around comprehensive coverage.  Summon (and to my mind all federated search) is a *discovery* tool, not a replacement for the native interfaces and content providers.  The researchers who need to dive deeply into their areas of specialization will still use individual databases, but for the majority of searchers I think the 80% (just an example number) of coverage that Summon would provide would be just fine.  I also think that if Summon does a good job out of the gate, they&#8217;ll attract more content providers, snowballing their content coverage.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Rochkind</title>
		<link>http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/comment-page-1/#comment-21662</link>
		<dc:creator>Jonathan Rochkind</dc:creator>
		<pubDate>Sat, 21 Mar 2009 02:40:14 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/#comment-21662</guid>
		<description>Another unique challenge of the academic scholarly meta-search use case is that different databases from different vendors may contain the _same_ content, with or without electronic full text, in unpredictably overlapping ways.  So you've got a de-dup issue too when you combine them.  

There are more too.  I'm not sure how much experience Deep Web Tech has in the academic scholarly search arena, but it really does have its own special complexities, beyond just, say, aggregating different silos of data or web pages from within a certain federal agency or whatever.</description>
		<content:encoded><![CDATA[<p>Another unique challenge of the academic scholarly meta-search use case is that different databases from different vendors may contain the _same_ content, with or without electronic full text, in unpredictably overlapping ways.  So you&#8217;ve got a de-dup issue too when you combine them.  </p>
<p>There are more too.  I&#8217;m not sure how much experience Deep Web Tech has in the academic scholarly search arena, but it really does have its own special complexities, beyond just, say, aggregating different silos of data or web pages from within a certain federal agency or whatever.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Rochkind</title>
		<link>http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/comment-page-1/#comment-21661</link>
		<dc:creator>Jonathan Rochkind</dc:creator>
		<pubDate>Sat, 21 Mar 2009 02:36:56 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/#comment-21661</guid>
		<description>MANY of those databases simply offer no reliable machine search access. Sure, you COULD write an HTML screen-scraping solution, but when you're talking about hundreds of resources... actually maintaining that in any reliable way would cost more than we can pay. :)

I haven't actually done an analysis to see exactly why each of those resources can't be federated. Again, see limited resources.  But in the academic market this is typical; I don't believe my vendor has significantly smaller (or larger) coverage than other vendors in the market. We have lots of websites from small vendors, often non-profit association vendors, that just don't have a lot of functionality, even though they have important content. 

Of course there's value in letting the user choose what to search. My point is not to take that away. My point is that their choices are _necessarily_ constrained by what is _available_ in the unified search interface.  If you switch to a different set of content, you're maybe going to take away some people's favorite content, and give other people their favorite content they didn't have before. The net effect on your user community may be a wash. Of course we'd LIKE to provide all content in the world, but meanwhile back to reality. 

One of the tricks of letting people choose 'what they want', is that when we're talking hundreds of specialty licensed databases whose names are not household terms -- the typical user has no idea what any of these are, or how to pick from them. Add to that, in a broadcast search environment, the units of selection are necessarily the individual 'databases' or 'search engines' or 'resources' --- collections of content chosen by someone else already, that can then be mixed and matched in bulk. 

The promise of the Summon approach is that you can choose content according to entirely different categories, at the _journal title_ level, crossing the boundaries of different existing 'databases'.  Summon hopes to make just such subject-selected categories of individual journals.  And presumably allow librarians (or individual users) to create their own too. You aren't constrained to just mixing and matching pre-existing collections, you can slice through what existing vendors happen to have chosen as collections. 

Indeed, if they can get full text to index, they'll be able to do more than just metadata. I believe they DO have full text where they can get it, and just metadata in other places, depending on what the source has or is willing to give them. Of course, even with broadcast search, _some_ remotely searched databases may just contain metadata anyway, others may contain fulltext.  But yeah, it all depends on what Summon can get, and how well they can do with it. The proof will be in the pudding, but I don't think this particular issue is the most likely weak point. 

I agree that two separate tabs, one for 'indexed' content, and one for 'broadcast search' content, is probably the only decent way to offer both without bringing indexed content down to the lowest common denominator of broadcast search.  The trick here is it's going to make no sense to users why some content is only available in one tab, and other content is only in the other, and others might be in both.  

If I had to choose only one of these tabs, either because it was too confusing to the users to have both, or more likely because I didn't have the resources to maintain both (it's going to be more expensive in terms of local maintenance and/or costs to vendors to have both) -- I'd pick whatever one worked best, naturally.  

Which works best will be something we'll find out once Summon is done. But if it's not Summon at first -- my prediction is that it will be the Summon approach eventually. They might have to fine-tune some things, we might need to wait until more content providers are willing/capable to share their metadata and/or indexable fulltext with a vendor like SerSol.  But the Summon approach is the one with long-term legs on it in my opinion, it's the one I'd put my money behind in the long term. 

This analysis applies mainly to the academic scholarly research market -- with a main focus on searching the scholarly peer-reviewed literature.  That's the market/use-case/environment I'm familiar with, and it has some special challenges that may not apply to other meta-search applications.  Like the need to include a significant portion of the scholarly universe in order to serve its function, a scholarly universe which is split over hundreds if not thousands of publishers, platforms, aggregators, and other vendors.  Also the absolute need to get structured citation metadata for the scholarly article 'hits', so it can be passed off to a 'link resolver' to actually deliver the article to the user (often from a different source than the citation was found in).</description>
		<content:encoded><![CDATA[<p>MANY of those databases simply offer no reliable machine search access. Sure, you COULD write an HTML screen-scraping solution, but when you&#8217;re talking about hundreds of resources&#8230; actually maintaining that in any reliable way would cost more than we can pay. <img src='http://federatedsearchblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I haven&#8217;t actually done an analysis to see exactly why each of those resources can&#8217;t be federated. Again, see limited resources.  But in the academic market this is typical; I don&#8217;t believe my vendor has significantly smaller (or larger) coverage than other vendors in the market. We have lots of websites from small vendors, often non-profit association vendors, that just don&#8217;t have a lot of functionality, even though they have important content. </p>
<p>Of course there&#8217;s value in letting the user choose what to search. My point is not to take that away. My point is that their choices are _necessarily_ constrained by what is _available_ in the unified search interface.  If you switch to a different set of content, you&#8217;re maybe going to take away some people&#8217;s favorite content, and give other people their favorite content they didn&#8217;t have before. The net effect on your user community may be a wash. Of course we&#8217;d LIKE to provide all content in the world, but meanwhile back to reality. </p>
<p>One of the tricks of letting people choose &#8216;what they want&#8217;, is that when we&#8217;re talking hundreds of specialty licensed databases whose names are not household terms &#8212; the typical user has no idea what any of these are, or how to pick from them. Add to that, in a broadcast search environment, the units of selection are necessarily the individual &#8216;databases&#8217; or &#8216;search engines&#8217; or &#8216;resources&#8217; &#8212; collections of content chosen by someone else already, that can then be mixed and matched in bulk. </p>
<p>The promise of the Summon approach is that you can choose content according to entirely different categories, at the _journal title_ level, crossing the boundaries of different existing &#8216;databases&#8217;.  Summon hopes to make just such subject-selected categories of individual journals.  And presumably allow librarians (or individual users) to create their own too. You aren&#8217;t constrained to just mixing and matching pre-existing collections, you can slice through what existing vendors happen to have chosen as collections. </p>
<p>Indeed, if they can get full text to index, they&#8217;ll be able to do more than just metadata. I believe they DO have full text where they can get it, and just metadata in other places, depending on what the source has or is willing to give them. Of course, even with broadcast search, _some_ remotely searched databases may just contain metadata anyway, others may contain fulltext.  But yeah, it all depends on what Summon can get, and how well they can do with it. The proof will be in the pudding, but I don&#8217;t think this particular issue is the most likely weak point. </p>
<p>I agree that two separate tabs, one for &#8216;indexed&#8217; content, and one for &#8216;broadcast search&#8217; content, is probably the only decent way to offer both without bringing indexed content down to the lowest common denominator of broadcast search.  The trick here is it&#8217;s going to make no sense to users why some content is only available in one tab, and other content is only in the other, and others might be in both.  </p>
<p>If I had to choose only one of these tabs, either because it was too confusing to the users to have both, or more likely because I didn&#8217;t have the resources to maintain both (it&#8217;s going to be more expensive in terms of local maintenance and/or costs to vendors to have both) &#8212; I&#8217;d pick whatever one worked best, naturally.  </p>
<p>Which works best will be something we&#8217;ll find out once Summon is done. But if it&#8217;s not Summon at first &#8212; my prediction is that it will be the Summon approach eventually. They might have to fine-tune some things, we might need to wait until more content providers are willing/capable to share their metadata and/or indexable fulltext with a vendor like SerSol.  But the Summon approach is the one with long-term legs on it in my opinion, it&#8217;s the one I&#8217;d put my money behind in the long term. </p>
<p>This analysis applies mainly to the academic scholarly research market &#8212; with a main focus on searching the scholarly peer-reviewed literature.  That&#8217;s the market/use-case/environment I&#8217;m familiar with, and it has some special challenges that may not apply to other meta-search applications.  Like the need to include a significant portion of the scholarly universe in order to serve its function, a scholarly universe which is split over hundreds if not thousands of publishers, platforms, aggregators, and other vendors.  Also the absolute need to get structured citation metadata for the scholarly article &#8216;hits&#8217;, so it can be passed off to a &#8216;link resolver&#8217; to actually deliver the article to the user (often from a different source than the citation was found in).</p>
]]></content:encoded>
	</item>
</channel>
</rss>
