<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Divide and conquer: federating many sources</title>
	<atom:link href="http://federatedsearchblog.com/2008/09/16/divide-and-conquer-federating-many-sources/feed/" rel="self" type="application/rss+xml" />
	<link>http://federatedsearchblog.com/2008/09/16/divide-and-conquer-federating-many-sources/</link>
	<description>Covers topics related to federated search and the deep web</description>
	<lastBuildDate>Mon, 30 Jan 2012 05:01:14 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Stephan Schmid</title>
		<link>http://federatedsearchblog.com/2008/09/16/divide-and-conquer-federating-many-sources/comment-page-1/#comment-5589</link>
		<dc:creator>Stephan Schmid</dc:creator>
		<pubDate>Tue, 23 Sep 2008 08:07:14 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2008/09/16/divide-and-conquer-federating-many-sources/#comment-5589</guid>
		<description>In my opinion, scalability is mainly a matter of software architecture: if the system scales nearly linearly, it can grow as needed. In my experience, I generally agree with Jonathan that a single-tier approach is usually simpler. A multi-server distribution can make sense for sources that are geographically remote and form a topical unit.

Some years ago I ran a test with 250 test sources (HTTP and JDBC) that were queried in multiple threads (on a rather old P4 machine running Linux). Up to four concurrent users worked pretty well - today, with fast multicore CPUs, it should run even better. For high-volume solutions I would try multiplexed, non-blocking I/O anyway.</description>
		<content:encoded><![CDATA[<p>In my opinion, scalability is mainly a matter of software architecture: if the system scales nearly linearly, it can grow as needed. In my experience, I generally agree with Jonathan that a single-tier approach is usually simpler. A multi-server distribution can make sense for sources that are geographically remote and form a topical unit.</p>
<p>Some years ago I ran a test with 250 test sources (HTTP and JDBC) that were queried in multiple threads (on a rather old P4 machine running Linux). Up to four concurrent users worked pretty well &#8211; today, with fast multicore CPUs, it should run even better. For high-volume solutions I would try multiplexed, non-blocking I/O anyway.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sol</title>
		<link>http://federatedsearchblog.com/2008/09/16/divide-and-conquer-federating-many-sources/comment-page-1/#comment-5555</link>
		<dc:creator>Sol</dc:creator>
		<pubDate>Mon, 22 Sep 2008 16:01:01 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2008/09/16/divide-and-conquer-federating-many-sources/#comment-5555</guid>
		<description>Jonathan - I have a draft of a response to your comment that I&#039;m fine-tuning and will publish soon as a blog article. I&#039;ve not forgotten you.</description>
		<content:encoded><![CDATA[<p>Jonathan &#8211; I have a draft of a response to your comment that I&#8217;m fine-tuning and will publish soon as a blog article. I&#8217;ve not forgotten you.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sol</title>
		<link>http://federatedsearchblog.com/2008/09/16/divide-and-conquer-federating-many-sources/comment-page-1/#comment-5259</link>
		<dc:creator>Sol</dc:creator>
		<pubDate>Wed, 17 Sep 2008 02:54:05 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2008/09/16/divide-and-conquer-federating-many-sources/#comment-5259</guid>
		<description>Jonathan - I do have some thoughts in response to your comments, but I&#039;d like to wait a day or two to see what other comments people post, and then I&#039;ll respond to all of them.</description>
		<content:encoded><![CDATA[<p>Jonathan &#8211; I do have some thoughts in response to your comments, but I&#8217;d like to wait a day or two to see what other comments people post, and then I&#8217;ll respond to all of them.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Rochkind</title>
		<link>http://federatedsearchblog.com/2008/09/16/divide-and-conquer-federating-many-sources/comment-page-1/#comment-5256</link>
		<dc:creator>Jonathan Rochkind</dc:creator>
		<pubDate>Wed, 17 Sep 2008 01:42:59 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/2008/09/16/divide-and-conquer-federating-many-sources/#comment-5256</guid>
		<description>It&#039;s not clear to me why that multi-tiered federated approach would be necessary with modern hardware and OSs. What&#039;s the difference between doing that and just having lots of threads or forks? Is your merging of result sets REALLY so CPU-intensive that you need more than one machine? And if you do need more than one machine, wouldn&#039;t some other method of &quot;transparent&quot; multi-server distribution under the covers (single tier, but distributed across various servers) be easier than a multi-tiered approach?

It also continues to be curious to me that SerialSolutions&#039; federated search product claims to be able to search across 200 sources at once. I haven&#039;t investigated it enough to know whether there&#039;s a hidden gotcha in that claim, or whether it really is what it says it is.</description>
		<content:encoded><![CDATA[<p>It&#8217;s not clear to me why that multi-tiered federated approach would be necessary with modern hardware and OSs. What&#8217;s the difference between doing that and just having lots of threads or forks? Is your merging of result sets REALLY so CPU-intensive that you need more than one machine? And if you do need more than one machine, wouldn&#8217;t some other method of &#8220;transparent&#8221; multi-server distribution under the covers (single tier, but distributed across various servers) be easier than a multi-tiered approach?</p>
<p>It also continues to be curious to me that SerialSolutions&#8217; federated search product claims to be able to search across 200 sources at once. I haven&#8217;t investigated it enough to know whether there&#8217;s a hidden gotcha in that claim, or whether it really is what it says it is.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

