<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: If only I knew what the deep web was</title>
	<atom:link href="http://federatedsearchblog.com/2009/07/06/if-only-i-knew-what-the-deep-web-was/feed/" rel="self" type="application/rss+xml" />
	<link>http://federatedsearchblog.com/2009/07/06/if-only-i-knew-what-the-deep-web-was/</link>
	<description>Covers topics related to federated search and the deep web</description>
	<pubDate>Wed, 10 Mar 2010 10:27:43 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Eric</title>
		<link>http://federatedsearchblog.com/2009/07/06/if-only-i-knew-what-the-deep-web-was/comment-page-1/#comment-28812</link>
		<dc:creator>Eric</dc:creator>
		<pubDate>Wed, 08 Jul 2009 05:14:32 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/?p=773#comment-28812</guid>
		<description>If the Deep Web is defined as 'hidden' or inaccessible, once it becomes found and accessible, it's no longer part of the Deep Web, right?  If 'owners' of the deep web make some or part of their content spiderable and accessible, it's no longer the deep web, right?  To me, the Deep Web is a problem, and Google (and other search engines) are finding solutions to unearthing it, and it no longer becomes a problem.</description>
		<content:encoded><![CDATA[<p>If the Deep Web is defined as &#8216;hidden&#8217; or inaccessible, once it becomes found and accessible, it&#8217;s no longer part of the Deep Web, right?  If &#8216;owners&#8217; of the deep web make some or part of their content spiderable and accessible, it&#8217;s no longer the deep web, right?  To me, the Deep Web is a problem, and Google (and other search engines) are finding solutions to unearthing it, and it no longer becomes a problem.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Noerr</title>
		<link>http://federatedsearchblog.com/2009/07/06/if-only-i-knew-what-the-deep-web-was/comment-page-1/#comment-28802</link>
		<dc:creator>Peter Noerr</dc:creator>
		<pubDate>Wed, 08 Jul 2009 01:36:46 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/?p=773#comment-28802</guid>
		<description>Of course two other areas of the Deep Web (other than "for fee" content for academics - or anybody) are enterprise databases and volatile data.

Enterprise content is kept out of the surface web by design (not always all that well), so it is never going to be "surfaced." One could argue that the for-fee content of the publishers and aggregators is part of an enterprise data store, but it is really a "product" designed to be sold eventually. What I am talking about are the corporate internal resources - anything from HR records to secret recipes. They will stay hidden (or deep).

The volatile data is only deep because of its nature. Here today, gone in a flash. This data by its nature will never become part of the surface web. Although any one piece of information has a fleeting existence (the current temperature, a stock price, etc.), the data item is there long term, so it sort of flickers in and out of existence (not to get overly philosophical...). Interestingly, one category of this data is search results that happen to be displayed at the exact instant one of the search engines crawls that results page. A little cunning Google research will find a few interesting OPAC pages captured and indexed in various states. And, of course, the advent of AJAX and friends means that some pages are partly static and partly volatile. And so it goes on.

One final question. Are Federated Search engines really associated only with the Deep Web? And if so should they be? Or should they range more widely?</description>
		<content:encoded><![CDATA[<p>Of course two other areas of the Deep Web (other than &#8220;for fee&#8221; content for academics - or anybody) are enterprise databases and volatile data.</p>
<p>Enterprise content is kept out of the surface web by design (not always all that well), so it is never going to be &#8220;surfaced.&#8221; One could argue that the for-fee content of the publishers and aggregators is part of an enterprise data store, but it is really a &#8220;product&#8221; designed to be sold eventually. What I am talking about are the corporate internal resources - anything from HR records to secret recipes. They will stay hidden (or deep).</p>
<p>The volatile data is only deep because of its nature. Here today, gone in a flash. This data by its nature will never become part of the surface web. Although any one piece of information has a fleeting existence (the current temperature, a stock price, etc.), the data item is there long term, so it sort of flickers in and out of existence (not to get overly philosophical&#8230;). Interestingly, one category of this data is search results that happen to be displayed at the exact instant one of the search engines crawls that results page. A little cunning Google research will find a few interesting OPAC pages captured and indexed in various states. And, of course, the advent of AJAX and friends means that some pages are partly static and partly volatile. And so it goes on.</p>
<p>One final question. Are Federated Search engines really associated only with the Deep Web? And if so should they be? Or should they range more widely?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Martin</title>
		<link>http://federatedsearchblog.com/2009/07/06/if-only-i-knew-what-the-deep-web-was/comment-page-1/#comment-28793</link>
		<dc:creator>Mark Martin</dc:creator>
		<pubDate>Tue, 07 Jul 2009 20:46:19 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/?p=773#comment-28793</guid>
		<description>An interesting and entertaining read!</description>
		<content:encoded><![CDATA[<p>An interesting and entertaining read!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Calder</title>
		<link>http://federatedsearchblog.com/2009/07/06/if-only-i-knew-what-the-deep-web-was/comment-page-1/#comment-28777</link>
		<dc:creator>Bob Calder</dc:creator>
		<pubDate>Tue, 07 Jul 2009 03:54:26 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/?p=773#comment-28777</guid>
		<description>The Deep Web is a place. If other people get to it, that doesn't change its location.

What has been bothering me is how to frame discussing the places where Deep Web rubs up against Semantic Web.</description>
		<content:encoded><![CDATA[<p>The Deep Web is a place. If other people get to it, that doesn&#8217;t change its location.</p>
<p>What has been bothering me is how to frame discussing the places where Deep Web rubs up against Semantic Web.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Rochkind</title>
		<link>http://federatedsearchblog.com/2009/07/06/if-only-i-knew-what-the-deep-web-was/comment-page-1/#comment-28776</link>
		<dc:creator>Jonathan Rochkind</dc:creator>
		<pubDate>Tue, 07 Jul 2009 03:19:21 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/?p=773#comment-28776</guid>
		<description>I've always thought 'deep web' was kind of a shifty concept. 

Even though Google is by far the most popular free public search engine, the deep web shouldn't be measured based only on what _Google_ has indexed, should it?

But yeah, one way or another, I think more and more of the web will become indexed by free public search engines.  For the portions that aren't -- and keeping in mind that this is not an 'existential' category, but just an 'accidental' one -- perhaps 'hidden web'?  With some parts of it more hidden than others?

One part of the 'somewhat hidden web' whose hidden nature is as much policy-based as technically-limited is for-pay licensed content. 

In the academic meta-search market, providing access to this for-pay licensed material is one of the main motivators. 

On the one hand, even vendors of for-pay licensed content and search indexes are trying to figure out how to expose their content to Google without giving away the cow for free. Sometimes this leads to odd arrangements where they are willing to share more with Google (for free) than they are with even their own paying customers, or to special arrangements with Google that they don't offer to other search engines. Or, on the other end, I'm guessing these special arrangements with Google sometimes involve Google giving them special dispensation to violate a rule Google normally applies to the general public -- allowing them to present a different version of a page to the Google search engine that isn't presented to the general public. 

I'm not sure how long these kinds of weird special arrangements can go on, but they're trying to figure out their business model, same as everyone else.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve always thought &#8216;deep web&#8217; was kind of a shifty concept. </p>
<p>Even though Google is by far the most popular free public search engine, the deep web shouldn&#8217;t be measured based only on what _Google_ has indexed, should it?</p>
<p>But yeah, one way or another, I think more and more of the web will become indexed by free public search engines.  For the portions that aren&#8217;t &#8212; and keeping in mind that this is not an &#8216;existential&#8217; category, but just an &#8216;accidental&#8217; one &#8212; perhaps &#8216;hidden web&#8217;?  With some parts of it more hidden than others?</p>
<p>One part of the &#8216;somewhat hidden web&#8217; whose hidden nature is as much policy-based as technically-limited is for-pay licensed content. </p>
<p>In the academic meta-search market, providing access to this for-pay licensed material is one of the main motivators. </p>
<p>On the one hand, even vendors of for-pay licensed content and search indexes are trying to figure out how to expose their content to Google without giving away the cow for free. Sometimes this leads to odd arrangements where they are willing to share more with Google (for free) than they are with even their own paying customers, or to special arrangements with Google that they don&#8217;t offer to other search engines. Or, on the other end, I&#8217;m guessing these special arrangements with Google sometimes involve Google giving them special dispensation to violate a rule Google normally applies to the general public &#8212; allowing them to present a different version of a page to the Google search engine that isn&#8217;t presented to the general public. </p>
<p>I&#8217;m not sure how long these kinds of weird special arrangements can go on, but they&#8217;re trying to figure out their business model, same as everyone else.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Darcy</title>
		<link>http://federatedsearchblog.com/2009/07/06/if-only-i-knew-what-the-deep-web-was/comment-page-1/#comment-28775</link>
		<dc:creator>Darcy</dc:creator>
		<pubDate>Tue, 07 Jul 2009 02:10:33 +0000</pubDate>
		<guid isPermaLink="false">http://federatedsearchblog.com/?p=773#comment-28775</guid>
		<description>Sol,

The next time we are in a very interesting conversation (to me), please remember that I would be happy to search the "shallow" web for a nearby carwash so you don't have to rush off so quickly! (smile) 

I look forward to our next conversation.  Perhaps we could ponder your proposed koan?

Deep Web Darcy</description>
		<content:encoded><![CDATA[<p>Sol,</p>
<p>The next time we are in a very interesting conversation (to me), please remember that I would be happy to search the &#8220;shallow&#8221; web for a nearby carwash so you don&#8217;t have to rush off so quickly! (smile) </p>
<p>I look forward to our next conversation.  Perhaps we could ponder your proposed koan?</p>
<p>Deep Web Darcy</p>
]]></content:encoded>
	</item>
</channel>
</rss>
