11 Mar
Kosmix is a search engine that produces “topic pages” on millions of subjects. Kosmix creates these topic pages by querying the APIs of deep web sources in real time. In other words, Kosmix relies heavily on federated search for the content of its topic pages. (Actually, Kosmix combines federated search with crawling technology. More about this later in this article.)
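The real-time half of that approach is classic federated search: send one query to many sources at once, wait briefly, and merge whatever comes back. The Python sketch below illustrates only that pattern. It is not Kosmix’s implementation, it leaves out the crawling side entirely, the source names and results are hypothetical, and the two-second timeout and round-robin merge are assumptions chosen for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List

# Hypothetical: a "source" is any callable that takes a query and returns a
# ranked list of result titles. In a real engine each one would wrap a
# remote deep web search API.
Source = Callable[[str], List[str]]


def federated_search(query: str, sources: Dict[str, Source],
                     timeout: float = 2.0) -> Dict[str, List[str]]:
    """Send the query to every source in parallel; keep what returns in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(search, query): name for name, search in sources.items()}
    done, _ = wait(futures, timeout=timeout)
    # Return without waiting for slow sources; their threads finish in the background.
    pool.shutdown(wait=False)
    results: Dict[str, List[str]] = {}
    for future in done:
        name = futures[future]
        try:
            results[name] = future.result()
        except Exception:
            results[name] = []  # a failing source contributes no results
    return results


def merge_round_robin(results: Dict[str, List[str]]) -> List[str]:
    """Interleave the per-source lists rank by rank, dropping duplicates."""
    ranked_lists = [results[name] for name in sorted(results)]
    merged: List[str] = []
    seen = set()
    for rank in range(max((len(r) for r in ranked_lists), default=0)):
        for ranked in ranked_lists:
            if rank < len(ranked) and ranked[rank] not in seen:
                seen.add(ranked[rank])
                merged.append(ranked[rank])
    return merged


if __name__ == "__main__":
    demo_sources = {
        "encyclopedia": lambda q: [f"{q} overview", f"{q} history"],
        "news": lambda q: [f"{q} in the news", f"{q} overview"],
    }
    print(merge_round_robin(federated_search("deep web", demo_sources)))
    # prints ['deep web overview', 'deep web in the news', 'deep web history']
```

Round-robin interleaving is about the simplest merge that gives every source a slot near the top of the page. A production engine would more likely score and rerank results across sources.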
Kosmix co-founder Anand Rajaraman recently spoke at PARC, the prestigious Palo Alto Research Center. His talk was titled “What lies beneath: harnessing the deep web.” A video of the hour-long talk is available at the PARC website, and the slides are available at the Kosmix Blog.
Rajaraman has very impressive credentials. He is also co-founder of the VC firm Cambrian Ventures, he teaches a class for Stanford’s Computer Science department, and he is the former Director of Technology at Amazon.com, where:
he was responsible for technology strategy. Anand helped launch the transformation of Amazon.com from a retailer into a retail platform, enabling third-party retailers to sell on Amazon.com’s website. Third-party transactions now account for over 25% of all US transactions, and represent Amazon’s fastest-growing and most profitable business segment.
Rajaraman has his own Kosmix topic page.
If you enjoyed this post, make sure you subscribe to the RSS feed!
9 Feb
If you’ve got a half hour to spare, maybe in your car via iTunes, then you might enjoy this BlogTalkRadio interview on Friday Traffic Report: Exploring the Deep Web.
Friday Traffic Report host Jack Humphrey interviewed Bill Wardell about the deep Web. Wardell’s site, The CyberHood Watch Blog, aims to keep families and especially children safe on the Web.
While I know quite a bit about the deep Web, I enjoyed the conversational style in which the interview provides a basic introduction. I recommend it to those of you who are new to the concept of the deep Web and to new LIS students.
6 Jul
I’m incubating a white paper about the Deep Web. The Deep Web is all the content of the web (more than 99% of it) that Google can’t find by crawling, right? It’s all the stuff that lives inside databases and can only be found by filling out forms, right? The main value-add of Deep Web search engines is that they find only Deep Web documents, right? Not all that long ago I would have answered “yes” to all these questions. Today I’m confused.
Today I was chatting with Darcy from the marketing department of (blog sponsor) Deep Web Technologies about the white paper. I’ll refer to her as Deep Web Darcy. Deep Web Darcy is asking me some rather “deep” questions about the Deep Web. We discussed harvesting, crawling, indexing, Deep Web searching, and much more. If someone’s Deep Web content finds its way into Google, has that content been surfaced, and does it no longer qualify as buried treasure? If someone’s Deep Web content can be harvested, is it not really Deep Web content? If someone is browsing that content in the forest, with only one hand on the keyboard, does that content make a sound? So many koans. So little time. My brain hurts.
13 Mar
I’m always on the lookout for academic articles related to federated search or the deep web to review. I’m embarrassed that I hadn’t heard of OAIster until Abe turned me on to it.
If you’re also new to OAIster, here’s a snippet from their About page:
OAIster is a union catalog of digital resources. We provide access to these digital resources by “harvesting” their descriptive metadata (records) using OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting). The Open Archives Initiative is not the same thing as the Open Access movement.
The About page goes on to say:
These resources, often hidden from search engine users behind web scripts, are known as the “deep web.” The owners of these resources share them with the world using OAI-PMH.
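OAI-PMH harvesting is simple enough to sketch. A harvester sends `ListRecords` requests to a repository’s base URL and keeps following the `resumptionToken` in each response until the token comes back empty. The Python sketch below assumes a hypothetical base URL and extracts only each record’s identifier and Dublin Core title. It illustrates the protocol; it is not OAIster’s harvester, and it does no special handling of OAI-PMH error responses, deleted records, or rate limiting.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH response namespace
DC = "{http://purl.org/dc/elements/1.1/}"       # Dublin Core element namespace


def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url) as response:
        return response.read()


def list_records_url(base_url: str, token: Optional[str]) -> str:
    # The first request names the metadata format; follow-up requests carry
    # only the resumptionToken, which OAI-PMH treats as an exclusive argument.
    if token is None:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    else:
        params = {"verb": "ListRecords", "resumptionToken": token}
    return base_url + "?" + urllib.parse.urlencode(params)


def harvest(base_url: str,
            fetch: Callable[[str], bytes] = http_get) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, title) for every record, following resumption tokens."""
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token)))
        for record in root.iter(OAI + "record"):
            identifier = record.findtext(f"{OAI}header/{OAI}identifier", default="")
            title = record.findtext(f".//{DC}title", default="")
            yield identifier, title
        token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        # A missing or empty resumptionToken means the list is complete.
        if token_el is None or not (token_el.text or "").strip():
            break
        token = token_el.text.strip()


if __name__ == "__main__":
    # Hypothetical repository endpoint; substitute a real OAI-PMH base URL.
    for identifier, title in harvest("http://repository.example.org/oai"):
        print(identifier, title)
```

Passing `fetch` in as a parameter lets the paging logic be exercised against canned XML without touching the network.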
23 Feb
Sometimes I write about things that are not quite related to federated search. This is one of those articles. While I am writing about the deep Web, this article is not about the aspect of the deep Web that the federated search community is focused on. But two of the important people in this article are ones I’ve written about before, so there is some relevance here if you read on.
I received no fewer than three emails (and a flurry of Google alerts) about Alex Wright’s article in yesterday’s New York Times: Exploring a ‘Deep Web’ That Google Can’t Grasp. I like it when important publications write about the deep Web and help to spread awareness of it.
2 Feb
I recently took a much-needed break. I spent a couple of days with a very dear friend in Colorado. On the drive back to Santa Fe, I called Abe to check in. Abe told me that there has been a fair amount of buzz in the blogosphere about Google “surfacing” deep Web content. Last April I first wrote about Google’s efforts to crawl the deep Web. A couple of months later I followed up with “Why is Google interested in the deep web.” Today there’s more to write about.
Yahoo! Tech News published an article on January 30: Google Researcher Targets Web’s Structured Data (PC World). The article’s first paragraph is ominous, unless you believe that Google is regurgitating old news:
Internet search engines have focused largely on crawling text on Web pages, but Google is knee-deep in research about how to analyze and organize structured data, a company scientist said Friday.
19 Dec
[ Editor's note: Darcy Pedersen, Mednar Product Manager for blog sponsor Deep Web Technologies (DWT), shares her enthusiasm about the latest good press that DWT's new Mednar medical research portal has received. I welcome stories about good press from any federated search vendor. ]
What’s an alternative search engine, you ask? AltSearchEngines.com’s motto is: “The most wonderful search engines you’ve never seen.” On AltSearchEngines.com, you not only find eloquent reviews of current and new search engines, you also get the lowdown on up-and-coming technology in the search world. And guess what just poked its head around the corner?
12 Dec
Today I wrap up my interview with Erik Selberg, which began with a preview here. Erik answers questions about federated search, about his work at Microsoft and Amazon, and a couple of other questions.
Erik Selberg joins the ranks of federated search luminaries, standing together with Kate Noerr, Todd Miller, and Michael Bergman.
11 Dec
Alissa Miller has produced an impressive list of deep web-related resources for the Online College Blog. I’m particularly impressed by how much time Alissa must have spent researching resources for the list.
The list is divided into nine sections:
- Meta-Search Engines
- Semantic Search Tools and Databases
- General Search Engines and Databases
- Academic Search Engines and Databases
- Scientific Search Engines and Databases
- Custom Search Engines
- Collaborative Information and Databases
- Tips and Strategies
- Helpful Articles and Resources for Deep Searching
5 Dec
Today the interview with Erik Selberg continues. (You can read my preview of this series with Erik, and the list of questions, here.) In this installment we further discuss MetaCrawler and look at it in the context of today’s federated search applications.
Erik Selberg joins the ranks of federated search luminaries, standing together with Kate Noerr, Todd Miller, and Michael Bergman.