Archive for March, 2010


Your comments matter

Author: Sol

I was dismayed to discover this morning that Gmail has been marking my comment moderation emails as spam for the last couple of months. I have just approved a bunch of comments. Not all comments require moderation (comments from people whose earlier comments were approved get approved automatically), so some comments kept appearing on the blog, which made it harder for me to notice that a bunch of others were waiting for moderation.

Your comments matter tremendously, as they contribute your insights and experiences to the articles. I know, as a blog reader myself, that astute comments add tremendously to the value of the blog itself.

I’m very sorry (and embarrassed) about the screw-up, and I’ll watch more closely for comments needing moderation.


Carl Grant just published a thought-provoking piece at his ExLibris blog: Discovering the need for discovery solutions that also support meta/federated searching. Like all of Carl’s articles, this one is worth a careful read.

Carl argues that the limitations of federated search don’t make the technology useless in the face of discovery services. He sees both technologies as having an important role in the library and, in fact, his company, ExLibris, sells both types of technology.

Here’s the gist of Carl’s argument that libraries need both solutions:

My answer would be that most libraries likely need both of these solutions because they ultimately meet different end-user needs. Both are discovery tools, but they meet the needs of end-users in different ways and deliver different capabilities.

For example, an undergraduate needing to assemble a paper quickly might well benefit from a search of a mega-aggregate index that quickly produces several results that can be used interchangeably. However, the student or researcher conducting deep research into a subject will likely want to know not only everything available from known resources, but also from unknown resources. Then the need for meta/federated searching becomes more important because it will very likely broaden the content they can find. Understanding these two divergent set of needs require the library to offer different tools within the common discovery interface.



Earlier this month ReadWriteWeb reported on a mechanism Google is creating for real-time indexing:

Google is developing a system that will enable web publishers of any size to automatically submit new content to Google for indexing within seconds of that content being published. Search industry analyst Danny Sullivan told us today that this could be “the next chapter” for Google.

And, here’s an interesting comment.

Last Fall we were told by Google’s Brett Slatkin, lead developer on the PubSubHubbub (PuSH) real time syndication protocol, that he hoped Google would some day use PuSH for indexing the web instead of the crawling of links that has been the way search engines have indexed the web for years.

If PuSH is as widely used as Google hopes it will be then this is a major paradigm shift for the search giant. No, Google won’t stop crawling the Web but if a critical mass of Web publishers get Google (and presumably other search engines) to index their content very quickly then the real-time Web will take a giant leap forward.

It will be interesting to see how PuSH affects the federated search community. Clearly the real-time Web can move scientific information very quickly. Perhaps this new technology and paradigm will nicely augment the flow of scientific papers found by federated search applications in the deep Web.
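For readers curious what "submitting new content" looks like on the wire, here is a minimal sketch of the publisher side of PuSH. It assumes the "new content notification" described in the PubSubHubbub 0.3 draft, where the publisher sends a form-encoded POST to a hub with `hub.mode=publish` and `hub.url` set to the updated feed. The hub and feed URLs are placeholders, and nothing here says how Google's indexing system would actually consume such pings.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URLs, not real endpoints.
HUB_URL = "https://hub.example.org/"
FEED_URL = "https://blog.example.org/feed.atom"


def build_publish_ping(hub_url, topic_url):
    """Build (but do not send) the POST that tells a hub a feed has new content."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


ping = build_publish_ping(HUB_URL, FEED_URL)
# Sending it would be: urllib.request.urlopen(ping)
```

The point of the design is that the publisher announces the change and the hub fetches the feed and pushes the new entries to subscribers, so nobody has to poll or crawl to find out that something was published.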


[ This is a continuation of Part I. ]

The Kosmix approach to federation relies heavily on APIs to structured data in different domains of specialization. APIs can be searched in real-time to generate topic pages with very current information. Slides 28 through 32 give some excellent reasons for why “trends favor the federated approach:”

  • Social Media (content volume grows rapidly, access controls can prevent indexing, opportunities for personalization)
  • Real-Time Information (Earthquake in China (2008), US Airways 1549 Hudson landing (2009), Iran elections (2009))
  • Specialized search engines (It’s a shame to take all this richness and compress it into 10 results links!)
  • Innovative visualizations
  • Business Model issues
  • Algorithmic Content
  • Availability of APIs



Laura at the Llyfrgellydd blog rants about library technology. And, she takes no prisoners. “Multiple systems, or at the very least, the appearance of multiple systems, are enemies to usability” is Laura’s first lob. Oh goodie, she’s going to have something nice to say about federated search, right, since hiding those multiple systems is a friend to usability? Not a chance.

I think one of the reasons federated search doesn’t work is because the metadata is coming from so many different sources that it just can’t be translated consistently. It seems a huge waste of time that vendors have people working on connectors to read that metadata and parse it.

I didn’t realize that federated search doesn’t work. I assume Laura is writing about the metadata like title, author, and snippet that forms the search results pages. Yes, consistency isn’t perfect but it’s not nearly as bad as Laura pronounces.
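To make the connector work a little more concrete, here is a minimal sketch of what "reading and parsing" metadata can amount to: mapping each source's field names onto a common title/author/snippet record. The source names and field labels (`ti`, `headline`, and so on) are invented for illustration and do not describe any vendor's actual connectors.

```python
# Hypothetical field names: each source labels the same three fields differently.
FIELD_MAPS = {
    "source_a": {"title": "ti", "author": "au", "snippet": "ab"},
    "source_b": {"title": "headline", "author": "creator", "snippet": "summary"},
}


def normalize(source, record):
    """Translate one raw record into the common title/author/snippet shape."""
    result = {"source": source}
    for common_name, source_name in FIELD_MAPS[source].items():
        value = record.get(source_name, "")
        if isinstance(value, list):
            # Some sources return authors as a list, others as one string.
            value = "; ".join(value)
        # Collapse stray whitespace so results from every source display alike.
        result[common_name] = " ".join(str(value).split())
    return result


raw = {"ti": "  Deep   Web ", "au": ["Smith, J.", "Lee, K."], "ab": "A survey."}
clean = normalize("source_a", raw)
```

A missing field comes back as an empty string rather than an error, which is one simple way to keep a results page consistent when a source omits something.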

Kidding aside, go read Laura’s article. You’ll get the perspective of a “past life” (her words) reference librarian who has many bones to pick with library technology. It’s good market research material.


Kosmix is the search engine that produces “topic pages” on millions of subjects. Kosmix creates these topic pages by searching APIs of deep web sources (in real time). In other words, Kosmix relies heavily on federated search for the content of its topic pages. (Actually, Kosmix combines federated search with crawling technology. More about this later in this article.)
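As a rough illustration of the idea (not Kosmix's actual implementation, which isn't described here), a federated search sends one query to several source APIs in parallel and merges whatever comes back. In the sketch below the three "sources" are hypothetical in-memory stand-ins for remote APIs, and the `score` field is an invented ranking signal.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote source APIs; a real connector would make an HTTP call.
def search_news(query):
    return [{"source": "news", "title": f"{query} in the headlines", "score": 0.9}]


def search_reference(query):
    return [{"source": "reference", "title": f"{query}: an overview", "score": 0.7}]


def search_video(query):
    return [{"source": "video", "title": f"Talks about {query}", "score": 0.5}]


SOURCES = [search_news, search_reference, search_video]


def federated_search(query, sources=SOURCES, timeout=5.0):
    """Send the query to every source in parallel and merge the results."""
    merged = []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(source, query) for source in sources]
        for future in futures:
            try:
                merged.extend(future.result(timeout=timeout))
            except Exception:
                # Skip a source that fails or does not answer in time.
                continue
    # Highest-scoring hits first, regardless of which source produced them.
    return sorted(merged, key=lambda hit: hit["score"], reverse=True)


def build_topic_page(query):
    """Assemble a very bare 'topic page' from the merged hits."""
    hits = federated_search(query)
    return {"topic": query, "sections": [hit["title"] for hit in hits]}


page = build_topic_page("deep web")
```

Because the sources are queried at request time, the page reflects whatever each source knows right now, which is the property the real-time argument leans on.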

Kosmix co-founder Anand Rajaraman recently spoke at PARC, the prestigious Palo Alto Research Center. Rajaraman’s talk was titled “What lies beneath: harnessing the deep web.” A video of the hour-long talk is available at the PARC website. The slides are available at the Kosmix Blog.

Rajaraman has very impressive credentials. He is also co-founder of the VC firm Cambrian Ventures, he teaches a class for Stanford’s Computer Science department, and he is former Director of Technology at Amazon, where:

he was responsible for technology strategy. Anand helped launch the transformation of Amazon from a retailer into a retail platform, enabling third-party retailers to sell on Amazon’s website. Third-party transactions now account for over 25% of all US transactions, and represent Amazon’s fastest-growing and most profitable business segment.

Rajaraman has his own Kosmix topic page.



Hope Leman received a $500 prize for achieving second place in the 2nd annual Federated Search Blog contest. Her essay, Not So Wild a Dream: The Science 2.0 Federated Search Dream Machine, is published here in its entirety.

Hope is a research information technologist for Samaritan Health Services in Oregon, where she is helping to develop a service that helps scientists and public health researchers find professional conferences and places to submit their research papers. Hope’s essay shares her dream of creating a federated search engine to help scientists with some key aspects of research: finding the current state of research on a topic and finding calls for papers and presentations.

Not So Wild a Dream: The Science 2.0 Federated Search Dream Machine

by Hope Leman

Many of us love someone who is ill. Most of us have loved someone who has been cut down far too early in life from illness. What can we do to enable scientific researchers to make quicker progress on advances in such fields as cell biology, neuroscience, pharmacology and other realms that will lead to breakthroughs that will help prevent or cure devastating diseases? We can make crucial information easier to find and disseminate. Federated searching (the ability to search multiple databases simultaneously) is one way to do that.



I was absolutely delighted to read a recent article by Barbara Quint, editor-in-chief of Information Today’s Searcher magazine. Federated Searching: Good Ideas Never Die, They Just Change Their Names reminds us that federated search existed before the term became popular:

Even back in the days when only professional searchers accessed online databases, searchers wanted some way to find answers in multiple files without having to slog through each database one at a time. In those days, the solution was called multi-file or cross-file searching, e.g. Dialog OneSearch or files linked via Z39.50 (ANSI/NISO standard for data exchange).

A little sidebar: I heard about this article from my brother Abe (founder, President, and CTO of blog sponsor Deep Web Technologies) before it came onto my Google Alerts radar. Abe was at the NFAIS Conference where he had gone to deliver a presentation on multilingual federated search. At the conference, Abe had a conversation with Iris Hanney, President of Unlimited Priorities, a support services company for businesses. It turns out that Barbara Quint is a member of the Unlimited Priorities team and produced this article for one of their publications, DCLNews. And, that’s how Abe heard about the article, in which he’s mentioned. Small world!
