Archive for January, 2011


[ This is a republication of the article, "Deep Web Tech in the News: Image Search," that was published in the Deep Web Technologies Blog. Note that Deep Web Technologies sponsors the Federated Search Blog and that I consult for OSTI, the organization that stewards Science.gov. ]

Deep Web Tech in the News: Image Search

One small step for Science.gov, one giant leap for Federated Search.

“Science.gov is a gateway to more than 42 scientific databases and 200 million pages of science information with just one query, and is a gateway to more than 2,000 scientific websites from 18 organizations within 14 federal science agencies. These agencies represent 97% of the federal R&D budget. Science.gov is the portal to science and the U.S. contribution to WorldWideScience.org. Science.gov is hosted by the Department of Energy Office of Scientific and Technical Information, within the Office of Science, and is supported by CENDI, an interagency working group of senior scientific and technical information managers.”

Science.gov received a pretty large upgrade in December. The image search is located under “special collections” and works the same way as the regular search, except that the results have thumbnails. A search query now quickly pulls back related images from multiple sources as thumbnail-size results. This is one of very few publicly available science image search portals. Cheryl LaGuardia, an industry critic, wrote:

For a free service this works mighty well: my test search for “tornedo” got the reply, “Did you mean “tornado”? with 151 results for the corrected spelling (a test, mind you, or perhaps I’m easing back into work slowly and may have inadvertently misspelled… no matter! The system works!). The resultant images are terrific, compelling enough to send Dorothy pedaling madly down the road away from them on her bicycle, with Toto in tow.

Deep Web Technologies powers the entire website, and we look forward to using this innovation on other projects in the future.


The Pegasus Librarian published a blog article yesterday: Heads they win, tails we lose: Discovery tools will never deliver on their promise. That’s a pretty strong statement about discovery services, but I don’t think the title exaggerates. If you are on the fence about the industry, do take a few minutes to read the article.

I’ve raised the concern more than once about how users of discovery services are at the whim of the service owners who are providing them with access to content. Check out all articles tagged with “discovery service” in this blog or these articles in particular:

  • What a mess! My radar (Google Alerts) pointed me this morning to this article by Barbara Quint at Information Today. My first response to “EBSCO Exclusives Trigger Turmoil” was “What a mess!” Quint shares the saga of EBSCO and Gale lobbing volleys at each other during the ALA Midwinter meeting. EBSCO announced new acquisitions that were ‘exclusive to EBSCO for the library “marketspace.”’ Major competitor Gale issued a letter to the library community urging “librarians to get involved in opposing publishers granting exclusives, at least to EBSCO.” Read Quint’s article for all the gory details. …
  • Carl Grant on bypassing the library. There are two issues here. What value do librarians bring to their patrons and do discovery services erode that value? Carl’s latest piece on these subjects looks at the question of “how libraries might get bypassed in the context of e-book supply strategies.” He gives three criteria he believes libraries need to carefully consider when selecting a discovery tool (or e-content): “content-neutrality”, “deep-search and/or metasearch support,” and “The ability to load and search databases unique to your user’s information needs.” …
  • Beyond federated search? The danger with relying on any one service to provide you with access to its indexed content is that the service’s criteria for source selection may not be yours. That’s why I recommend hybrid solutions to get the most out of indexed content and the freedom of including federated sources of your choosing as well.

And then we read about the mess with EBSCO pulling out of Ex Libris’ Primo Central:

As you may know, for the past eighteen months, we have been indexing in Primo Central a number of the EBSCO databases. EBSCO has now changed their strategy and will no longer permit third-party discovery services to load and index their content. Therefore, starting 1st January 2011 we will cease hosting of the EBSCO content in the Primo Central Index. EBSCO will, however, permit our use of a specialized API to search the EBSCO content ‘just-in-time’.

Read a fresh if unsettling perspective on discovery services at the Pegasus blog.


Tony Russell-Rose at the Information Interaction Blog co-authored a paper about the changing face of search. He provides the article in this blog post.

Here’s the article’s list of major changes the authors see either already under way or coming in the future:

  1. Freshness of content. Not just recent but fresh AND authoritative.

  2. Context in search and personalization. Already in use by major search engines, e.g. use of information from previous searches and use of user’s location. Implicit use of what users do with search results to infer their interests.

  3. Natural language processing. Especially driven by social media and user-generated content.

  4. Disruptive effect of search engines beyond the big three.

  5. Consolidation in the enterprise search marketplace plus new companies entering the space catalyzed in large part by the maturing of Solr as a viable alternative to commercial search engines.

  6. A growing focus on the user experience. Ten blue links are no longer enough. Search needs to support “exploratory search tasks, such as comparison, aggregation, analysis, synthesis, evaluation, and so on.”

  7. Accessibility support beyond the blind community. “For dyslexic people there is a need to understand the searching behaviour of such users, and build personalised interfaces which react to the type of dyslexia and learn from their interaction with the user interface.” Also support for severely physically disabled people.

  8. The digital divide. Getting information to people in countries with oppressive governments. Supporting the use of mobile devices for information exchange.

And, the paper’s conclusion:

Finally, when most people talk about search, they typically envisage a web page with a search box and a results list. But search is increasingly becoming a ubiquitous part of our daily lives, helping us make sense of the world around us. Search is the means by which we are able to cope with our overflowing email inboxes, to generate insights from masses of corporate data, and to discover new restaurants in an unfamiliar city armed only with a smartphone and an Internet connection. Search will be everywhere, but invisible, contextualised, and personalised.


My brother Abe just started a LinkedIn discussion in the Enterprise Search Engine Professionals group. Here’s the post.

Happy New Year, Everyone!

I’m interested to know who is doing real-time federated search in the enterprise. By “real-time” I mean searching sources live, not building nor searching an index. Have you implemented such a beast? Has it been successful? What have the challenges been? Access security and policy issues come to mind. What do you see as the advantages and disadvantages of federated search in the enterprise?

By way of disclosure, I co-founded Verity, and I founded and run the federated search company, Deep Web Technologies.

Here are a few links that might be of interest to participants in this discussion:

What are your thoughts?

That last link refers to a New Idea Engineering article that discusses a number of important features of federated search in the enterprise.

  • Flexible rules for combining results from all of the engines searched
  • Maintaining Users Security Credentials
  • Mapping User Security Credentials to other security domains
  • Advanced Duplicate Detection and Removal
  • Combining results list Navigators, such as Faceted Search links and Taxonomy Nodes.
  • Handling other results list links such as “next page” and sort order.
  • Translating user searches into the different search syntaxes used by the disparate engines.
  • Extracting hits from HTML results, AKA “scraping”, hopefully without the need to custom code.
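
To make a few of those features concrete, here is a minimal Python sketch of real-time federated search: the query is sent to every source live, the result lists are combined with one simple rule (round-robin interleaving), and duplicates are removed by comparing normalized titles. This is an illustration under stated assumptions, not how any particular vendor does it. The two source functions, their canned results, and the example URLs are hypothetical stand-ins for real connectors, and a production system would also need the security and syntax-translation pieces listed above.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest
import re

# Hypothetical stand-ins for live sources. A real connector would call
# a search API or scrape an HTML results page instead of returning
# canned hits.
def search_catalog(query):
    return [
        {"title": "Tornado Dynamics", "url": "http://catalog.example/1"},
        {"title": "Storm Chasing Basics", "url": "http://catalog.example/2"},
    ]

def search_repository(query):
    return [
        {"title": "tornado dynamics.", "url": "http://repo.example/a"},
        {"title": "Doppler Radar Primer", "url": "http://repo.example/b"},
    ]

def normalize(title):
    """Lowercase and collapse punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def interleave(result_lists):
    """Round-robin merge: first hit from each source, then second, and so on."""
    merged = []
    for row in zip_longest(*result_lists):
        merged.extend(hit for hit in row if hit is not None)
    return merged

def dedupe(hits):
    """Keep the first hit seen for each normalized title."""
    seen = set()
    unique = []
    for hit in hits:
        key = normalize(hit["title"])
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique

def federated_search(query, sources, timeout=5.0):
    """Query every source concurrently, then merge and de-duplicate the hits."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(source, query) for source in sources]
        result_lists = []
        for future in futures:
            try:
                result_lists.append(future.result(timeout=timeout))
            except Exception:
                # A source that raises, or is not done in time, contributes no hits.
                result_lists.append([])
    return dedupe(interleave(result_lists))

results = federated_search("tornado", [search_catalog, search_repository])
for hit in results:
    print(hit["title"], hit["url"])
```

Round-robin interleaving is only one possible combining rule; relevance scores from the sources, where they exist and are comparable, would allow a ranked merge instead. Matching on normalized titles is likewise a deliberately crude form of duplicate detection.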

If you know of any activity in the enterprise search world that intersects with federated search and that doesn’t involve building and maintaining indices, Abe and I would love it if you would join the conversation.

For those of you new to this blog, federated search vendor Deep Web Technologies is the sponsor.