Archive for December, 2007


As 2007 comes to an end I’d like to reflect on some of the major happenings in federated search this past year, hopefully setting a precedent for a yearly post.

In early November Microsoft announced that it was giving away Microsoft Search Server 2008 Express, an enterprise-level search product based on SharePoint technology. Microsoft Search Server 2008, already available as a beta, incorporates federated search capabilities based on the OpenSearch standard. I believe that this announcement will have no impact on federated search in libraries but will have some impact on vendors trying to sell federated search products into corporate IT departments. Federated search capabilities in the Search Server product are limited to sources that can be accessed via OpenSearch. Merging and relevance ranking of results are not supported because Microsoft doesn’t believe they can be done well. All in all, I believe that this announcement by Microsoft will help to bring federated search more into the mainstream and is good for our industry.

With Microsoft Search Server 2008, Microsoft is clearly firing some salvos at the Google Search Appliance, in particular its OneBox offering which also supports some federated search capabilities.



Part I of this series on content access basics explained how screen scraping is used by many federated search engines (FSEs) performing deep web searches to process search results plus the problems associated with this approach. This article provides an introduction to how XML-formatted search results are processed by FSEs.

FSEs use jargon such as “XML gateway” or “XML interface” to refer to the fact that they have a way of interacting with a particular content source using XML. It may be that the FSE generates XML and submits an XML query or that search results are generated by the remote search engine and returned as an XML document. In this article we are going to focus on the processing of XML results.

So, what is XML? Wikipedia has a nice introduction to XML plus a few examples. Here’s a nice simple tutorial on XML. The important idea about XML is that there is no ambiguity about where to find information. XML is intended for consumption by computer programs. It is very highly structured.
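To make the "no ambiguity" point concrete, here is a minimal sketch in Python of how an FSE might pull fields out of an XML result list. The element names (results, result, title, url, snippet) and the sample document are made up for illustration; every real content source defines its own XML format, so treat this as an assumption rather than any particular vendor's interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML result list; the element names are invented for this sketch.
SAMPLE_RESULTS = """<results total="2">
  <result>
    <title>Deep Web Basics</title>
    <url>http://example.org/a</url>
    <snippet>An introduction to deep web searching.</snippet>
  </result>
  <result>
    <title>Federated Search Primer</title>
    <url>http://example.org/b</url>
    <snippet>How federated search engines query sources.</snippet>
  </result>
</results>"""


def parse_results(xml_text):
    """Return one dict per <result> element, with missing fields as ''."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": result.findtext("title", default=""),
            "url": result.findtext("url", default=""),
            "snippet": result.findtext("snippet", default=""),
        }
        for result in root.findall("result")
    ]


if __name__ == "__main__":
    for hit in parse_results(SAMPLE_RESULTS):
        print(hit["title"], "->", hit["url"])
```

Because each field lives in its own named element, the program never has to guess where a title ends and a snippet begins, which is the main advantage over scraping HTML.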



In this multi-part series we will look at a number of different approaches that federated search engines (FSEs) take to access content from remote databases.

FSEs are always at the mercy of the content provider when it comes to searching and retrieving content. FSEs perform deep web searches since they access content that lives inside of databases. Read the earlier articles on crawling vs. deep web searching and introduction to the deep web for background information on deep web searching. Also, read the article about connectors to understand how the query processing and search engine submission process works for deep web searching.

When FSEs search deep web databases they often do so by filling out search forms much like humans do and they also process result lists (summaries of documents generated by the remote search engines) much like the way humans examine the search results in their browsers. Processing a list of search results by reading and dissecting the HTML that a search engine provides is called “screen scraping.” Wikipedia has an article about screen scraping.
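For readers who want to see what screen scraping looks like in practice, here is a minimal sketch in Python using only the standard library. The HTML layout (list items with class "hit", each wrapping a link) is a made-up example; a real connector would have to be written against each source's actual markup, and it breaks whenever that markup changes.

```python
from html.parser import HTMLParser

# Hypothetical result page; the markup is invented for this sketch.
SAMPLE_HTML = """
<html><body>
<ul class="results">
  <li class="hit"><a href="http://example.org/a">First document</a></li>
  <li class="hit"><a href="http://example.org/b">Second document</a></li>
</ul>
<a href="/help">Help</a>
</body></html>
"""


class ResultListScraper(HTMLParser):
    """Collect the link and title of every <a> inside <li class="hit">."""

    def __init__(self):
        super().__init__()
        self.hits = []
        self._in_hit = False   # True while inside a result list item
        self._current = None   # the result currently being assembled

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "hit":
            self._in_hit = True
        elif tag == "a" and self._in_hit:
            self._current = {"url": attrs.get("href", ""), "title": ""}

    def handle_data(self, data):
        # Text is only kept while a result link is open.
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.hits.append(self._current)
            self._current = None
        elif tag == "li":
            self._in_hit = False


def scrape_results(html_text):
    scraper = ResultListScraper()
    scraper.feed(html_text)
    scraper.close()
    return scraper.hits


if __name__ == "__main__":
    for hit in scrape_results(SAMPLE_HTML):
        print(hit["title"], "->", hit["url"])
```

Note how the scraper has to rely on presentation details such as a class name to tell a result link from the "Help" link. That dependence on page layout is exactly what makes screen scraping fragile.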



Abe discovered “Searching the Deep Web” on YouTube yesterday. This is a nice professionally produced 6-minute introduction to the Deep Web made by the US Department of Energy Office of Scientific and Technical Information (OSTI). Deep Web Technologies is mentioned in the video as DWT has created the search technology for a number of major OSTI applications.

What’s very cool about this video hitting YouTube is that Abe and I think of YouTube as hosting very mainstream videos. We like the idea of the public being exposed to federated search in such a venue.

For your viewing pleasure, here’s the video. Make yourself some popcorn, relax, and enjoy the show!

[Embedded YouTube video: “Searching the Deep Web”]


Research and Markets, a large producer of market research reports, has for sale a report: Academic Library Website Benchmarks. Per the report’s description “[t]he report presents data from 82 North American college libraries about their library website policies and development plans.”

Of particular interest is the second to last paragraph in the description:

Just over a third of the sample responded that they were currently offering federated search capabilities from the website, so that a broad range of library databases could be searched at once. Three out of four research universities had federated search capabilities, compared to just 53.33% of PhD-level granting institutions, 29.27% of 4-year/MA granting institutions, and just 8.33% of community colleges. The mean number of subject-specific search windows offered through federated searches was 19.72.

Clearly there is tremendous opportunity to sell federated search into the higher education market if, overall, only a third of the sample in the study reported offering federated search. Of deeper interest is the low use of federated search in 4-year/MA granting institutions (29%) and even lower level of adoption at community colleges (8%).



Marcus Zillman will be speaking about his Deep Web Research 2008 publication in his Awareness Watch program on BlogTalkRadio. The show will air January 13, 2008 at 2:00 PM Eastern Time. Here’s the show description from the show’s information page:

We will be discussing my latest publication Deep Web Research 2008 that describes the many many resources and sites that you can drill deep into the web to discover information that is not available through the traditional search engine! We will also be scrolling through the blogs and bringing the latest Net sightings and updates!

Note that the focus of this program is on deep web content sources, not on federating them, so it will not be a federated search program. Nevertheless, those in the federated search world should be interested in knowing about content worth federating.


I remember well waking up early one morning, November 18, 2004 (no, I didn’t remember the exact date but Outlook did), to a flurry of emails from some of my East Coast customers.

They had seen a story in the New York Times announcing the birth of Google Scholar. A number of questions were raised: were federated search applications going to become obsolete? Should we federate Google Scholar?

A few months later there was a brief article in Digital Librarian (this article is no longer available but here’s a summary) announcing that “2005 is the year that will be remembered (in the library world) as the year that federated search became obsolete.”

2007 is coming to a close, Google Scholar is still in Beta, and federated search is alive and doing well. In the last few years we’ve seen tremendous improvements in federated search and I expect that the years ahead will be an exciting time for Deep Web Technologies and others in our industry. I have high hopes that this blog can become “the place” where all kinds of information about federated search can be shared and openly discussed.


Education Institute is hosting an hour-long web conference, Federated Search: New Tools and Best Practices, on January 8th at 1:00pm Eastern Time. Here’s the description from the Education Institute web-site:

Do you have a federated searching tool or are thinking about getting one? Federated search is an important tool that enables libraries to give users what they want: good information, fast and easily. A brief discussion of the mechanics of fed search tool will be followed by a discussion of recent developments in the field. What are the best practices for implementing and configuring you federated search? Tune in and find out.

The presenters are Frank Cervone and Jeff Wisniewski. Their bios are on the registration page for the web conference.

Cervone and Wisniewski delivered a presentation at Internet Librarian 2007 in October: Recent Trends in Federated Search: A Snapshot of the Landscape Today. Deep Web Technologies founder Abe Lederman was in the audience and he reports that the talk was informative as well as entertaining.

I will be attending the web conference and will report back on what I learn.


Samuel Dean has in his blog, Web Worker Daily, a post titled Need a Better Search Engine? Roll Your Own. He’s writing about a new web-site, Rollyo. Rollyo allows you to roll your own search engine, which they call a searchroll. The site defines a searchroll as:

“… a collection of the sites you trust and find useful. It’s a personal search engine you create to provide relevant results from a hand selected list of reliable sites.”


Dean’s blog post shares a little bit of his experience with Rollyo.

Rollyo lacks the ability to fill out search forms and search sources live so it’s not a federated search engine but it might be useful as a tool for aggregating some crawled and indexed content and using it as a source for a federated search engine.

I’ll have to review the site and report back on its pros and cons, and compare Rollyo to Google Custom Search Engine and to IBM OmniFind Yahoo! Edition.


In June of this year, Barclay Hill of Intel delivered a presentation to the Special Libraries Association (SLA) at its Annual Conference about Intel’s experience bringing federated search to their corporate library. Hill is manager of the Web and Systems Group at the Intel Library at Intel Corporation. Associated with the presentation is an article, “Federated Search at the Intel Library.” A revised version of the article was published in the September 2007 edition of Information Outlook, SLA’s monthly magazine. SLA has given permission for Deep Web Technologies, whose federated search technology was selected by Intel and who is referenced in the article, to post the article on Deep Web Tech’s web-site. Please follow this link to the article.

Hill’s article is a case study in bringing federated search to Intel from requirements through implementation. The article should be of interest to anyone exploring a federated search solution for a corporate environment as this topic is not widely covered in the literature, especially discussion of a large-scale deployment within a multinational corporation. We welcome hearing of your experiences with federated search in the corporation, or elsewhere, through comments in this blog, through guest posts, and through references to relevant articles.