Growing up with federated search

Author: Walter Warnick

[ Editor's note: Dr. Walter Warnick, Director of the Office of Scientific and Technical Information (OSTI) of the U.S. Department of Energy, is the author of this article. I've had the pleasure of working with Dr. Warnick for over six years in various capacities. Dr. Warnick is extremely passionate about the critical role of federated search in furthering OSTI's mission of making high-quality scientific and technical information available to researchers and to the American people. It was Dr. Warnick who pioneered the use of federated search in the Federal government. I consider Dr. Warnick to be a luminary in the industry and will recognize him as such in an upcoming interview.

By way of disclosure, I consult for OSTI and for Deep Web Technologies (DWT). DWT sponsors this blog and powers the search behind a number of OSTI's federated search applications.

In this article Dr. Warnick tells the story of OSTI's relationship with federated search. Even in the pre-web days, the ideas that would lead to federated search were germinating. ]

This is the story of how one organization of the Federal government came to recognize the potential of federated search and then set out to deploy it and encourage its maturation.

Along the way, considerable progress has been made. More science is freely findable on the web today than has ever before been available to the public. Yet, much more progress remains to be made.

Before the Web

Before the web, the Office of Scientific and Technical Information (OSTI) of the Department of Energy used the technology then available to maximize communication about the results of the Department’s research and development program. For example, OSTI created microfiche and sent it to hundreds of depository libraries. It also partnered with online vendors such as Dialog. Hard copies were made available via a partnership with the National Technical Information Service.

Enter the Web

With the advent of the web, it quickly became clear that the new medium offered tremendous potential to communicate science. Thus, OSTI set out to develop cutting-edge web tools to share e-prints, technical reports, conference proceedings, and other forms of scientific and technical information (STI). Because each form of STI comes from a distinct source and follows its own pathway, each had to be accommodated separately, which naturally led to a separate information product for each form.

The Need to Integrate Web Applications

Within a couple of years, OSTI had developed a suite of web-based databases and was also linking to similar databases offered by other agencies. It was apparent, however, that a suite of tools is not a library. What was needed was a way to integrate all of these databases so that patrons need not search them one at a time. Fortunately, the concept of federating separate sources was just then being introduced to the web. It was an affordable alternative to other integration technologies, such as creating a data warehouse.

OSTI set out to federate its web applications so that all the databases could be searched simultaneously via a single query.
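To make the "single query, many databases" idea concrete, here is a minimal sketch of the fan-out step in Python. It is not OSTI's implementation: the three source functions (`reports_db`, `eprints_db`, `proceedings_db`) are hypothetical stubs standing in for real database connectors, and `federated_search` is an illustrative name. The sketch only shows the pattern of sending one query to every source in parallel and collecting each source's hits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A hit is a small record; a source is any callable that maps a query to hits.
Hit = Dict[str, str]
Source = Callable[[str], List[Hit]]

# Hypothetical stand-ins for separate databases (no network access here).
def reports_db(query: str) -> List[Hit]:
    return [{"title": f"Technical report on {query}"}]

def eprints_db(query: str) -> List[Hit]:
    return [{"title": f"E-print about {query}"}]

def proceedings_db(query: str) -> List[Hit]:
    return [{"title": f"Conference paper: {query}"}]

def federated_search(query: str, sources: Dict[str, Source]) -> Dict[str, List[Hit]]:
    """Send the same query to every source concurrently; return hits per source."""
    results: Dict[str, List[Hit]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # A source that raises an error contributes no hits
                # instead of failing the whole federated search.
                results[name] = []
    return results

sources = {"reports": reports_db, "eprints": eprints_db, "proceedings": proceedings_db}
for name, hits in federated_search("neutron scattering", sources).items():
    print(name, len(hits))
```

Querying the sources concurrently means the patron waits roughly as long as the slowest source rather than the sum of all of them, which is much of what makes searching many databases "simultaneously" practical.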

The Power of Relevance Ranking

Along the way, OSTI took every opportunity to encourage the rapid maturation of federated search technology. Most notable was the development of relevance ranking in a federated environment. Before relevance ranking, federated search results were presented in long lists: a set of hits from source A would be followed by a set of hits from source B, and then from source C, and so on. Soon, the patron was overwhelmed with sets of hits. As with surface web search engines such as Google, relevance ranking was a major advance in meeting the needs of patrons.

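The challenge was that the technology behind relevance ranking for Google does not work in a federated environment. Google ranks documents it has already crawled and indexed, whereas a federated search engine sees only the results each source returns at query time. So a new approach to relevance ranking had to be invented.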
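The difference between source-by-source lists and a single ranked list can be sketched in a few lines of Python. This is a minimal illustration, assuming each source returns a title and a snippet per hit; the scoring rule (count query-term matches, with title matches weighted double) is a hypothetical stand-in, not the ranking method OSTI or Deep Web Technologies actually developed. The function names `score` and `merge_ranked` are likewise illustrative.

```python
import re
from typing import Dict, List

Hit = Dict[str, str]

def tokenize(text: str) -> List[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def score(hit: Hit, query_terms: List[str], title_weight: float = 2.0) -> float:
    """Count query-term occurrences in the hit's title and snippet.

    Title matches are multiplied by title_weight (an arbitrary assumption)."""
    title = tokenize(hit.get("title", ""))
    snippet = tokenize(hit.get("snippet", ""))
    return sum(title_weight * title.count(t) + snippet.count(t) for t in query_terms)

def merge_ranked(query: str, per_source: Dict[str, List[Hit]]) -> List[Hit]:
    """Pool hits from every source, tag each with its source, sort by score."""
    terms = tokenize(query)
    pooled = [dict(hit, source=name) for name, hits in per_source.items() for hit in hits]
    return sorted(pooled, key=lambda h: score(h, terms), reverse=True)

per_source = {
    "A": [{"title": "Reactor materials survey", "snippet": "steel alloys"}],
    "B": [{"title": "Neutron scattering in alloys", "snippet": "neutron beams probe alloys"}],
    "C": [{"title": "Annual budget", "snippet": "no technical content"}],
}
for hit in merge_ranked("neutron alloys", per_source):
    print(hit["source"], hit["title"])  # B ranks first, then A, then C
```

Because the score is computed only from what each source sends back, the federator can rank hits from unrelated databases on a common scale without crawling or indexing them in advance.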

The Current Situation

Today, several federations of web applications are available to everyone with internet access. ScienceAccelerator.gov federates key DOE databases. But OSTI's progress extends beyond DOE to include Science.gov, which federates U.S. federal agency science information, and WorldWideScience.org, which federates national databases and portals from around the globe. The latter two web applications are actually federations of federations. In addition, some OSTI web applications combine crawling technology of the kind used by Google with federation of databases in a single web application. See http://www.osti.gov/eprints.

While OSTI has successfully advanced and deployed federated search technology, that technology is still new and immature.

The Near Future

OSTI has made considerable progress. For example, WorldWideScience, which was conceived, developed and deployed by OSTI, makes findable about the same quantity of science as does Google. It differs from Google in that the content of WorldWideScience has been deemed worthy of publishing by a national government, and much of that content is inherently non-Googleable. Such progress would not be possible were it not for federated search.

Progress has been so rapid that it is not feasible to make reliable predictions beyond the near term. One opportunity in the near term is for private sector organizations to take advantage of government science federations and integrate them with proprietary content. In this way, a vision for truly enormous science collections, i.e., a billion pages, might become real.

Walter L. Warnick, Ph.D.
Director, Office of Scientific and Technical Information
U. S. Department of Energy

This entry was posted on Wednesday, March 4th, 2009 at 4:40 pm and is filed under viewpoints.