OSTI | Federated Search Blog

This is one of my occasional off-topic posts.

One of my clients, the Office of Scientific and Technical Information (OSTI), has an innovative program to help digitize technical reports that are currently available only on paper, and I thought I’d spread the word.

Adopt-A-Doc is a service focused on digitizing full-text technical reports from OSTI’s Energy Citations Database. Here’s a description of the Energy Citations Database:

The Energy Citations Database (ECD) provides free access to over 2.3 million science research citations with continued growth through regular updates. There are over 209,000 electronic documents, primarily from 1943 forward, available via the database. Citations and documents are made publicly available by the U.S. Department of Energy (DOE).

ECD includes scientific and technical research results in disciplines of interest to DOE such as chemistry, physics, materials, environmental science, geology, engineering, mathematics, climatology, oceanography, computer science and related disciplines. It includes bibliographic citations to report literature, conference papers, journal articles, books, dissertations, and patents.



Six years ago I moved from the Bay Area to New Mexico to be closer to my brother Abe and to his family. For five of those years I was an employee of his at Deep Web Technologies. Now, I just write for this blog and do some project work for him. Ever since starting to work for Deep Web, and to this day, I’ve supported DOE OSTI (The US Department of Energy Office of Scientific and Technical Information) in a number of capacities. OSTI is chartered to disseminate scientific and technical information to the public, especially as it pertains to DOE’s interests. OSTI has built a number of highly visible applications for this purpose, and some of these perform federated search and use technology developed by Deep Web.




Author: Sol

[ Editor’s disclaimer: I am paid by both Deep Web Technologies and by the DOE Office of Scientific and Technical Information (OSTI) for various projects. ]

On Wednesday, Deep Web Technologies was named Small Business Innovation Research small business of the year by the U.S. Department of Energy (DOE). (There’s a press release regarding the award here.) In order to receive the prestigious award, Deep Web had to provide a song and slide show; a small snippet of the song plus the slide show was played at the ceremony.

Can you guess what song Deep Web chose?



We’ve all heard the old adage, “Don’t believe everything you read.” The Internet is full of stuff to read; how do we know what to believe? While there are numerous search engines that present us with documents in response to our queries, how do we know if the information presented in these documents is accurate? Granted, much of what’s on the Internet is personal opinion, and sometimes all we want is someone’s viewpoint. There are times, however, when we need to know that the information we are reading is of high quality. We may be researching product features to make a purchase decision, company information to form a competitive intelligence strategy, or medical information to address a medical concern.

A major part of the answer to the question of whether information is accurate is to examine its source. This is where federated search engines really shine. By their nature, federated search applications usually query deep web database sources. These databases can’t be crawled: there are no links for Google to follow to extract all the documents in such a database. Now, let’s consider the type of content that lives in these non-crawlable databases. Publishers who specialize in scientific, technical, and business research articles are most likely to store their documents in databases and to make their content searchable by federated search engines. Geological, geographic, and demographic data lives in databases. Much political data lives in databases as well.



The first thing that most people notice when they use a federated search application is that it’s not nearly as fast as Google. We’ve all gotten spoiled. This is not only the information age, it’s the age of quick information; we all want every search to be as fast as a Google search. However, by its very nature, federated search can’t be as fast as Google. Federated search is at the mercy of the sources it federates. If a source is slow to return results to the federated search application, then there’s nothing the federated search application can do. Or is there?

Deep Web Technologies has been displaying incremental results for some time now. The idea is simple: display results in chunks as they are received from the sources being searched. Science.gov, WorldWideScience.org, and Scitopia.org are three applications that display incremental results. While there are challenges to this approach, there are some significant benefits as well. The aim of displaying incremental results is to minimize the time the user has to wait to see some results. In the show-something-quick department, incremental results work well. The major challenge arises when you try to figure out what to do with the rest of the results as they come in.
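To make the idea concrete, here is a minimal Python sketch of incremental results. It is an illustration only, not how Deep Web Technologies implements the feature: the source names, the simulated delays, and the `search_source` and `incremental_search` functions are all hypothetical stand-ins for real source connectors.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical sources with simulated response times in seconds.
# These are made-up values for illustration, not real measurements.
SOURCES = {"fast_source": 0.05, "medium_source": 0.2, "slow_source": 0.5}


def search_source(name, delay, query):
    """Stand-in for a real connector: wait, then return two fake results."""
    time.sleep(delay)
    return [f"{name}: result {i} for '{query}'" for i in range(1, 3)]


def incremental_search(query, sources=SOURCES):
    """Query all sources in parallel and yield (source_name, results)
    as each source responds, instead of waiting for the slowest one."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {
            pool.submit(search_source, name, delay, query): name
            for name, delay in sources.items()
        }
        for future in as_completed(futures):
            yield futures[future], future.result()


if __name__ == "__main__":
    start = time.monotonic()
    for source, results in incremental_search("fusion energy"):
        elapsed = time.monotonic() - start
        print(f"[{elapsed:.2f}s] {len(results)} results from {source}")
```

In this sketch the user sees the first chunk as soon as the fastest source answers, while the slowest source only delays its own chunk. What the sketch leaves out is the hard part mentioned above: deciding how to merge and rank later chunks against results that are already on the screen.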



The U.S. Department of Energy (DOE) Office of Scientific and Technical Information (OSTI) has, ever since the Manhattan Project, been responsible for stewardship of DOE-related research results, which it makes available for free to scientists, researchers, and the public. The OSTI blog was started last November to share personal perspectives of OSTI employees. Recently, the blog was expanded to include a technology thread. OSTI’s use of technology, much of it based on federated search, should be of interest to readers of this blog.

Due to my familiarity with OSTI technology (from five years of helping to develop and support OSTI products through my relationship with this blog’s sponsor, Deep Web Technologies), I was asked to write for the technology thread, as the sole author of some articles and a collaborating author on others.
