Archive for July, 2010


The U.S. Census Bureau has a federated search tool in development, DataFerrett.

The (Beta)DataFerrett helps you locate and retrieve the data you need across the Internet to your desktop or system, regardless of where the data resides.

DataFerrett is a unique data mining and extraction tool. (Beta)DataFerrett allows you to select a databasket full of variables and then recode those variables as you need. You can then develop and customize tables. Selecting your results in your table you can create a chart or graph for a visual presentation into an html page. Save your data in the databasket and save your table for continued reuse.

I have no idea how useful the tool is but their mascot sure is cute!



Author: Sol

Jeff Jonas recently published an article, “When Federated Search Bites.” If this article is meant to be link bait, I’m not biting. You can get a link from Google.

I certainly don’t know everything about federated search, but I know enough to recognize what’s not federated search, or at least not what most of us think of as federated search.

The article, really a rant, starts off reasonably enough:

Federated search: conducting a search against “n” source systems via a broadcast mechanism without the benefit or guidance of an index.

I am speaking specifically about environments where the systems in the federation are heterogeneous, are physically dispersed, were not engineered for federation a priori, and are not managed by a common command and control system.
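The definition quoted above (one query broadcast to n independent sources, with no index consulted) can be sketched in a few lines. This is a minimal illustration, not how any particular product works: each "source" is assumed to be a hypothetical Python callable standing in for a remote system, and real connectors would also need authentication, query translation, timeouts, and deduplication, none of which is shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A "source" takes a query string and returns matching titles. The sources below
# are hypothetical stand-ins for remote, independently managed systems.
Source = Callable[[str], List[str]]


def federated_search(query: str, sources: Dict[str, Source]) -> List[Tuple[str, str]]:
    """Broadcast `query` to every source in parallel and merge the replies.

    No central index is consulted: each source is searched live at query time.
    A source that raises an error simply contributes no results.
    """
    results: List[Tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Broadcast: submit the same query to every source at once.
        futures = {name: pool.submit(search, query) for name, search in sources.items()}
        # Gather: collect replies in source order so the merged list is deterministic.
        for name, future in futures.items():
            try:
                results.extend((name, hit) for hit in future.result())
            except Exception:
                continue  # one failing source should not sink the whole search
    return results


# Hypothetical sources for illustration.
def catalog(q: str) -> List[str]:
    return [t for t in ["Solar energy basics", "Wind power"] if q.lower() in t.lower()]


def repository(q: str) -> List[str]:
    return [t for t in ["Energy storage review", "Solar cell efficiency"] if q.lower() in t.lower()]


def offline(q: str) -> List[str]:
    raise ConnectionError("source unavailable")


if __name__ == "__main__":
    print(federated_search("solar", {"catalog": catalog, "repository": repository, "offline": offline}))
```

Results here are simply concatenated in source order; real federated search engines normalize and re-rank results from heterogeneous sources, which is the harder part of the problem.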

Here’s another reasonable statement:

Most organizations have some obligation to make sense of what they know. For example, the airline should know if the person added to the watch list is already an employee or already has a flight reservation. Ideally, the moment such facts become knowable, someone or some system should be notified. Think of this as “the data speaks to itself.” I call this data finds data.

Yes, having new data trigger analysis is a good idea. But, IT’S NOT FEDERATED SEARCH.

So the entire basis of the rant is: federated search is not the advanced analysis system I want, therefore it sucks. That’s like saying that my oven doesn’t analyze the food I put in it and automatically cook it perfectly, therefore my oven “bites.”

There may be a worthwhile discussion about the challenges of analyzing federated data versus indexed data, but that has nothing to do with what federated search does.
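To make the distinction concrete, here is a minimal sketch of what the second quotation calls “data finds data”: each new record is checked at arrival time against everything already known, and a match triggers a notification. The class name, record fields, and exact-name matching rule are hypothetical simplifications (real entity resolution is far fuzzier). Note that it depends on a persistent index of prior records, which is exactly what the quoted definition of federated search excludes.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Notify = Callable[[str, dict, dict], None]


class DataFindsData:
    """Hypothetical sketch: each new record is matched against earlier ones on arrival."""

    def __init__(self, notify: Notify) -> None:
        self.notify = notify
        # A persistent index of everything seen so far, keyed by normalized name.
        self.index: Dict[str, List[dict]] = defaultdict(list)

    @staticmethod
    def _key(record: dict) -> str:
        # Exact match on a lowercased, whitespace-collapsed name (a big simplification).
        return " ".join(record["name"].lower().split())

    def add(self, record: dict) -> None:
        key = self._key(record)
        # The new record "finds" earlier records from other systems immediately.
        for earlier in self.index[key]:
            if earlier["source"] != record["source"]:
                self.notify(key, earlier, record)
        self.index[key].append(record)


if __name__ == "__main__":
    engine = DataFindsData(lambda key, old, new: print(f"{key}: {new['source']} matches {old['source']}"))
    engine.add({"name": "Pat Doe", "source": "hr"})
    engine.add({"name": "pat doe", "source": "reservations"})  # prints one alert
    engine.add({"name": "Pat  Doe", "source": "watchlist"})    # prints two alerts
```

Nothing here broadcasts a query to remote systems; the work happens when data arrives, against an index the system itself maintains. That is a different architecture from the broadcast search defined in the first quotation.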

What do you think? Does the article make sense to you?


“Federated search as a transformational technology enabling knowledge discovery: the role of” is by far the best historical paper I’ve read about DOE’s Office of Scientific and Technical Information (OSTI), and I consult for the agency.

OSTI has created a number of search portals (Science.gov, DOE ScienceAccelerator, DOE Energy Citations Database, and DOE Information Bridge, to name a few), but few people know the history of the agency that created them.

OSTI grew out of the post-World War II initiative to make the scientific research of the Manhattan Project as freely available to the public as possible. On November 17, 1944, President Roosevelt wrote to Vannevar Bush, then the Director of the Office of Scientific Research and Development, to request his counsel on how to capitalize, in the days of peace to come, on the experience of the United States’ wartime R&D efforts, most of which were conducted in utter secrecy.

OSTI Director Dr. Walter Warnick tells the story of the development of OSTI, its role in advancing science, and how federated search serves that role in ways that Google can’t.

The paper, at 23 pages, covers the subject with a good deal of depth.



Google just announced that they would buy ITA Software, regulators permitting. Here’s another Google purchase that would take Google deeper into smarter searching.

Semantic processing is taking a big step forward.

From Mashable:

Google Acquires Metaweb to Improve Search

Google has acquired semantic web and real world database company Metaweb, a move the company says will help them “improve search and make the web richer and more meaningful for everyone.”

We wrote about Metaweb back in 2008 when they received a significant chunk of funding to the tune of $42 million, on top of their first round of $15 million back in 2006. Since then the company has built its Freebase open database into a collection of over 12 million items from entertainment (movies, books, TV shows) to locations, celebrities, companies and other “real world” objects. Google says the plan is to preserve and further develop the database and hope to enlist other companies to make use of and contribute to the data.

In addition to fleshing out Freebase, Google also hopes to leverage Metaweb to enhance its efforts with features like rich snippets and search answers, both of which aim to give back “smarter” and more immediate results to specific queries. Right now, simpler requests like “Barack Obama birthday” and “events in San Jose” can spawn relevant answers right at the top of the search results page, but Google hopes to take this initiative further by feeding in more facts about the real world from Metaweb’s data repository.

Resource Shelf has some very insightful thoughts on the acquisition.

Here’s a good video on what Metaweb is about:
[Embedded YouTube video]

Here’s Google’s announcement of the acquisition.


Hope Leman is one of my favorite people. I know of very few individuals who are as passionate about anything as is Hope. Hope won second place in our second Federated Search Blog contest and I commented on her passionate review of in 2008.

Hope has written about it again, in an article at her blog, Significant Science. Hope is a research information technologist for a health network in Oregon. She is also Web administrator of the free online grants and scholarship listing service, ScanGrants, and of the free online search platform, ResearchRaven. From several conversations with Hope, I know that ScanGrants is a labor of love and a good demonstration of Hope’s passion for helping researchers.

In Multilingual WorldWideScience: Accelerating Scientific Research, Empowering Researchers Hope reminds us of the key role that search plays in research especially in the world of free science and foreign language science.

Hope’s message is personal, and I love that:

As someone who grew up in a family that housed students who had left home and family in China, Japan, Iran, Korea and other countries to study engineering, chemistry, physics, biochemistry and so on at Oregon State University here in my hometown of Corvallis, Oregon I know what brilliant people there are in many countries who have so much to offer and what a boon it will be that the work of researchers worldwide will become useable to each of them and benefit the rest of us.

This passage about Hope’s friend, who died of ALS, is even more touching:

I have recently lost a friend to amyotrophic lateral sclerosis and I would often sadly reflect as I bicycled home from her house about the glacial pace of progress on research on that disease and others like it. That is why I find Dr. Warnick’s enthusiasm and practical accomplishments so very admirable and the best possible case for paying one’s taxes with a minimal amount of grumbling. He is putting federal funds to exemplary use

Dr. Warnick, Director of OSTI, conceived WorldWideScience and his agency hosts and manages the search portal.

Databases and search engines aren’t just about getting one’s job done. At the noblest level, they’re about solving important problems, and saving lives when we can.

[ Disclaimer: OSTI is one of my consulting clients. Deep Web Technologies, who built the single and multiple language search engines behind and who sponsors this blog is another of my clients. ]


I don’t write about metasearch engines very often, but I think that Google’s proposed purchase of ITA Software is worth commenting on. Here’s some info from Google:

On July 1, 2010, Google announced an agreement to acquire ITA Software, a Cambridge, Massachusetts flight information software company, for $700 million, subject to adjustments.

Google’s acquisition of ITA Software will create a new, easier way for users to find better flight information online, which should encourage more users to make their flight purchases online.

The acquisition will benefit passengers, airlines and online travel agencies by making it easier for users to comparison shop for flights and airfares and by driving more potential customers to airlines’ and online travel agencies’ websites. Google won’t be setting airfare prices and has no plans to sell airline tickets to consumers.

Because Google doesn’t currently compete against ITA Software, the deal will not change existing market shares. We are very excited about ITA Software’s QPX business, and we’re looking forward to working with current and future customers. Google will honor all existing agreements, and we’re also enthusiastic about adding new partners.



JISC, the UK-based education and research organization, commissioned a report from OCLC that brings together findings from different studies on how people’s ways of looking for information in libraries and online are changing. The commissioned study synthesizes the results of twelve studies. A podcast at the JISC website, “What does the digital information seeker look like?”, summarizes the findings.

What did the study find that would be of interest to the federated search community? Here are my thoughts:

  • “… there is an identifiable need for training, support and improved systems to help people find the information they need.” As we dream up more bells and whistles, we need to consider whether users can effectively use the features we give them today. If they can’t, then it’s the vendor’s responsibility to simplify the interface and ease the training burden on library staff. After all, how many people take training classes in using Google? But, then again, how many people use Google’s advanced search?

  • “E-journals are increasingly important to the research process and the majority of professional researchers have embraced digital content”. Make sure you provide access to the journals your patrons need. Discovery services may provide access to some of them. Federated search can provide access to others.

  • “Immediate access to information from their own desktop computer is almost taken for granted and gaining access to the full-text journal article is seen as more of an issue than discovering the information sources.” This speaks to the importance of good link resolvers. It’s no surprise that users are not satisfied with an abstract and no way to get the full text of the article.

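The full-text gap described in the last point is what link resolvers are meant to close. Many resolvers accept OpenURL requests, in which a citation is encoded as key/value pairs and sent to the library’s resolver, which then looks for an appropriate full-text copy. The sketch below builds such a request, assuming the OpenURL 1.0 (Z39.88-2004) key/encoded-value journal format; the resolver address and citation values are hypothetical placeholders, individual resolvers may expect additional or different keys, and none of this comes from the JISC/OCLC study itself.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Citation fields mapped onto OpenURL 1.0 (Z39.88-2004) journal-article keys.
JOURNAL_FIELDS = ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date")


def build_openurl(resolver_base: str, citation: dict) -> str:
    """Return a key/encoded-value OpenURL asking a link resolver for full text."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for field in JOURNAL_FIELDS:
        value = citation.get(field)
        if value:  # omit fields the citation does not have
            params["rft." + field] = value
    return resolver_base + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical resolver address and placeholder citation data.
    url = build_openurl(
        "https://resolver.example.edu/openurl",
        {"atitle": "An example article", "jtitle": "Journal of Examples",
         "issn": "1234-5678", "volume": "12", "spage": "34", "date": "2009"},
    )
    print(url)
    # Decode the query string back to confirm what the resolver will receive.
    print(parse_qs(urlsplit(url).query))
```

The point of the design is that the discovery tool only has to describe the article; deciding which subscribed copy the user can actually open is left to the library’s resolver.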


Tulane Reference Librarian Paul St-Pierre presents a compelling case for federated search technology in a 31-minute video.
[Embedded YouTube video]

While the video is largely about Tulane’s experience with MetaLib, the first ten minutes or so are vendor-neutral: they articulate the problems Tulane was seeing that motivated its search for a technology solution.

St-Pierre explains that the problem at Tulane is “too much information.” Nothing new here. But with 500 indexes and databases and 30,000 e-journals, managing that information is a bigger challenge for Tulane than for many other organizations.

Before federated search, Tulane had many search tools and many user interfaces, and navigating among them was complicated, especially with documents in many formats. St-Pierre described the situation as one with many paths to get to the text.

Tulane’s competition is Google. It’s easy and it brings back lots of information. But, as St-Pierre reveals in three graphs, things are not as simple as they appear on the surface.



[ Editor's note: This article is republished from the Deep Web Technologies Blog. It is Abe's perspective on the launch of Multilingual Federated Search in Helsinki last month. ]

Photo credit: Jakke Nikkarinen/STT Info Kuva

Pictured, from left: Dr. Walter Warnick, U.S. Department of Energy Office of Scientific and Technical Information (OSTI) Director; Yuri Arskiy, All-Russian Institute of Scientific and Technical Information (VINITI) Director; Tony Hey, Microsoft Research Corporate Vice-President; Richard Boulderstone of the British Library, Chairman of the WorldWideScience Alliance; and Wu Yishan, Institute of Scientific and Technical Information of China (ISTIC) Chief Engineer.

It was an honor to attend and for my company to have played a key role in the launch of multilingual in Helsinki this past June 11th. Beginning more than three years ago, the R&D effort that ultimately resulted in the launch of our ground-breaking multilingual federated search capability involved plenty of hard work by lots of folks at Deep Web Technologies. It certainly could not have been accomplished without our invaluable partnerships with the Department of Energy Office of Scientific and Technical Information (OSTI), the WorldWideScience Alliance, and Microsoft Research.
