Archive for June, 2010


Here are a couple of fun videos. The first is the Parisian Love Google search story that aired during the Super Bowl. The second is a great parody of a Google search story. Feel inspired? Create your own Google Search Story with YouTube’s Video Director, complete with your choice of 24 different music tracks.



Hat tip to Making Curriculum Pop.


In March I reported on an article that Barbara Quint, editor-in-chief of Information Today’s Searcher Magazine, published for DCLnews: “Federated Searching: Good Ideas Never Die, They Just Change Their Names.” DCLnews is one of the publications of Iris Hanney’s business support services company, Unlimited Priorities.

Abe Lederman, Deep Web Technologies founder and president and sponsor of this blog, was quoted in this article regarding his experience with one particularly thorny aspect of federated search:

So how do federated search services handle [author searching] problems? In an article written by Miriam Drake that appeared in the July-August 2008 issue of Searcher entitled “Federated Search: One Simple Query or Simply Wishful Thinking,” a leading executive of a federated service selling to library vendors was quoted as saying, “We simply search for a text string in the metadata that is provided by the content providers - if the patron’s entry doesn’t match that of the content provider, they may not find that result.” Ah, the tough luck approach! In contrast, Abe Lederman, founder and president of Deep Web Technologies, a leading supplier of federated search technology, responded about his company’s work with Scitopia, a federated service for scientific scholarly society publishers, “We spend a significant amount of effort to get it as close to being right as possible for Scitopia where we had much better access to the scientific societies that are content providers. It is not perfect and is still a challenge. The best we can do is transformation.”

I reviewed Miriam Drake’s article last July.

Read the rest of this entry »


Image and Data Manager (IDM) is an online magazine with a focus on information management for Australia and New Zealand. Today they published an article: Virtual aggregation trumps data migration.

The article starts with a couple of poignant examples of failures in knowledge management infrastructures:

In the United States, federal intelligence bodies failed to “connect the dots” they had been compiling when Al Qaeda terrorist Umar Farouk Abdul Mutallab attempted to blow up an airliner in late 2009.

In the United Kingdom, the cases of Khyra Ishaq and Baby P highlighted the all-too-common lack of early warning systems that could have saved the lives of young victims. Child protection services agencies possessed the information that could have protected Ishaq and Baby P but not the infrastructure necessary to alert them to potential problems.

The article argues that trying “to merge massive amounts of information from disparate data sources” has been a huge failure. The article continues with a good argument for staying with federated search:

With today’s heightened focus on risk, many CIOs are now recognising the outcomes that can be generated through federated search. The key premise being to avoid risky and costly data migration or physical aggregation exercises, and leave data in place. In today’s enterprise, data needs to live and breathe in different places.

The article is a fast and easy read and its arguments are worth serious consideration for those in the “federate or migrate” discussion.


I’ve read many articles about the Semantic Web. Most are very abstract. So, I was pleased to discover “Semantic Web: Your Web’s Smarter Younger Brother” by Tom Robinson. Robinson provides a list of nine ways the Semantic Web will positively affect us. The items are ones we can all understand and relate to. He cites semantic expert Tony Shaw as the source but doesn’t provide a reference to the list. Here are the first five items, listed in reverse order (of importance, I imagine):

9. Annoying ads that have nothing to do with your interests will disappear.

8. Your computer will understand you through natural language recognition. When you tell it you want directions to that restaurant on Main Street with the amazing French onion soup, it will know what you’re talking about.

7. All of your computers will become more intuitive and easier to use.

6. Your bank will implement semantically-driven fraud monitoring systems. And your credit card company won’t erroneously reject your charges for a meal in London ever again, because it will understand that you bought airline tickets to England.

5. New types of consumer products will emerge that allow you to connect to your doctors and other medical experts globally. That means you’ll get a faster diagnosis of your illness and a wider array of better treatment options available to you.

See the article for the full list.

Robinson provides his own list of implications for colleges and universities. Here are the first three:

6. Research libraries already use this technology to connect disparate scientific databases.

5. Students will be able to do class-related research faster and more comprehensively so they can spend more energy on data analysis and writing.

4. Matching technology will appear in new generation job boards to create intuitive profiles and match applicants and schools that are good fits for each other.

Interesting food for thought.


Last Friday blog sponsor Deep Web Technologies released its beta version of multilingual federated search, available at WorldWideScience.org. Deep Web Technologies and several government agencies key to the effort acknowledged the great accomplishment via press releases.

Deep Web Technologies

HELSINKI, June 11 /PRNewswire/ - Deep Web Technologies unveiled multilingual translation capability today for the WorldWideScience Alliance using its federated search application. WorldWideScience.org, the international science portal, is the first application to be deployed with this unique capability. Abe Lederman, President and CTO of Deep Web Technologies, demonstrated the new technology at the International Council for Scientific and Technical Information’s (ICSTI) 2010 Summer Conference in Helsinki. ICSTI is a primary sponsor of the Alliance, whose purpose is to provide “a geographically diverse, governance structure to promote and build upon the original vision of a global science gateway.”

Multilingual federated search translates a user’s search query into the native languages of the collections being searched, aggregates and ranks these results according to relevance, and translates result titles and snippets back to the user’s original language. The translation, powered by Microsoft, makes it simple to search collections in multiple languages from a single search box in the user’s native language. The Conference will include a keynote address by Tony Hey, Corporate Vice President of the External Research Division of Microsoft Research, as well as a presentation by Dr. Walter Warnick, Director of the Office of Scientific and Technical Information of the U.S. Department of Energy Office of Science. (More)

US Department of Energy Office of Science

Washington, D.C. - Scientific language barriers were broken today in Helsinki with the launch of Multilingual WorldWideScience.org. While a large share of scientific literature is published in English, vast quantities of high-quality science are not, and the pace of non-English scientific publishing is increasing. Multilingual WorldWideScience.org will now enable the first-ever real-time searching and translation across globally-dispersed, multilingual scientific literature using complex translations technology.

“In an increasingly interconnected world, resolving the global challenges of science requires rapid communication of scientific knowledge,” said Dr. William F. Brinkman, Director of the Office of Science, U.S. Department of Energy. “Breaking the language barrier through Multilingual WorldWideScience.org will help erode borders and build research networks across DOE, the nation, and around the globe.” (More)

DOE Office of Scientific and Technical Information

OAK RIDGE, TN - Now you can find non-English scientific literature from databases in China, Russia, France, and several Latin American countries and have your search results translated into one of nine languages. With the beta launch today (view the Office of Science announcement) of Multilingual WorldWideScience.org, real-time searching and translation of globally-dispersed collections of scientific literature is possible. This new capability is the result of an international public-private partnership between the WorldWideScience Alliance and Microsoft Research, whose translation technology has been paired with the federated searching technology of Deep Web Technologies.

Microsoft Research Corporate Vice-President Tony Hey said, “We are extremely pleased to have our Microsoft Translator technology used with WorldWideScience. Built at Microsoft Research, this translation technology already provides translations to millions of users. Partnering with WorldWideScience is an opportunity to advance science across language barriers and improve scientific discovery.” (More)

British Library

World Wide Science Alliance broadens access to global research with the launch of a new multilingual tool, enabling scientists to simultaneously search and translate over 400 million pages of scientific research published in 65 countries from around the world.

Although most scientific literature continues to be published in English, the pace of non-English scientific publishing is increasing rapidly, with vast quantities of high-quality science now being produced every year. Launched today at the International Council for Scientific and Technical Information (ICSTI) annual conference in Helsinki, Finland, a new beta version of WorldWideScience.org will enable scientists to break down the language barrier, facilitating greater global cooperation with regards to the pursuit of scientific research. (More)
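
The flow described in the Deep Web Technologies release above (translate the query into each collection's language, search, merge and rank, then translate titles back) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `PHRASES` tables stand in for a real translation service, `COLLECTIONS` stands in for live sources, and the function names and the word-overlap scoring are hypothetical rather than anything taken from the actual product.

```python
# Toy phrase tables standing in for a real translation service (assumption).
PHRASES = {
    ("en", "fr"): {"solar energy": "énergie solaire"},
    ("fr", "en"): {"énergie solaire": "solar energy", "en Afrique": "in Africa"},
}

# Toy collections keyed by their native language (assumption).
COLLECTIONS = {
    "en": ["solar energy storage", "wind energy policy"],
    "fr": ["énergie solaire en Afrique", "chimie organique"],
}


def translate(text, source, target):
    """Hypothetical translator: phrase substitution; unknown text passes through."""
    if source == target:
        return text
    for phrase, replacement in PHRASES.get((source, target), {}).items():
        text = text.replace(phrase, replacement)
    return text


def search_collection(query, titles):
    """Score each title by the fraction of query words it contains (substring match)."""
    words = query.lower().split()
    hits = []
    for title in titles:
        score = sum(word in title.lower() for word in words) / len(words)
        if score > 0:
            hits.append((title, score))
    return hits


def multilingual_search(query, user_lang):
    """Translate the query per collection, merge and rank, translate titles back."""
    merged = []
    for lang, titles in COLLECTIONS.items():
        native_query = translate(query, user_lang, lang)
        for title, score in search_collection(native_query, titles):
            merged.append((score, translate(title, lang, user_lang), lang))
    merged.sort(key=lambda hit: hit[0], reverse=True)  # highest relevance first
    return merged


results = multilingual_search("solar energy", "en")
```

In this toy run the French title is found through the translated query and comes back in English alongside the native English hits. A real deployment would also have to reconcile relevance scores that come from very different sources, which this sketch ignores.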


High fidelity search?

Author: Sol

I just found this fantastic article, The High Fidelity Challenge, at ACRL’s blog, ACRLog. For the longest time I’ve been concerned that students pick Google and discovery services over federated search because of the speed factor, even in cases where federated search brings more targeted, more credible, and more relevant results. But my complaining falls mostly on deaf ears. Speed is addictive.

The ACRL article makes these sobering claims:

Students no longer care about using high quality information.

Students are all too willing to satisfice for whatever content they can find along the path of least resistance.

Students are too dependent on search tools that facilitate their use of low quality sources.

I’m hooked. Here’s another quote from the article:

These are common concerns we academic librarians have about our undergraduates. We lament that they’ve abandoned high quality library-supported resources for those that are easy to find and use but which offer lower quality content. As we’ve been told, convenience trumps quality, and our students often prove it’s true.

I’m liking this article a whole lot. The author, Steven Bell (second place winner in the first Federated Search Blog contest), draws a fascinating analogy between music and search, specifically the quality of music vs. the quality of search:

Read the rest of this entry »


Librarians do Gaga

Author: Sol

If you’ve not seen this parody of Lady Gaga’s Bad Romance, you’ll enjoy this great video made by students and faculty from the University of Washington’s Information School.

Hat tip to Jenny Luca.


Clusters that think

Author: Sol

[ Editor's Note: I'm republishing this article, by Brian DeSpain, from the Deep Web Technologies Blog. It does a great job of explaining how their clustering solution adds value to federated search. ]

Clusters that think

One of the most interesting features of our Explorit search product is our clustering engine, which analyzes results and produces “clusters” that represent a new and powerful way to navigate search results. The true power of these clusters is often overlooked, for they superficially resemble the output generated by the keyword-based systems and fixed taxonomies of other search engines. Our clustering technology, however, is more akin to a document-discovery engine, which provides a significant improvement over the alternatives in the library world.

The Explorit engine provides a unique approach to clustering taken from Latent Semantic Analysis (or LSA). We took a look at some of the traditional methods of taxonomy generation (i.e., learning approaches, semantic knowledge bases, and word nets) and, after carefully examining their advantages and shortcomings, we chose latent semantic analysis and a “description comes first” approach to provide a rich result analysis tool for customers. LSA is a fully automatic mathematical/statistical technique for extracting and inferring relations of contextual usage of words in search results. This technology provides a concept-based approach to analyzing and clustering results from a result set. Applying the LSA approach, our clustering engine analyzes the relationships between a set of documents and the terms contained within the documents to produce a set of concepts related to the results. In other words, our search engines can generate more sophisticated and nuanced result clusters, which will help to cut down on the time and tries it takes for users to find the desired information.
Read the rest of this entry »
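
For readers who want a feel for the LSA idea described above, here is a toy sketch in plain Python. It is not Explorit's implementation, which the article does not detail. It assumes a raw term-count matrix, finds the top "concepts" as eigenvectors of the document-by-document matrix (equivalent to the right singular vectors that LSA uses), assigns each document to the concept on which it loads most heavily, and labels each concept with its strongest term. The sample documents, function names, and the assignment rule are all illustrative assumptions.

```python
import math

# Made-up result snippets for illustration.
DOCS = [
    "solar energy panels",
    "solar energy storage",
    "solar panels cost",
    "gene protein sequence",
    "protein folding",
]


def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[t][j] counts term t in document j."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.lower().split():
            matrix[index[w]][j] += 1.0
    return vocab, matrix


def top_eigenpairs(gram, k, iterations=200):
    """Top-k eigenpairs of a symmetric matrix via power iteration with deflation."""
    n = len(gram)
    work = [row[:] for row in gram]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        value = sum(v[i] * sum(work[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((value, v))
        for i in range(n):  # deflate: remove the concept just found
            for j in range(n):
                work[i][j] -= value * v[i] * v[j]
    return pairs


def lsa_clusters(docs, k=2):
    """Group documents by their strongest latent concept and label each concept."""
    vocab, a = term_document_matrix(docs)
    n, m = len(docs), len(vocab)
    gram = [[sum(a[t][i] * a[t][j] for t in range(m)) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    clusters = {c: [] for c in range(k)}
    for j in range(n):
        best = max(range(k), key=lambda c: abs(pairs[c][1][j]) * math.sqrt(max(pairs[c][0], 0.0)))
        clusters[best].append(j)
    labels = []
    for _, v in pairs:
        weights = [abs(sum(a[t][j] * v[j] for j in range(n))) for t in range(m)]
        labels.append(vocab[max(range(m), key=weights.__getitem__)])
    return clusters, labels


clusters, labels = lsa_clusters(DOCS, k=2)
```

On these five snippets the two concepts separate the solar documents from the protein documents. Real result sets share vocabulary across topics, so a production system would likely need term weighting and a more careful assignment step than this sketch uses.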


WebSearch University’s Fall conference has some workshops (September 26) and talks (September 27 and 28) that readers of this blog might appreciate:

Delving Into Deep Web Business Resources

Marydee Ojala, ONLINE Magazine

Anyone approaching business research today needs to understand the wealth of information available on the deep, invisible web. To effectively and efficiently find data on companies, industries, markets, and management, you should consult specialized as well as general search engines; exploit social media resources; choose to search directories, groups, portals, images, blogs, feeds, wikis, and statistical files; consider fee-based tools; and concentrate on effectively conceptualizing. This seminar, taught by an experienced business searcher, will concentrate on resources but will also include practical techniques for using these resources.

Government Tools & Sites

Laura Gordon-Murnane, Library, BNA

It’s no secret that the U.S. government is a prolific publisher. With a new administration comes a new attitude toward information transparency and disclosure that affects not only federal government information, but also filters down to the state and local levels. The implications for searchers are vast. If you ever thought government data was boring, dull, or lackluster, this session will open your eyes to exciting opportunities of maximizing the value of government information.

Social Networking and Real-Time Research

Marydee Ojala, ONLINE Magazine

Use of social networking sites such as Facebook, Twitter, and LinkedIn has skyrocketed in the past year. As a global phenomenon, millions of people use social media to generate content, share ideas, and keep in touch with family, friends, work colleagues, companies, associations, and causes. They can be a source and tool for research. Marydee Ojala will address the where, when, and how aspects of social networking research, including authenticity, trust, and information overload, along with some real-world caveats.

Semantic Search Engines

Tamas Doszkocs, Specialized Information Services Division, National Library of Medicine

New generations of search engines are not just on the horizon, they’re here. Semantic search engines go far beyond keywords, using a variety of signals and behavioral analysis to understand the intent of your search. This presentation, by a noted computer scientist at the National Library of Medicine, will demonstrate the basics of semantic search as they apply to an innovative federated search solution. Semantic searching is utilized at every step of the process, including automatic query enhancement, semantic search result clustering, and information mashups.