Archive for May, 2010


Jeffrey Beall, from the University of Colorado Denver, has a nice slide presentation: The Shortcomings of Full-Text Searching.

The slide show lists 14 problems one encounters with search engines. Here’s the list:

  1. The synonym problem. You search for “dentures” but don’t think to search for “false teeth.”
  2. Obsolete terms. You’re researching the history of motion pictures and don’t think to search for “photoplay.”
  3. The homonym problem. Your search engine doesn't do clustering, and you search for "conductor." Or, you search for "Roger Morris" and find the wrong one. Or, you search for "red," which means "network" in Spanish.
  4. Spamming. There's lots of junk in the indexes of the big search engines that makes your searches less effective.
  5. Inability to narrow searches by facets. Clustering and search refinement aren't available in every search engine.
  6. Inability to sort search results. It can be hard to organize results.
  7. The aboutness problem. Just because the result has your terms in it doesn’t mean the result is actually about the term.
  8. Figurative language. You search for information about “drowning” and find a document about someone “drowning in birthday presents.”
  9. Search words not in the web page. There is supposedly a book about the French Revolution that does not use the term "French Revolution."
  10. Abstract topics. How do you find useful documents on "health," "free will," or "ethics"?
  11. Paired topics. Art and mental illness, architecture and philosophy, and movies and fascism are examples of paired topics. Search engines often find documents containing both terms even though the terms are unrelated; they just happen to appear in the same document.
  12. Word lists. You’re searching for a term. What you find is a word list that contains your term but has nothing to do with your term.
  13. The Dark Web. That's the Deep Web. Lots of quality information is in the Deep Web and is not accessible to Google and the other crawlers.
  14. Non-textual things. Without metadata or tagging, non-text data is very difficult to find.
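The first two problems are easy to reproduce. Below is a minimal Python sketch, assuming a made-up in-memory document set and a hand-built synonym table (both hypothetical), that shows a literal full-text match missing "false teeth" and "photoplay," and simple query expansion recovering them.

```python
import re

# Hypothetical toy corpus; IDs and text are invented for illustration.
DOCS = {
    1: "Caring for your false teeth after extraction",
    2: "A history of the photoplay in early cinema",
    3: "Dentures: fitting and cleaning tips",
}

# Hypothetical synonym table; a thesaurus or controlled vocabulary
# would play this role in a real system.
SYNONYMS = {
    "dentures": ["false teeth"],
    "motion pictures": ["photoplay", "movies"],
}

def tokens(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())

def matches(doc, phrase):
    """True if the phrase's words appear adjacent and in order in doc."""
    words, target = tokens(doc), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def literal_search(phrase):
    """Plain full-text search: only documents containing the exact phrase."""
    return sorted(d for d, text in DOCS.items() if matches(text, phrase))

def expanded_search(phrase):
    """Search for the phrase or any of its listed synonyms."""
    variants = [phrase] + SYNONYMS.get(phrase.lower(), [])
    return sorted({d for d, text in DOCS.items() for v in variants if matches(text, v)})

print(literal_search("dentures"))          # [3]
print(expanded_search("dentures"))         # [1, 3]
print(literal_search("motion pictures"))   # []
print(expanded_search("motion pictures"))  # [2]
```

Expansion trades precision for recall: every synonym added can also pull in more of the homonym and aboutness problems listed above.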

What's Beall's conclusion? Search the library databases directly. I'm confused, because searching library databases IS performing full-text search. I think Beall is focusing on Surface Web search engines (Google and Bing, for example) as the major source of the problem. To some extent, searching sources directly or via federated search can overcome these problems, depending on how scholarly the content is, how good the metadata is, and how good the underlying search engines are.

Hat tip to the PurpleSearch Blog.


After the dust settled, I had the opportunity to interview Ken Varnum about winning the top prize in this year's Federated Search Blog contest.

  1. How did you hear about the Federated Search Blog contest?
    I saw it mentioned on a listserv I subscribe to (web4lib, I think). I remember seeing the contest advertised last year, as well, although I did not enter it then.
  2. What inspired you to enter the contest?
    I had been thinking about the ‘problem’ of federated search for some time and had already started a project at the University of Michigan Library that was somewhat narrower than what was described in the “Project Lefty” essay I submitted. I was frankly curious if the ideas I had been working on for some time had any resonance outside my library and, if so, what sort of feedback I might receive.


[Editor's note: This article first appeared on the OSTI Blog. Dr. Walt Warnick, Director of the Office of Scientific and Technical Information (part of DOE), and I co-authored the article. For some important search applications there is no alternative to federated search.]

Discovery services have begun to appear in the search landscape. They provide access to documents from the publishers they have relationships with by indexing those publishers' metadata and/or full text. Discovery services are marketed to libraries whose patrons appreciate near-instantaneous search results and whose staff are willing to restrict searching to the sources available from the service (and, optionally, the library's own holdings). While these services tout themselves as improvements to federated search, the reality is that there is no alternative to federated search for a number of important applications. is a global gateway to science. The federated search application was conceived and developed at OSTI, and it is hosted by us. The portal performs live federated search of 70 databases from 66 countries. Participating members provide access to their national research databases. For a number of reasons, this important gateway to millions of research documents does not lend itself to the discovery service model.
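The difference between the two models is when the sources are consulted: a discovery service answers from an index it built in advance, while live federated search sends each query to every source at search time and merges what comes back. The Python sketch below illustrates only the live fan-out, under loud assumptions: the sources are hypothetical in-process functions with simulated latency, not real national databases, and the names `make_source`, `SOURCES`, and `federated_search` are invented for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def make_source(name, delay, records):
    """Build a hypothetical source: a callable returning matching records after a delay."""
    def search(query):
        time.sleep(delay)  # stands in for the latency of a remote database
        return [f"{name}: {r}" for r in records if query.lower() in r.lower()]
    return name, search

# Hypothetical sources; a real connector would query a remote database instead.
SOURCES = [
    make_source("source_a", 0.05, ["Fusion energy review", "Solar cells"]),
    make_source("source_b", 0.10, ["Fusion plasma physics"]),
    make_source("source_c", 1.50, ["Fusion materials"]),  # slower than the deadline
]

def federated_search(query, timeout=0.5):
    """Query every source in parallel; merge the results that arrive before the timeout."""
    pool = ThreadPoolExecutor(max_workers=len(SOURCES))
    futures = {pool.submit(search, query): name for name, search in SOURCES}
    done, not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # return now instead of waiting for late sources
    results = []
    missing = [futures[f] for f in not_done]
    for f in done:
        try:
            results.extend(f.result())
        except Exception:
            missing.append(futures[f])  # a failed source is reported like a slow one
    return sorted(results), sorted(missing)

results, missing = federated_search("fusion")
print(results)  # ['source_a: Fusion energy review', 'source_b: Fusion plasma physics']
print(missing)  # ['source_c']
```

The sketch reports sources that missed the deadline instead of silently dropping them, so a user can tell when the merged list is incomplete.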



Abe Lederman, founder and president of blog sponsor Deep Web Technologies, has been asked by partner Swets to speak at one of its Webinars about his experiences with federated search and with Deep Web Technologies' Explorit, which is sold under the name SwetsWise Searcher. Abe will demonstrate the product together with Marieke Heins from Swets. The Webinar is free and live, and Abe will take questions.

The Webinar is next Wednesday, May 12, at 11:00 AM EST, and is open to an international audience.

Here is the topic of the Webinar:

With the amount of information available online rapidly expanding and residing in more disparate sources, you need to help your users simplify the way they discover and access the content they need.

Join our web information session and live demonstration on how SwetsWise Searcher can help you provide your users with quick and relevant search results. In the Webinar, guest speaker Abe Lederman from Deep Web Technologies will share his experience in the federated search field and discuss how to accelerate the diffusion of knowledge. He has 25 years of experience in computer software engineering.

Here is more information about Abe’s talk:

Researchers, particularly students, are making Google their first stop for research because it is "quick and easy." They assume that Google will find the authoritative, scholarly information that they are seeking. However, the information in Google is not always the highest quality or the most reliable content. Librarians now have the opportunity to team up with a Federated Search vendor to once again make the librarian a search authority in finding scholarly information. In this session, the audience will learn about the features and capabilities that are currently available in Federated Search. The audience will also learn how librarians can play a key collaborative role in bringing Federated Search to their patrons.

Free registration is on the Swets site.