High fidelity search?

Author: Sol

I just found this fantastic article, The High Fidelity Challenge, at ACRL’s blog, ACRLog. For the longest time I’ve been concerned that students pick Google and discovery services over federated search because of the speed factor, even in cases where federated search brings more targeted, more credible, and more relevant results. But my complaints fall mostly on deaf ears. Speed is addictive.

The ACRL article makes these sobering claims:

Students no longer care about using high quality information.

Students are all too willing to satisfice for whatever content they can find along the path of least resistance.

Students are too dependent on search tools that facilitate their use of low quality sources.

I’m hooked. Here’s another quote from the article:

These are common concerns we academic librarians have about our undergraduates. We lament that they’ve abandoned high quality library-supported resources for those that are easy to find and use but which offer lower quality content. As we’ve been told, convenience trumps quality, and our students often prove it’s true.

I’m liking this article a whole lot. The author, Steven Bell (second place winner in the first Federated Search Blog contest), draws a fascinating analogy between music and search, specifically the quality of music vs. the quality of search:

Read the rest of this entry »

If you enjoyed this post, make sure you subscribe to the RSS feed!


Librarians do Gaga

Author: Sol

If you’ve not seen this parody of Lady Gaga’s Bad Romance you’ll enjoy this great video made by students and faculty from the University of Washington’s Information School.

Hat tip to Jenny Luca.



Clusters that think

Author: Sol

[ Editor's Note: I'm republishing this article, by Brian DeSpain, from the Deep Web Technologies Blog. It does a great job of explaining how their clustering solution adds value to federated search. ]

Clusters that think

One of the most interesting features of our Explorit search product is our clustering engine, which analyzes results and produces “clusters” that represent a new and powerful way to navigate search results. The true power of these clusters is often overlooked, for they superficially resemble the output generated by the keyword-based systems and fixed taxonomies of other search engines. Our clustering technology, however, is more akin to a document-discovery engine, which provides a significant improvement over the alternatives in the library world.

The Explorit engine takes a unique approach to clustering, derived from Latent Semantic Analysis (LSA). We looked at some of the traditional methods of taxonomy generation (i.e. learning approaches, semantic knowledge bases, and word nets) and, after carefully examining their advantages and shortcomings, chose latent semantic analysis and a “description comes first” approach to provide a rich result analysis tool for customers. LSA is a fully automatic mathematical/statistical technique for extracting and inferring relations of contextual usage of words in search results. This technology provides a concept-based approach to analyzing and clustering results from a result set. Applying the LSA approach, our clustering engine analyzes the relationships between a set of documents and the terms contained within the documents to produce a set of concepts related to the results. In other words, our search engines can generate more sophisticated and nuanced result clusters, which help cut down on the time and the number of tries it takes for users to find the desired information.
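
To make the idea of relating documents and terms to latent concepts more concrete, here is a minimal sketch of LSA on a toy result set. It is not Explorit's implementation, which the article does not describe; the sample documents and every function name are assumptions made for illustration. It builds a term-document count matrix A, finds the top concepts by power iteration on the matrix A^T A (equivalent to the leading singular directions that LSA uses), and compares documents in that reduced concept space.

```python
import math

# Hypothetical search results (already lowercased, stop words removed).
docs = [
    "solar energy panels",
    "solar panels cost",
    "heart disease risk",
    "heart disease",
]

def gram_matrix(docs):
    """Build the term-document count matrix A and return A^T A (docs x docs)."""
    vocab = sorted({w for d in docs for w in d.split()})
    cols = [[d.split().count(t) for t in vocab] for d in docs]  # one column of A per doc
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def top_eigenpair(G, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(G)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v

def concept_space(docs, k=2):
    """Coordinates of each document on the top-k latent concepts."""
    G = gram_matrix(docs)
    n = len(G)
    coords = [[] for _ in range(n)]
    for _ in range(k):
        lam, v = top_eigenpair(G)
        for i in range(n):
            coords[i].append(v[i] * math.sqrt(max(lam, 0.0)))
        # Deflate: remove the concept just found so the next pass finds the next one.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

def cosine(a, b):
    """Cosine similarity between two concept-space vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

coords = concept_space(docs)
print(cosine(coords[0], coords[1]))  # two solar documents: close to 1
print(cosine(coords[0], coords[2]))  # solar vs. heart document: close to 0
```

In this toy set the two topics share no vocabulary, so the separation is trivially clean. Real result sets overlap heavily, and a production system would typically use a numerical library's SVD rather than hand-written power iteration.
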
Read the rest of this entry »



WebSearch University’s Fall conference has some workshops (September 26) and talks (September 27 and 28) that readers of this blog might appreciate:

Delving Into Deep Web Business Resources

Marydee Ojala, ONLINE Magazine

Anyone approaching business research today needs to understand the wealth of information available on the deep, invisible web. To effectively and efficiently find data on companies, industries, markets, and management, you should consult specialized as well as general search engines; exploit social media resources; choose to search directories, groups, portals, images, blogs, feeds, wikis, and statistical files; consider fee-based tools; and concentrate on effectively conceptualizing. This seminar, taught by an experienced business searcher, will concentrate on resources but will also include practical techniques for using these resources.

Government Tools & Sites

Laura Gordon-Murnane, Library, BNA

It’s no secret that the U.S. government is a prolific publisher. With a new administration comes a new attitude toward information transparency and disclosure that affects not only federal government information, but also filters down to the state and local levels. The implications for searchers are vast. If you ever thought government data was boring, dull, or lackluster, this session will open your eyes to exciting opportunities of maximizing the value of government information.

Social Networking and Real-Time Research

Marydee Ojala, ONLINE Magazine

Use of social networking sites such as Facebook, Twitter, and LinkedIn has skyrocketed in the past year. As a global phenomenon, millions of people use social media to generate content, share ideas, and keep in touch with family, friends, work colleagues, companies, associations, and causes. They can be a source and tool for research. Marydee Ojala will address the where, when, and how aspects of social networking research, including authenticity, trust, and information overload, along with some real-world caveats.

Semantic Search Engines

Tamas Doszkocs, Specialized Information Services Division, National Library of Medicine

New generations of search engines are not just on the horizon, they’re here. Semantic search engines go far beyond keywords, using a variety of signals and behavioral analysis to understand the intent of your search. This presentation, by a noted computer scientist at the National Library of Medicine, will demonstrate the basics of semantic search as they apply to an innovative federated search solution. Semantic searching is utilized at every step of the process, including automatic query enhancement, semantic search result clustering, and information mashups.



Jeffrey Beall, from the University of Colorado Denver, has a nice slide presentation: The Shortcomings of Full-Text Searching.

The slide show lists 14 problems one encounters with search engines. Here’s the list:

  1. The synonym problem. You search for “dentures” but don’t think to search for “false teeth.”
  2. Obsolete terms. You’re researching the history of motion pictures and don’t think to search for “photoplay.”
  3. The homonym problem. Your search engine doesn’t do clustering and you search for “conductor.” Or, you search for “Roger Morris” and find the wrong one. Or, you search for “red,” which means “network” in Spanish.
  4. Spamming. There’s lots of junk in the indexes of the big search engines, which makes your searches less effective.
  5. Inability to narrow searches by facets. Clustering and search refinement don’t exist in all search engines.
  6. Inability to sort search results. It can be hard to organize results.
  7. The aboutness problem. Just because the result has your terms in it doesn’t mean the result is actually about the term.
  8. Figurative language. You search for information about “drowning” and find a document about someone “drowning in birthday presents.”
  9. Search words not in web page. There is supposedly a book about the French Revolution that does not use the term “French Revolution.”
  10. Abstract topics. How do you find useful documents on “health,” “free will,” or “ethics”?
  11. Paired topics. Art and mental illness, architecture and philosophy, and movies and fascism are examples of paired topics. Often search engines find documents that contain both terms even though the terms are not related; they just happen to appear in the same document.
  12. Word lists. You’re searching for a term. What you find is a word list that contains your term but has nothing to do with your term.
  13. The Dark Web. That’s the Deep Web. Lots of quality information is in the Deep Web and not accessible to Google and the other crawlers.
  14. Non-textual things. Without metadata or tagging, non-text data is very difficult to find.

What’s Beall’s conclusion? Search the library databases directly. I’m confused because searching library databases IS performing full-text search. I think Beall is focusing on the Surface Web search engines (Google and Bing, for example) as the major source of the problem. To some extent, searching sources directly or via federated search can overcome these problems, depending on how scholarly the content is, how good the metadata is, and how good the underlying search engines are.

Hat tip to the PurpleSearch Blog.



After the dust settled for Ken Varnum, I had the opportunity to interview him about winning the top prize in this year’s Federated Search Blog contest.

  1. How did you hear about the Federated Search Blog contest?
    I saw it mentioned on a listserv I subscribe to (web4lib, I think). I remember seeing the contest advertised last year, as well, although I did not enter it then.
  2. What inspired you to enter the contest?
    I had been thinking about the ‘problem’ of federated search for some time and had already started a project at the University of Michigan Library that was somewhat narrower than what was described in the “Project Lefty” essay I submitted. I was frankly curious if the ideas I had been working on for some time had any resonance outside my library and, if so, what sort of feedback I might receive.
Read the rest of this entry »



[ Editor's note: This article first appeared in the OSTI Blog. Dr. Walt Warnick, Director of the Office of Scientific and Technical Information, part of DOE, and I co-authored the article. For some important search applications there is no alternative to federated search.]

Discovery services have begun to appear in the search landscape. Discovery services provide access to documents from publishers with which they have relationships by indexing the publishers’ metadata and/or full text. Discovery services are marketed to libraries where patrons appreciate near-instantaneous search results and where library staff are willing to restrict access to sources available from the service (and, optionally, the library’s own holdings). While these services tout themselves as improvements on federated search, the reality is that there is no alternative to federated search for a number of important applications.

WorldWideScience.org is a global gateway to science. The federated search application was conceived and developed at OSTI and is hosted by us. The portal performs live federated search of 70 databases from 66 countries. Participating members provide access to their national research databases. For a number of reasons, this important gateway to millions of research documents does not lend itself to the discovery service model.
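
For readers unfamiliar with what "live federated search" means in practice, the following minimal sketch shows the general pattern: send the query to every source at query time, then merge and de-duplicate what comes back. It is not the WorldWideScience.org implementation. The three in-memory sources, their contents, and the function names are all hypothetical stand-ins for remote databases.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "databases"; a real portal would call remote search services.
SOURCES = {
    "source_a": ["Solar cell efficiency", "Wind turbine design"],
    "source_b": ["Solar Cell Efficiency", "Solar thermal storage"],
    "source_c": ["Deep sea ecology"],
}

def search_source(name, query):
    """Return (source, title) pairs for titles that contain every query word."""
    words = query.lower().split()
    return [(name, title) for title in SOURCES[name]
            if all(w in title.lower() for w in words)]

def federated_search(query):
    """Query every source in parallel, then merge and de-duplicate by title."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        batches = list(pool.map(lambda n: search_source(n, query), SOURCES))
    merged = {}
    for batch in batches:
        for name, title in batch:
            key = title.lower()  # naive duplicate detection: case-insensitive title match
            merged.setdefault(key, {"title": title, "sources": []})
            merged[key]["sources"].append(name)
    # Rank records found in more sources first, then alphabetically by title.
    return sorted(merged.values(), key=lambda r: (-len(r["sources"]), r["title"]))

results = federated_search("solar")
for record in results:
    print(record["title"], record["sources"])
```

Because nothing is indexed ahead of time, the results always reflect what each source holds at the moment of the query. The trade-off is that the response is only as fast as the slowest source, which is the speed gap discovery services aim to close.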

Read the rest of this entry »



Abe Lederman, founder and president of blog sponsor Deep Web Technologies, has been asked by partner Swets to speak at one of their Webinars about his experiences with federated search and with Deep Web Technologies’ Explorit - sold under the name SwetsWise Searcher. Abe will demonstrate the product together with Marieke Heins from Swets. The Webinar is free, live, and Abe will take questions. You can register here.

The Webinar is next Wednesday, May 12, 11:00 AM EST, and is open to an international audience.

Here is the topic of the Webinar:

With the amount of information available online rapidly expanding and residing in more disparate sources, you need to help your users simplify the way they discover and access the content they need.

Join our web information session and live demonstration on how SwetsWise Searcher can help you to provide your users with quick and relevant search results. In the Webinar, guest speaker Abe Lederman from Deep Web Technologies will share his experience in the federated search field and how to accelerate the diffusion of knowledge. He has 25 years of experience in computer software engineering.

Here is more information about Abe’s talk:

Researchers, particularly students, are making Google their first stop for research because it is “quick and easy”. They assume that Google will find the authoritative, scholarly information that they are seeking. However, the information in Google is not always the highest quality or the most reliable content. Librarians now have the opportunity to team up with a Federated Search vendor to once again make the librarian a search authority in finding scholarly information. In this session, the audience will learn of the features and capabilities that are currently available in Federated Search. The audience will also learn how librarians can play a key collaborative role in bringing Federated Search to their patrons.

Free registration is on the Swets site.



When I think of real-time search and automated retrieval, I think of a federated search solution. Well, here’s a different kind of automated retrieval system. Evanced Solutions has built a robot (yes, an actual physical robot) that works like those in manufacturing plants. (No, the robot doesn’t look like the image here. This image is from the Wikipedia robot article.)

The U.S. designed and manufactured system allows libraries to provide books and audiovisual materials in convenient locations without the space and cost associated with constructing a traditional library branch or building.

The new library vending system will be powered by an industrial multi-axis robot typically used in manufacturing plants. The robot will deliver library materials to patrons from storage shelves in the machine. It also re-shelves those same materials to the machine when returned by the patron for check-out by the next person.

The press release, Robot Extends Library Services, says the prototype of its new BranchAnywhere library vending system was to be unveiled last month at the Public Library Association Conference in Portland, Oregon.

A hat tip goes to Stan at the Library Blog Buzz.



Helen Mitchell, enterprise search consultant and one of our volunteer judges for this year’s Federated Search Blog contest, will be teaching a one-day course at SLA in New Orleans in June. Mitchell has over 30 years of experience in enterprise search. See her bio in this article (third one down).

The course is divided into a morning and an afternoon piece. There’s a substantial discount for signing up for both parts.

Here are the descriptions for the two parts of the course:

Federated Search, Part 1: Evaluation and Assessment Methodology for Success
Saturday, 06/12/2010 8:00AM - 12:00PM

In this age of “information explosion,” quickly finding the most relevant information is a huge challenge for information professionals (IPs). With a tidal wave of information technologies to choose from, IPs often lack the expertise to select the best solution to increase content findability. Consider a federated search (FS) system that can quickly search your subscription databases and unstructured content sources. Learn a methodology to evaluate, select, develop and implement the right FS solution for your organization. This can improve support of your mission, vision and goals.

Federated Search, Part 2: Selecting and Implementing an Effective Solution
Saturday, 06/12/2010 1:00PM - 5:00PM

Current and emerging search technologies can foster information sharing, collaboration, networking and feedback. Finding the most relevant information in a timely manner challenges information professionals because they don’t have an effective enterprise-wide federated search (FS) solution. If you want a better understanding of what federated search is, how to collect these specialized requirements, develop a “Request for Info” (RFI) and a “Request for Proposal” (RFP) and learn how to evaluate federated search products to meet your organizational needs, this course is for you!

Course price and registration information is here. SLA price and registration information is here.
