Often overlooked in conversations about federated search vs. Google crawling is that the two approaches differ in more than mechanics, that is, in more than whether the search engine has to reach content behind search forms or follows links to build an index. Quality of content is also a fundamental difference. Library patrons expect all of the resources available to them to be of high quality, whether those resources are physical books and journals or digital content. The same expectation holds for resources presented by federated search engines, especially those used in academic, business, or scientific environments.

The Google relevance model is largely based on “authority,” which in turn is based on popularity, and popularity is NOT the same as credibility. Particular scientific findings may be published on the Web and come from highly credible sources, yet not be widely referenced and so not be considered important by Google. Conversely, highly popular Web documents may be ranked highly by Google yet fall into the category of “pseudoscience.”

From the Nihil Obstat blog I learned about an upcoming workshop on information credibility on the Web. The workshop description states the problem:

As computers and computer networks become more common, a huge amount of information, such as that found in Web documents, has been accumulated and circulated. Such information gives many people a framework for organizing their private and professional lives. However, in general, the quality control of Web content is insufficient due to low publishing barriers. In result there is a lot of mistaken or unreliable information on the Web that can have detrimental effects on users.

The description goes on to say that technology is needed to determine the accuracy and trustworthiness of Web documents, among other characteristics. Given the explosive growth of Web 2.0 and user-generated content, the need to separate the wheat from the chaff is greater than ever.

As an aside, the keynote speaker for the information credibility conference, Ricardo Baeza-Yates from Yahoo! Research, will be sharing interesting results “that show that user generated content in Flickr, Yahoo! Answers and Wikipedia is better than what can be imagined.”

The fact that there’s an entire conference dedicated to information credibility and that there are over 15 million Google hits for the unquoted words “information credibility” tells me that lots of people care about this problem, although the quoted phrase yields only 15,000 results.

Federated search goes a long way toward solving, or bypassing, the information credibility problem. In the academic, business, and technical research environments where federated search engines are most likely to be found (i.e., not in the popular consumer-oriented metasearch engines), the content sources are all vetted and the chaff is left out.

In my book, credibility is what separates federated search engines from crawlers. I’m not saying that Google doesn’t deliver outstanding content. It does. It’s just not always easy to tell which content that is. If you want to know what other factors influence the quality of search results, I recommend this article.


This entry was posted on Monday, March 2nd, 2009 at 4:59 pm and is filed under viewpoints.

2 Responses so far to "On credibility of search results"

  1. Jonathan Rochkind
    March 3rd, 2009 at 7:07 am  

    I think federated search in an academic environment is indeed useful for limiting the range of the search to scholarly materials, or more-or-less scholarly materials since it’s often hard to keep out popular magazines.

    But it’s no substitute for the need for patrons to be critical and information literate. There’s no way around this with technology: our users need to learn to evaluate the credibility and viewpoint of a source themselves, and technology can’t do it for them. Even if federated search could (which it can’t), none of our users are ONLY going to be using federated search in their lives. We do them a disservice if we think federated search avoids the need for critical evaluation on their part.

    And of course, the feature of federated search we’re talking about here is not limited to broadcast search technology. It’s about controlling the inputs to the search, but that can be done in a locally built index too, as we see with Google Scholar or Serials Solutions Summon.

    I’d be interested to see an article on this blog talking about Serials Solutions Summon, and what its relationship and significance are to the kind of federated broadcast search that focuses on scholarly/published materials.

  2. Sol
    March 6th, 2009 at 7:23 am  

    Jonathan - agreed. We can’t protect users from the responsibility of critical thinking. And, yes, Serials Solutions Summon is on my list of things to write about.
