Archive for the "basics" Category

28
Feb

I recently got this question:

I’m new to federated search. You’ve written lots of articles (too many) about the subject. Can you give me a half dozen articles to read that would get me oriented? Oh, and if you would tell me what order to read them in that would be great too!

I took this request to heart and came up with my ordered list of basic articles. The list has 15 articles. Yes, the request was for six. My only defense is that many of the articles are a quick read and, since my list is in order, you can read just the first six and you’ll know a lot more than when you started.

I organize my list into three sections: federated search/deep web, discovery services, and federated search in the enterprise. I think everyone new to federated search needs to have an awareness of all three areas.

Here’s my list:

Federated Search and the Deep Web

Discovery Services

Federated Search in the Enterprise

14
Jan

I’ve written a number of educational articles about federated search for this blog and for other venues, but I had always skirted the question of what exactly federated search is. I’ve finally written a primer for people who are comfortable with search engines but want to understand what federated search is all about.

AltSearchEngines published the primer in three parts: Part I, Part II, Part III.

Read the rest of this entry »

30
Jun

This article is the second in a series intended to help those exploring federated search understand the steps to procuring a federated search product or service. Part I explored the business case, i.e., the compelling reasons for pursuing federated search. While Part I may have touched on benefits, that discussion was a high-level one. Part II drops to a deeper level of detail. Part III will consider features in the context of requirements and benefits.

Note that benefits are not features. While I am not a sales or marketing professional, my experience in working with customers is that they desire benefits and that they only desire the features that increase their benefits. Too many features can actually make an application undesirable if, for example, ease of use is an important benefit and the weight of all those features negates that benefit.

Read the rest of this entry »

23
Jun

A couple of weeks ago I wrote about ten unrealistic expectations that some people have about federated search. A few days ago The Krafty Librarian published a blog article expressing frustrations over PubMed having been down for a number of hours with no notification from PubMed. In my book, this makes for unrealistic expectation number 11:

11. If a source ever goes down, the content owner should immediately and widely broadcast this information.

Read the rest of this entry »

30
May

This roadmap series raises and addresses a number of issues that prospective customers of federated search solutions will encounter. It is based on the outline I published last week; I’ve made a couple of changes to the outline to reflect reader input. Here is the new outline:

Read the rest of this entry »

23
May

For many readers of this blog, I suspect that the roadmap to a federated search solution is not clear. How do you get from “We need federated search” to a deployment that meets your organization’s needs and that you can feel proud of? Below I present an outline of a process I believe can help you to organize your thinking, your planning, and your actions towards the goal.

Note that the outline below is just that, an outline. The value to you will come during the next several weeks as I discuss, in individual blog articles, each item in the roadmap.

Read the rest of this entry »

12
May

I’ve had discussions recently with Abe Lederman and Darcy Pedersen regarding how organizations go about selecting a federated search vendor. (Darcy runs marketing for Deep Web Technologies, this blog’s sponsor, and Abe is my brother and runs the company.) I wanted to know how Deep Web’s customers went about choosing a vendor, whether it be Deep Web or one of its competitors. What questions did prospective customers ask? What were their concerns? How did they formulate requirements? How did they conduct pilots? How did they ultimately evaluate vendors?

To synthesize our discussions, Darcy drafted a checklist of questions. I’m not ready to post the list yet because I’m interested in fleshing it out a bit more and I want other input to improve and extend the questions, hence this post.

Read the rest of this entry »

22
Apr

In January, I wrote a primer about clustering. I explained that:

… clustering is the automatic organization of search results into sets of results that have something in common. Some search engines and some federated search engines provide clustering features.

I also introduced faceted search, also known as faceted navigation:

This technology guides a user to relevant content by organizing search results in a hierarchical structure and providing labeled choices of paths in the hierarchy to follow. A faceted search system might have a series of pulldown menus that guide a user from the broad category of “Iraq” to “Iraq -> Geography”, to “Iraq -> Geography -> Maps” to “Iraq -> Geography -> Maps -> Baghdad.” Endeca is one vendor that provides faceted searching.
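To make the drill-down idea concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not how Endeca or any other vendor implements faceted search. The result titles, the facet paths, and the function names `drill_down` and `next_choices` are all made up for the example.

```python
# Hypothetical search results, each tagged with a facet path that runs
# from the broadest category to the narrowest. The data is invented.
RESULTS = [
    {"title": "Street map of Baghdad", "path": ("Iraq", "Geography", "Maps", "Baghdad")},
    {"title": "Map of Basra", "path": ("Iraq", "Geography", "Maps", "Basra")},
    {"title": "Rivers of Iraq", "path": ("Iraq", "Geography", "Rivers")},
    {"title": "History of Iraq", "path": ("Iraq", "History")},
]


def drill_down(results, selected):
    """Keep only the results whose facet path starts with the selected path."""
    selected = tuple(selected)
    return [r for r in results if r["path"][:len(selected)] == selected]


def next_choices(results, selected):
    """Count the labeled choices one level below the selected path."""
    depth = len(tuple(selected))
    counts = {}
    for r in drill_down(results, selected):
        if len(r["path"]) > depth:
            label = r["path"][depth]
            counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    # Labels a pulldown menu could offer after the user picks "Iraq".
    print(next_choices(RESULTS, ("Iraq",)))
    # Results left after following the path all the way to "Baghdad".
    print(drill_down(RESULTS, ("Iraq", "Geography", "Maps", "Baghdad")))
```

Each call to `next_choices` yields the labels for the next pulldown menu along with a count of the results behind each label, which is the "labeled choices of paths" part of the description above.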

Read the rest of this entry »

7
Apr

Even if you’ve never heard of AJAX, you’ve very likely experienced it. AJAX is the technology that makes pieces of web pages update smoothly. Google Maps is a prime example of AJAX in action. You give Google a street address, it draws a map, and you can drag the map in all directions. Without AJAX, if you wanted to “drag” the map, you’d have to click a button to shift the map east, west, north or south. Then, the whole web page would redraw itself. The map navigation process is much smoother thanks to AJAX.

AJAX stands for Asynchronous JavaScript and XML. Asynchronous refers to the fact that a page using AJAX makes its requests to the web server in the background, so the user can keep working while a request completes and only the affected bits of the page are updated, without reloading the entire page. JavaScript provides the mechanism that the web page uses to communicate with the HTTP (web) server. XML is the standard that is sometimes, but certainly not always, used to encode the data exchanged with the web server; JSON is a common alternative. AJAX is basically a set of standards and techniques that a web programmer can use to create browser-independent, HTML-based web applications in which parts of the page refresh smoothly without requiring entire page reloads.

Read the rest of this entry »

28
Mar

The first thing that most people notice when they use a federated search application is that it’s not nearly as fast as Google. We’ve all gotten spoiled. This is not only the information age, it’s the age of quick information; we all want every search to be as fast as a Google search. However, by its very nature, federated search can’t be as fast as Google. Federated search is at the mercy of the sources it federates. If a source is slow to return results to the federated search application, then there’s nothing the federated search application can do, or is there?

Deep Web Technologies has been displaying incremental results for some time now. The idea is simple: display results in chunks as they are received from the sources being searched. Science.gov, WorldWideScience.org, and Scitopia.org are three applications that display incremental results. While there are challenges to this approach, there are some significant benefits as well. The aim of displaying incremental results is to minimize the time the user has to wait to see some results. In the show-something-quick department, displaying incremental results works well. The major challenge arises when you try to figure out what to do with the rest of the results as they come in.
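The idea of showing results in chunks as they arrive can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Deep Web Technologies' implementation: the source names and delays are invented, and `search_source` only simulates a remote source by sleeping before it answers.

```python
import asyncio

# Hypothetical sources and their simulated response times in seconds.
SOURCES = {"fast-source": 0.01, "medium-source": 0.1, "slow-source": 0.2}


async def search_source(name, delay, query):
    """Simulate one source answering the query after `delay` seconds."""
    await asyncio.sleep(delay)
    return name, [f"{name}: result {i} for '{query}'" for i in (1, 2)]


async def incremental_search(query, on_chunk):
    """Query all sources at once and pass each chunk to `on_chunk` as it arrives."""
    tasks = [
        asyncio.create_task(search_source(name, delay, query))
        for name, delay in SOURCES.items()
    ]
    arrival_order = []
    # as_completed yields the searches in the order they finish, so the
    # fastest source is displayed first and the slowest one last.
    for finished in asyncio.as_completed(tasks):
        name, results = await finished
        arrival_order.append(name)
        on_chunk(name, results)
    return arrival_order


def run_demo(query="deep web"):
    """Collect the chunks in the order a user would see them."""
    shown = []
    order = asyncio.run(
        incremental_search(query, lambda name, results: shown.extend(results))
    )
    return order, shown


if __name__ == "__main__":
    order, shown = run_demo()
    print(order)
    print(shown)
```

The sketch simply appends each chunk to the end of the list. That sidesteps the hard part mentioned above: deciding where later results belong relative to the ones the user is already looking at.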

Read the rest of this entry »