At the 1999 American Society for Information Science and Technology (ASIS&T) Annual Meeting, Dr. Matthew Koll delivered a presentation titled “Major Trends and Issues in the Information Industry.” In a set of post-presentation notes, Dr. Koll made some powerful statements that, nearly 10 years later, still resonate with search in general, and with federated search in particular.
Koll defined information retrieval in a novel and elegant way:
Information retrieval is the science and practice of trying to show people the document they would want to see next, if they had total knowledge and hindsight.
As an interesting aside, Koll was one of the first people to use the term invisible web to describe documents on the Internet that sit behind search forms and aren’t found by Google and other conventional search engines. Wikipedia, in its article about the deep web (a synonym for the invisible web), has this to say:
Michael Bergman has said that Jill Ellsworth coined the term “invisible Web” in 1994 to refer to websites that are not registered with any search engine. Bergman cited a January 1996 article by Frank Garcia in which Ellsworth was quoted using the term (but did not say she coined it in 1994):
“It would be a site that’s possibly reasonably designed, but they didn’t bother to register it with any of the search engines. So, no one can find them! You’re hidden. I call that the invisible Web.”
Another early use of the term invisible Web was by Bruce Mount (Director of Product Development) and Matthew B. Koll (CEO/Founder) of Personal Library Software, Inc. (PLS) when describing the @1 deep Web tool. The term was used in a December 1996 press release from PLS.
The first use of the specific term deep Web occurred in that same 2001 Bergman study.
Wikipedia provides a link to a Wayback Machine snapshot of the 1996 PLS press release.
Koll takes the “finding a needle in a haystack” metaphor and applies it in a fun way to the problems users have with search systems. He provides 12 meanings for the metaphor:
- a known needle in a known haystack;
- a known needle in an unknown haystack;
- an unknown needle in an unknown haystack;
- any needle in a haystack;
- the sharpest needle in a haystack;
- most of the sharpest needles in a haystack;
- all the needles in a haystack;
- affirmation of no needles in the haystack;
- things like needles in any haystack;
- let me know whenever a new needle shows up;
- where are the haystacks?; and
- needles, haystacks – whatever.
It’s fun to apply some of these “search desires” to federated search today.
If the needles represent relevant documents, then finding all the needles is akin to high recall. Finding only needles (no hay) is akin to high precision. Oddly enough, high precision is not one of the 12 meanings cited by Koll. Note: To understand precision and recall, see the Wikipedia article.
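To make the needle arithmetic concrete, here is a minimal sketch in Python. The function name `precision_recall` and the sample document labels are made up for illustration and don't come from Koll's notes; the formulas are the standard set-based definitions of precision and recall.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result list.

    retrieved: ids the search system returned (needles plus any hay)
    relevant:  ids of every needle that exists in the haystack
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # needles we actually found
    # Guard against empty sets so we never divide by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 results came back, 2 of them are needles,
# and the haystack holds 3 needles in total.
results = ["needle1", "needle2", "hay1", "hay2"]
needles = ["needle1", "needle2", "needle3"]
p, r = precision_recall(results, needles)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Half of what came back was hay (precision 0.50), and one needle was missed (recall 0.67). "All the needles in a haystack" asks for recall of 1.0, whatever the precision.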
“Let me know whenever a new needle shows up” describes an alert system: users store their queries and receive email notifications when new documents matching those queries are found.
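The core of such an alert can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function `new_matches`, the sample documents, and the "every query term must appear" matching rule are all hypothetical, and a real alert service would run the stored query against its search engine and then send the email.

```python
def new_matches(stored_query, documents, already_seen):
    """Return ids of unseen documents that contain every query term.

    stored_query: the user's saved query, as a plain string
    documents:    mapping of document id -> document text
    already_seen: set of ids the user has already been notified about
    """
    terms = stored_query.lower().split()
    fresh = []
    for doc_id, text in documents.items():
        if doc_id in already_seen:
            continue  # old needle, no need to alert again
        words = set(text.lower().split())
        if all(term in words for term in terms):
            fresh.append(doc_id)
    return fresh


# Hypothetical data: doc1 was reported earlier, doc2 is a new needle.
seen = {"doc1"}
docs = {
    "doc1": "federated search basics",
    "doc2": "Federated search and the deep web",
    "doc3": "haystack farming",
}
to_notify = new_matches("federated search", docs, seen)
print(to_notify)  # ['doc2']
seen.update(to_notify)  # remember what we have already reported
```

Keeping track of what the user has already seen is what turns a saved search into an alert: each run reports only the needles that are new.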
Several of the meanings relate to current information discovery problems. Who’s ever had to face “an unknown needle in an unknown haystack?” That would be the researcher or student who’s not quite sure what he’s looking for and doesn’t quite know where to find it either. Source discovery is the science of answering the question, “where are the haystacks?”
So, can you draw parallels between Koll’s other meanings and federated search today?
In case you were wondering about that 12th line, “needles, haystacks – whatever,” Koll explains:
The “needles, haystacks – whatever” line started off as a light-hearted poke at Gen-X searchers, but with the massive growth in consumer online searching, this now represents a legitimate viewpoint. Casual searchers don’t have time for a lot of interaction and aren’t going to give the system a lot of words to work with; they want some good information back fast, and if they don’t get it they’re going to take their business elsewhere.
Gen-X searchers, massive growth in consumer searching, underspecified (too short) queries, impatient users — sound familiar? Some things never change.