Jeffrey Beall, from the University of Colorado Denver, has a nice slide presentation: The Shortcomings of Full-Text Searching.
The slide show lists 14 problems one encounters with search engines. Here’s the list:
- The synonym problem. You search for “dentures” but don’t think to search for “false teeth.”
- Obsolete terms. You’re researching the history of motion pictures and don’t think to search for “photoplay.”
- The homonym problem. Your search engine doesn’t do clustering and you search for “conductor.” Or, you search for “Roger Morris” and find the wrong one. Or, you search for “red,” which means “network” in Spanish.
- Spamming. There’s lots of junk in the indexes of the big search engines that makes your searches less effective.
- Inability to narrow searches by facets. Clustering and search refinement don’t exist in all search engines.
- Inability to sort search results. It can be hard to organize results.
- The aboutness problem. Just because the result has your terms in it doesn’t mean the result is actually about the term.
- Figurative language. You search for information about “drowning” and find a document about someone “drowning in birthday presents.”
- Search words not in web page. There is supposedly a book about the French Revolution that does not use the term “French Revolution.”
- Abstract topics. How do you find useful documents on “health,” “free will” or “ethics”?
- Paired topics. Art and mental illness, architecture and philosophy, and movies and fascism are examples of paired topics. Often search engines find documents that contain both terms, but the terms are not related; they just happen to appear in the same document.
- Word lists. You’re searching for a term. What you find is a word list that contains your term but has nothing to do with your term.
- The Dark Web. That is, the Deep Web. Lots of quality information is in the Deep Web and is not accessible to Google and the other crawlers.
- Non-textual things. Without metadata or tagging, non-text data is very difficult to find.
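To make a couple of these concrete, here is a minimal Python sketch of the synonym and figurative-language problems. It is not from Beall's slides: the three-document corpus, the `SYNONYMS` table, and the function names are all invented for illustration, and a real system would draw on a thesaurus or controlled vocabulary rather than a hand-built dictionary.

```python
import re

# Hypothetical corpus, invented for illustration.
DOCS = {
    "a": "Caring for your false teeth: a cleaning guide.",
    "b": "Dentures should be soaked overnight.",
    "c": "After the party she was drowning in birthday presents.",
}

# Hypothetical hand-built synonym table (an assumption, not a real thesaurus).
SYNONYMS = {"dentures": ["false teeth"]}


def full_text_search(query, docs):
    """Return ids of documents that contain the query as a whole word or phrase."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return sorted(doc_id for doc_id, text in docs.items() if pattern.search(text))


def expanded_search(query, docs, synonyms):
    """Search for the query plus any synonyms listed for it."""
    terms = [query] + synonyms.get(query.lower(), [])
    hits = set()
    for term in terms:
        hits.update(full_text_search(term, docs))
    return sorted(hits)


# Synonym problem: the plain search misses document "a".
plain_hits = full_text_search("dentures", DOCS)
expanded_hits = expanded_search("dentures", DOCS, SYNONYMS)

# Figurative language: document "c" matches "drowning" but is not about drowning.
drowning_hits = full_text_search("drowning", DOCS)
```

Query expansion helps with the synonym problem here, but nothing in plain term matching can tell that document "c" uses “drowning” figuratively. That is the aboutness problem in miniature.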
What’s Beall’s conclusion? Search the library databases directly. I’m confused, because searching library databases IS performing full-text search. I think Beall is focusing on the Surface Web search engines (Google and Bing, for example) as the major source of the problem. To some extent, searching sources directly or via federated search can overcome these problems, depending on how scholarly the content is, how good the metadata is, and how good the underlying search engines are.
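As a rough sketch of why metadata quality matters, the Python below fans one query out to two sources, matches on subject metadata instead of full text, and merges the results. Everything in it is an assumption made for illustration: the `SOURCES` records, the `subjects` field, and the function names are hypothetical, and real federated search products talk to remote databases over their own protocols.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory sources standing in for library databases.
SOURCES = {
    "catalog": [
        {"title": "Dentures: A Clinical Guide", "subjects": ["dentures"]},
        {"title": "Caring for False Teeth", "subjects": ["dentures"]},
        {"title": "Drowning in Birthday Presents", "subjects": ["humor"]},
    ],
    "journals": [
        {"title": "Dentures: A Clinical Guide", "subjects": ["dentures"]},
        {"title": "Denture Adhesives Compared", "subjects": ["dentures"]},
    ],
}


def search_source(name, query):
    """Match the query against each record's subject metadata, not its text."""
    wanted = query.lower()
    return [dict(rec, source=name) for rec in SOURCES[name] if wanted in rec["subjects"]]


def federated_search(query):
    """Query every source concurrently, then merge and de-duplicate by title."""
    names = sorted(SOURCES)
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda name: search_source(name, query), names))
    merged, seen = [], set()
    for batch in batches:
        for rec in batch:
            key = rec["title"].lower()
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged


titles = [rec["title"] for rec in federated_search("dentures")]
```

Because the hypothetical records carry a `dentures` subject heading, the “false teeth” title is found and the figurative “drowning” title is not. The results are only as good as the metadata behind them.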
Hat tip to the PurpleSearch Blog.