3
Apr
We’ve all heard the old adage, “Don’t believe everything you read.” The Internet is full of stuff to read; how do we know what to believe? While there are numerous search engines that present us with documents in response to our queries, how do we know if the information presented in these documents is accurate? Granted, much of what’s on the Internet is personal opinion and sometimes all we want is someone’s viewpoint. There are times, however, when we need to know that the information we are reading is of high quality. We may be researching product features to make a purchase decision, company information to form a competitive intelligence strategy, or medical information to address a medical concern.
A major part of determining whether information is accurate is examining its source. This is where federated search engines really shine. By their nature, federated search applications usually query deep web database sources. These databases can’t be crawled; there are no links for Google to follow to extract all the documents in such a database. Now, let’s consider the type of content that lives in these non-crawlable databases. Publishers who specialize in scientific, technical, and business research articles are most likely to store their documents in databases and to make their content searchable by federated search engines. Geological, geographic, and demographic data lives in databases. Much political data lives in databases as well.
Read the rest of this entry »
1
Apr
In what industry analysts consider a bold move, Google has decided to stop crawling the web. Google management claims that the rapidly rising costs of purchasing and maintaining tens of thousands of index servers are what’s pushing it to rethink its approach to dominating Internet searching. Industry experts claim there are other pressing concerns driving Google to abandon its crawl-and-index approach.
Read the rest of this entry »
31
Mar
I’ve been remiss in responding to recent blog comments. This blog’s readership has a number of insightful people and I would like to acknowledge, and respond to, some recent comments.
Peter Murray commented on the challenges of incremental results. He made the point that trying to get users to do things differently than they’re used to shows that you’ve not designed your product properly. I get the point but federated search isn’t Google, and serious research isn’t Google. I think it’s ok for some software to be complicated enough that it’s worth training users to understand how it works. Google is easy but it has limitations. There are entire businesses devoted to software training and documentation.
Read the rest of this entry »
28
Mar
The first thing that most people notice when they use a federated search application is that it’s not nearly as fast as Google. We’ve all gotten spoiled. This is not only the information age, it’s the age of quick information; we all want every search to be as fast as a Google search. However, by its very nature, federated search can’t be as fast as Google. Federated search is at the mercy of the sources it federates. If a source is slow to return results to the federated search application, then there’s nothing the federated search application can do. Or is there?
Deep Web Technologies has been displaying incremental results for some time now. The idea is simple: display results in chunks as they are received from the sources being searched. Science.gov, WorldWideScience.org, and Scitopia.org are three applications that display incremental results. While there are challenges to this approach, there are some significant benefits as well. The aim of displaying incremental results is to minimize the time the user has to wait to see some results. In the show-something-quick department, incremental results work well. The major challenge arises when you try to figure out what to do with the rest of the results as they come in.
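To make the idea concrete, here is a minimal Python sketch of incremental results. It is only an illustration under stated assumptions, not how Deep Web Technologies or any of the applications above actually implement it: the source names, the simulated delays, and the functions query_source and incremental_search are all hypothetical stand-ins. Each source is queried in its own thread, and each chunk of results is handed back as soon as its source responds rather than after the slowest source finishes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical sources: name -> (simulated response delay in seconds, results).
# A real application would query remote databases instead.
SOURCES = {
    "fast_source": (0.05, ["fast result 1", "fast result 2"]),
    "medium_source": (0.2, ["medium result 1"]),
    "slow_source": (0.4, ["slow result 1", "slow result 2"]),
}


def query_source(name, query):
    """Stand-in for a real source query; sleeps to simulate latency."""
    delay, results = SOURCES[name]
    time.sleep(delay)
    return name, [f"{result} for '{query}'" for result in results]


def incremental_search(query):
    """Yield (source_name, results) chunks in the order sources respond."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(query_source, name, query) for name in SOURCES]
        # as_completed hands back each future as soon as it finishes,
        # so a slow source never holds up results from a fast one.
        for future in as_completed(futures):
            yield future.result()


if __name__ == "__main__":
    for source, chunk in incremental_search("deep web"):
        print(f"{source}: {chunk}")
```

The sketch deliberately stops at the easy part, showing something quickly. It does nothing about the harder question raised above: how to fold later chunks into results the user is already reading.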
Read the rest of this entry »
26
Mar
The U.S. Department of Energy (DOE) Office of Scientific and Technical Information (OSTI) has, ever since the Manhattan Project, been responsible for stewardship of DOE-related research results, which it makes available for free to scientists, researchers, and the public. The OSTI blog was started last November to share personal perspectives of OSTI employees. Recently, the blog was expanded to include a technology thread. OSTI’s use of technology, much of it based on federated search, should be of interest to readers of this blog.
Due to my familiarity with OSTI technology (from five years of helping to develop and support OSTI products through my relationship with this blog’s sponsor, Deep Web Technologies), I was asked to write for the technology thread, being the sole author of some articles and collaborating author on others.
Read the rest of this entry »
24
Mar
This blog started last December 3rd with a welcome post. The blog will soon be four months old. I thought it would be fun to see what posts have been the most popular, in terms of views, so I looked at the WordPress stats for the blog. Here’s what I found:
Read the rest of this entry »
21
Mar
The e-resources@ uvm blog has a post this morning that, among other things, says this:
The closing speaker, Tom Wilson (University of Alabama), briefly made a point about Google that I really liked, and that led to discussion afterwards. He pointed out that Google is not a federated search engine: it uses relevancy ranking (maybe well, maybe not well) and federated searches can’t. Federated search engines are, by nature, multiple databases, and can’t apply relevancy like Google can with its single database. I had never thought through to that point, and I think it’ll be on my mind for the plane ride home.
This statement really caught my attention because it’s wrong. I worked at Deep Web Technologies (this blog’s sponsor) for five years and know their technology pretty intimately. Deep Web puts a tremendous amount of effort into doing relevance ranking. Most other federated search vendors provide relevance ranking as well.
Read the rest of this entry »
19
Mar
MIT Technology Review published an article, Searching as a Team: An innovative tool aims to help users search the Internet together, which describes an experiment in collaborative searching using social software. Meredith Morris, of Microsoft’s Adaptive Systems and Interaction group, is designing the software. Here’s how the article describes the tool:
Read the rest of this entry »
18
Mar
Last December I wrote about a forthcoming book: Federated Search: Solution or Setback for Online Library Services. (Ignore the links in my earlier blog post; Haworth Press is now part of Taylor & Francis and many Haworth Press links no longer work.) The book is available now and, by courtesy of Taylor & Francis, I have three review copies to give away.
I will send copies of the book to the first three people who email me at the address in the blog’s About page and who commit to reviewing three or more chapters of the book for publication in this blog. I will acknowledge you as guest author and link to your blog or website if you like. The book is a compilation of articles, so you don’t need to read the whole book to review several of the articles. Once you receive your book I will ask you to select several articles that you wish to review. I may ask you to change a selection or two, as I will coordinate among the three reviewers so that, ideally, there is no overlap.
Read the rest of this entry »
17
Mar
I was intrigued when I saw this blog post from the littera scripta blog. The post references this article, from the Online Education Database, which lists the top 25 librarian blogs, scored by Google PageRank, Alexa Rank, Technorati Authority, and Bloglines Subscribers.
Wanting to stay current on discussions of federated search in the blogosphere, I was curious to do my own ranking of these blogs to see which ones referred to federated search the most. So, I went down the list and searched for the quoted phrase, “federated search”, in each of the 25 winning blogs. I realize that this is not a very rigorous assessment; in particular I’m aware that “distributed search”, “metasearch”, and other terms are often used to mean the same thing. I also realize that the number of hits is not the best indicator of relevance; a new blog might refer to federated search a lot yet not have as many total hits as an established blog that only occasionally refers to federated search. Yes, this is not a scientific study.
Read the rest of this entry »