23
Dec
Multilingual federated search is a big deal for a couple of reasons. First, no one has done it up to now. Yes, Google just added translation to its universal search, and no doubt Bing will follow suit. But being able to search the quality scientific and technical information that sometimes is only available via federated search, and doing it in foreign languages, is important.
The second reason that multilingual federated search is so important is that China, Japan, Russia, and other nations produce large volumes of research output. As the world shrinks we can’t afford to ignore the non-English literature. In a blog article the author noted that Thomson Reuters highlighted the importance of China’s research output on the basis of sheer volume:
According to citation analysis based on data from Web of Science, China is ranked second in the world by number of scientific papers published in 2007. Scientific’s World IP Today Report on Global Patent Activity 2007 reported that China almost doubled its volume of patents from 2003 to 2007, and looks set to become a strong rival to Japan and the United States in years to come.
The bottom line: federated search is about research and research is global.
Read the rest of this entry »
If you enjoyed this post, make sure you subscribe to the RSS feed!
19
Jul
Discovery services have begun to spring up. This article is my attempt to catalog and characterize them. Consider this article to be an introduction that sets the stage for future analysis articles.
What is a discovery service?
A discovery service is a search interface to pre-indexed metadata and/or full-text documents. Discovery services differ from federated search applications in that discovery services don’t search live sources. By searching pre-indexed data, discovery services return search results very quickly. Discovery services are touted as an evolution beyond federated search and in some ways they are. Some discovery services either provide integration with federated search or provide an API for others to do the integration. I believe that hybrid “federated discovery” services are likely to prevail over pure discovery services and I will dedicate an article to them.
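To make the distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample sources, the document ids, and the function names are hypothetical, and the "live" sources are simulated with in-memory dictionaries rather than real network calls. It only shows where the work happens: the federated version visits every source at query time, while the discovery version builds an index ahead of time and answers from it.

```python
from collections import defaultdict

# Hypothetical sample sources: each maps a document id to its text.
SOURCES = {
    "source_a": {
        "a1": "federated search of live sources",
        "a2": "patent activity report",
    },
    "source_b": {
        "b1": "discovery service with a central index",
    },
}


def federated_search(term):
    """Visit every source at query time (simulated here by scanning each one)."""
    hits = []
    for docs in SOURCES.values():
        for doc_id, text in docs.items():
            if term in text.lower().split():
                hits.append(doc_id)
    return sorted(hits)


def build_index(sources):
    """Harvest all sources ahead of time into a single inverted index."""
    index = defaultdict(set)
    for docs in sources.values():
        for doc_id, text in docs.items():
            for word in text.lower().split():
                index[word].add(doc_id)
    return index


# Built once, before any query arrives.
INDEX = build_index(SOURCES)


def discovery_search(term):
    """Answer from the pre-built index without touching the sources."""
    return sorted(INDEX.get(term, set()))


print(federated_search("search"))
print(discovery_search("index"))
```

Both functions return the same hits for the same term here. The trade-off the sketch hints at is freshness versus speed: the index answers with a single lookup but only knows what was harvested when it was built.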
Read the rest of this entry »
11
Jun
Product manager for Blogs.com and lead for their blogger training, Andy Wibbels, wrote an outstanding blog article, “What is an API?” As a programmer I know what an API is but I have a hard time explaining the concept to non-programmers. Now, Andy has done the explaining for me.
Andy’s article does a nice job of explaining without overwhelming, and his short introduction skillfully avoids going into more detail than most people want. If you’ve ever wanted to explain (or understand) the connection between mashups and APIs, or how Twitter’s massive and rapid success can be attributed to embracing APIs, then this is the article for you.
Read the rest of this entry »
22
Apr
Last November I wrote about Matt, a software developer and graduate student in computer science. Matt had blogged about a deep web crawler he was building. Five months later, I’m curious to know how you’re doing, Matt. Please let us know if you’ve done more work on your crawler since your last post mentioning the subject on November 21.
Read the rest of this entry »
12
Jan
Web 2.0 is a fascination of mine. I’m very community oriented and I’ve watched the computer industry evolve for nearly thirty years. I’m very excited about the potential for people and computers to change the world and to help solve our most pressing problems.
Lorcan Dempsey took a look at O’Reilly’s “Programming Collective Intelligence” and he inspired me to look at the book as well. I blogged about Lorcan’s blog article and was able to get a review copy of the book.
Read the rest of this entry »
21
Oct
I’m about to add a little bit of AJAX to some software I’m writing so I’m reading up on the technology. So, when my Google alert for “federated search” turned up 5 things you didn’t know about AJAX I was curious to read the article, which appeared in the blog for the Task Force on Social Networking Software for the Medical Library Association. Read my previous article, The interplay between AJAX and federated search, for background material on the intersection of the two subjects. That article introduces AJAX:
AJAX stands for Asynchronous JavaScript and XML. Asynchronous refers to the fact that a program using AJAX can request an update to bits of a web page without having to reload the entire web page. JavaScript provides the mechanism that the web page uses to communicate with the HTTP (web) server. XML is the standard that is sometimes, but certainly not always, used to encode the data given to the web server. AJAX is basically a set of standards and techniques that a web programmer can use to create HTML-based web applications that are browser-independent where parts of the page refresh smoothly without requiring entire page reloads.
Read the rest of this entry »
24
Sep
[Note: Two very large prime numbers were discovered recently, one last month, the other early this month. These primes, known as Mersenne primes, were discovered via a "divide and conquer" approach, validating the distributed search approach for fields as distant as federated search :)]
Jonathan Rochkind and Stephan Schmid left comments in response to my article on federating large numbers of sources. I’d like to respond to a part of Jonathan’s comment as well as to a piece of Stephan’s comment.
Read the rest of this entry »
19
Sep
This week the Science.gov Alliance released Science.gov 5.0. The release got a good amount of press from a couple of press releases (from the US Department of Energy and Deep Web Technologies) and from a number of bloggers, including Valerie Allen (Product Manager for Science.gov) on OSTI’s own blog and the SLA Government Information Division blog. When I worked for Deep Web, I supported the application and have enjoyed watching it evolve and grow.
Read the rest of this entry »
16
Sep
Abe and I were recently discussing the federation of large numbers of sources and the question came up: “What would it take for a single application to federate hundreds or even thousands of sources?” The conversation turned to a discussion of an approach that this blog’s sponsor, Deep Web Technologies (DWT), had developed to federate a number of federated search applications. The discussion of this “divide and conquer” approach inspired this article. You can read more about the ideas discussed here in two of Abe’s presentations:
I should note that DWT’s approach is not the norm and that large source scalability is not something that many customers need to be concerned with today. But I do believe that we’ll be seeing more federated search applications searching greater numbers of sources in the years to come.
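The general idea of federating federated search applications can be sketched in a few lines of Python. This is not DWT's implementation, which is not described here; it is a toy illustration under stated assumptions. The helpers `make_source` and `make_federator`, the source names, and the scoring (a simple count of query occurrences) are all hypothetical. The point is only that a federator exposes the same search interface as a source, so federators can be stacked and each one handles a manageable number of children.

```python
from concurrent.futures import ThreadPoolExecutor


def make_source(name, docs):
    """Return a hypothetical source: a function mapping a query to (score, title) hits."""
    def search(query):
        q = query.lower()
        return [
            (text.lower().count(q), f"{name}: {text}")
            for text in docs
            if q in text.lower()
        ]
    return search


def make_federator(children):
    """Combine child searchers (sources or other federators) into one searcher."""
    def search(query):
        # Query all children in parallel, then merge and rank their hits.
        with ThreadPoolExecutor(max_workers=max(1, len(children))) as pool:
            result_lists = list(pool.map(lambda child: child(query), children))
        merged = [hit for hits in result_lists for hit in hits]
        # Highest score first; ties broken alphabetically by title.
        return sorted(merged, key=lambda hit: (-hit[0], hit[1]))
    return search


# Two mid-level federators, each responsible for its own group of sources.
group_a = make_federator([
    make_source("s1", ["solar energy storage", "wind power"]),
    make_source("s2", ["energy policy and energy markets"]),
])
group_b = make_federator([
    make_source("s3", ["genome sequencing"]),
    make_source("s4", ["fusion energy research"]),
])

# The top-level application only ever talks to two children.
top = make_federator([group_a, group_b])

for score, title in top("energy"):
    print(score, title)
```

Because the top-level federator sees only its immediate children, adding more sources means adding more groups rather than widening any single fan-out. A real system would also have to handle timeouts, duplicate results, and ranking across groups, none of which this sketch attempts.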
Read the rest of this entry »
9
May
If you follow this blog you know that I rarely write about metasearch engines. It’s not that I dislike them; there are just too many of them out there, it would be hard to keep track of them all, and few capture my attention. Plus, even though metasearch engines are federated search applications in their own right, in that they aggregate search results in real time from a number of sources (which may consist of live or crawled and indexed content), I mentally place them in a category of their own.
Last December I wrote about Rollyo, a personal search engine that you can customize with a list of URLs to search. While one could argue that Rollyo is not a federated search application (it’s got to be searching crawled and indexed content rather than live sources if it searches arbitrary web-sites) I found it to be innovative enough to warrant a post. Addict-o-matic (hat tip to Web Worker Daily) is another metasearch engine that intrigued me.
Read the rest of this entry »