John Blossom, president and senior analyst at Shore Communications, has written a compelling, must-read article in Shore’s ContentBlogger. “Beyond Search Engines: The Database is Now” is more than a catchy title; it is one of those articles that portends a powerful paradigm shift in the search industry, one that is already well underway.
Blossom’s message to search vendors, especially federated search vendors, is simple and clear: don’t focus only on the content from known sources that you make available to users; get good at mining useful information from sources whose structure is complex and ever-changing. With the exponential growth in the volume of content, it is no longer sufficient to bring together search results from a list of databases and feel smug that, by providing federated search, we are on the leading edge. An important question, though not the most important one, as Blossom states in his article, is “[how] to deal with organizing and delivering content when the Web and many private content collections measure in petabytes and exabytes of information.” I think many of us in the federated search industry believe that finding the most relevant handful of documents in the proverbial haystack is sufficient. In the context of organizing and delivering data that leads to actionable intelligence, however, I believe the industry has a long way to go.
Recognizing that we need to develop more efficient means of mining large quantities of content, in particular content of varying structure, Blossom asks an even more important question:
Namely, as the problems that people need to solve with content technologies become increasingly complex and increasingly fleeting, why is it that we really need permanent unified databases to solve those problems? There is an important need for data normalization, but if normalization can be achieved “on the fly,” as leading content federation services can provide, do people need a database or instead data objects that solve specific problems in the moment?
Blossom is asking us to pay attention to the large number of what he refers to as “on-demand content services,” namely the aggregation and distribution of content compiled and organized from a number of unstructured and disparate sources. These on-demand content services become the new databases. The rapid growth of these services is being fueled by the large and growing number of aggregation, filtering, and distribution protocols and tools. Think mashups, pipes, XML, and RSS. What would it take to intelligently mine the content produced with these services? How can tomorrow’s federated search engines determine, with perhaps little context, the quality of one of these on-demand content sources?
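To make normalization “on the fly” a little more concrete, here is a minimal sketch of what it might look like for two differently structured sources, an RSS 2.0 feed and an Atom feed. Nothing here comes from Blossom’s article: the function names (normalize_feed, federate), the record fields, and the two tiny sample feeds are my own assumptions for illustration, and a real service would have to cope with far messier input.

```python
import xml.etree.ElementTree as ET

# Namespace prefix ElementTree puts on every Atom element tag.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feeds, invented for this sketch.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Post one</title><link>http://example.com/1</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Feed B</title>
<entry><title>Post two</title><link href="http://example.com/2"/></entry>
<entry><title>Post one again</title><link href="http://example.com/1"/></entry>
</feed>"""


def normalize_feed(xml_text, source):
    """Map an RSS 2.0 or Atom document onto a list of common records."""
    root = ET.fromstring(xml_text)
    records = []
    if root.tag == "rss":
        # RSS keeps the URL as the text of <link>.
        for item in root.iter("item"):
            records.append({
                "source": source,
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
            })
    elif root.tag == ATOM_NS + "feed":
        # Atom keeps the URL in the href attribute of <link>.
        for entry in root.iter(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            records.append({
                "source": source,
                "title": (entry.findtext(ATOM_NS + "title") or "").strip(),
                "link": link.get("href", "") if link is not None else "",
            })
    else:
        raise ValueError(f"unrecognized feed format: {root.tag}")
    return records


def federate(feeds):
    """Merge records from (source, xml_text) pairs, keeping the first copy of each link."""
    seen, merged = set(), []
    for source, xml_text in feeds:
        for record in normalize_feed(xml_text, source):
            if record["link"] not in seen:
                seen.add(record["link"])
                merged.append(record)
    return merged


records = federate([("feed-a", RSS_SAMPLE), ("feed-b", ATOM_SAMPLE)])
for record in records:
    print(record["source"], record["title"], record["link"])
```

The point of the sketch is that no unified database is ever built: the common record exists only for the moment it takes to answer the request. Judging the quality of each source, which is the harder question, is not something this sketch attempts.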
The questions raised in Blossom’s article are difficult ones, but they are also exciting ones. For the federated search industry to evolve to the next level, we are going to need to grow beyond the notion that databases and their structures are static entities. We are going to need to embrace these “just-in-time” and uniquely structured databases and learn to analyze them, hopefully in real time, to extract the unique value in any particular content service. And we’re not always going to get help from the service’s creator. Exciting!