Archive for July, 2009


Jonathan Rochkind published an interesting article at his Bibliographic Wilderness blog that I wanted to draw your attention to. It leads with this question from the code4lib community:

I heard someplace recently that APIs are the newest form of vendor
lock-in. What’s your take?

Jonathan’s article and the discussion of the topic at code4lib raise some important questions:

  1. What’s in it for vendors to provide open (industry standard not vendor-specific) APIs?
  2. How can you tell if an API gives you freedom, or locks you in?
  3. Which vendors provide open APIs?
  4. What are the “right” set of requirements to go into an open API spec?
  5. How can you tell if a vendor has correctly implemented the open API functionality they claim to provide?

Read Rochkind’s article, check out the code4lib discussion, and consider this request from Marshall Breeding on July 22nd, which reads in part:

I am in the process of writing an issue of Library Technology Reports
for ALA TechSource titled “Hype or reality: Opening up library
systems through Web Services and SOA.” Today almost all ILS products
make claims regarding offering more openness through APIs, Web
services, and through a service-oriented architecture (SOA). This
report aims to look beyond the marketing claims and identify specific
types of tasks that can be accomplished beyond the delivered
interfaces through programmatic access to the system internals.


Technologist and blogger Brett Bonfield wrote an excellent article yesterday on the “In the Library with the Lead Pipe” blog regarding open source software in the library.

Here’s the opening paragraph:

It’s interesting how many people don’t really understand the concept of open source. People often describe freeware as open source, or they’ll describe free web-based applications as open source, or applications with APIs that allow for mashups. There are articles all the time, on some of the most popular websites, that recommend free software but don’t distinguish programs the authors give away for free from software that is actually open source.

And, here’s my favorite statement:

Perhaps what people associate most closely with open source—free software—is its price tag. However, it is often pointed out that open source software is usually free like a puppy or a kitten: there may be no cost associated with acquiring it, but there’s more involved than just the initial cost.

Read the rest of this entry »


Discovery services have begun to spring up. This article is my attempt to catalog and characterize them. Consider this article to be an introduction that sets the stage for future analysis articles.

What is a discovery service?

A discovery service is a search interface to pre-indexed metadata and/or full-text documents. Discovery services differ from federated search applications in that discovery services don’t search live sources. By searching pre-indexed data, discovery services return search results very quickly. Discovery services are touted as an evolution beyond federated search, and in some ways they are. Some discovery services either provide integration with federated search or provide an API for others to do the integration. I believe that hybrid “federated discovery” services are likely to prevail over pure discovery services, and I will dedicate a future article to them.
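The distinction can be sketched in a few lines of code. This is a hypothetical illustration, not any vendor’s implementation: the discovery service answers queries from an index built ahead of time, while the federated searcher makes one round trip per live source at query time. All class names and records here are invented.

```python
from collections import defaultdict

class DiscoveryService:
    """Searches metadata that was harvested and indexed ahead of time."""
    def __init__(self, records):
        # Build a simple inverted index: word -> set of document IDs.
        self.index = defaultdict(set)
        for doc_id, text in records.items():
            for word in text.lower().split():
                self.index[word].add(doc_id)

    def search(self, term):
        # Fast: a single lookup against the pre-built index.
        return sorted(self.index.get(term.lower(), set()))

class FederatedSearch:
    """Queries each live source at search time."""
    def __init__(self, sources):
        self.sources = sources  # callables standing in for live connectors

    def search(self, term):
        results = []
        for source in self.sources:
            # Slower: one network round trip per source in a real system.
            results.extend(source(term))
        return results

records = {"d1": "solar energy report", "d2": "wind energy survey"}
discovery = DiscoveryService(records)
print(discovery.search("energy"))  # ['d1', 'd2']

fed = FederatedSearch([lambda term: ["sourceA:" + term]])
print(fed.search("energy"))  # ['sourceA:energy']
```

The speed difference falls out of the structure: the discovery service pays the indexing cost up front, while the federated searcher pays a live-query cost on every search, which is also why it can cover sources that refuse to be harvested.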

Read the rest of this entry »


Here are snippets from some articles I’ve tweeted in the last week:

Library Systems: Synthesise, Specialise, Mobilise

The business and service model is evolving from acquiring, cataloguing and circulating physical collections to synthesising, specialising and mobilising Web-based services. … The current generation of federated search systems, link resolvers, resource-sharing systems and electronic record management (ERM) systems are starting to address the new model; the approach, however, is somewhat piecemeal, driven by the identification of specific market opportunities.

AL Inside Scoop

Williams [one of the presenters] suggested that librarians are a shrinking market for publishers, who are moving toward individual customers. “End users are less fussy,” she said, noting that EBSCO underwrites NPR. Blyberg agreed but added that our front-end interfaces are advancing far beyond our back-end content, calling for better federated search.

Virtual Private Library and Deep Web (video)

99 percent of the trillions of pages on the Web are inaccessible to search engines. Learn how you can tap into this vast deep Web by using the innovative and cutting edge Virtual Private Library.

Top Technology Trends: July 2009

The genre of Discovery Interfaces has been an ongoing trend for the last few years. These interfaces aim to replace the traditional, stodgy OPAC with a modern interface, delivering library content through an interface more consistent with what patrons experience elsewhere on the Web. They offer visually appealing design, relevancy ranking, faceted navigation, and other standard Web navigation techniques. These products offer an attractive replacement for the online catalogs delivered with the ILS.
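Faceted navigation, one of the standard Web techniques the snippet above names, is easy to sketch: count the values of a field across a result set and show the counts in a sidebar. The records and field names below are invented for illustration.

```python
from collections import Counter

# Hypothetical result set from a discovery-style search.
results = [
    {"title": "Solar report", "format": "article", "year": 2008},
    {"title": "Wind survey",  "format": "article", "year": 2009},
    {"title": "Energy atlas", "format": "book",    "year": 2009},
]

def facet_counts(results, field):
    """Counts of each value of `field`, as shown in a facet sidebar."""
    return Counter(r[field] for r in results)

print(facet_counts(results, "format"))  # Counter({'article': 2, 'book': 1})
print(facet_counts(results, "year"))    # Counter({2009: 2, 2008: 1})
```

Clicking a facet value in a real interface just reruns the search with that value added as a filter, which is why facets pair so naturally with a pre-built index.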


A shift

Author: Sol

The Federated Search Blog is a little over a year and a half old. The blog has been quite successful, reaching over 800 readers. But something has been nagging at me: the (self-imposed) pressure to publish frequently. I have usually maintained a pace of three articles a week, of varying lengths and degrees of depth. I’m shifting the focus of the blog to producing more well-researched, in-depth articles.

Publishing more research-intensive articles will take more time, of which I have a limited amount. I spend roughly eight hours a week on the blog, and, as you might imagine, the blog is not my only commitment. So, I’m going to rebalance those eight hours to give me time to write roughly one in-depth article per week. I will also publish very short articles of noteworthy news - snippets, if you will.

Read the rest of this entry »


Blog sponsor Deep Web Technologies has just published a whitepaper, “Next-Generation” Federated Search: Critical for Intellectual Property Research.

The whitepaper explains why “Next-generation federated search technologies are quickly becoming an essential and indispensable tool for attorneys, paralegals, expert witnesses, and owners of IP to create, protect, monitor and litigate their intellectual property portfolios.”

Larry Donahue, Deep Web Technologies’ Chief Operating Officer and Corporate Counsel, authored the whitepaper. Mr. Donahue is licensed to practice law in New Mexico and Illinois and is a registered patent attorney, so he understands very well the information needs of the legal profession.

Intellectual property litigation is but one field of law in which missing important documentation while preparing a case can be very costly in court, to say nothing of the loss in credibility. The right federated search solution, configured to search all the relevant sources, can widen the net enough to avoid missing critical information without overwhelming the legal staff.

At just two pages, the paper is a quick yet impactful read. And, of course, there are many industries outside of law in which the cost of missing information is high.


UC Berkeley Professor Marti Hearst has just completed Search User Interfaces, an academic book on the topic. Cambridge University Press will be releasing print copies in September but the full text is available online now for free.

The terms of service for the online version of the book do not permit posting any of its contents. So, even though short excerpts from the book would probably be acceptable fair use, I’ll respect the terms of service and won’t be quoting from the book in this article.

I don’t consider this series of articles to be a formal review of the book but rather a sampling of ideas I found interesting and instructive.

Read the rest of this entry »


I’m incubating a white paper about the Deep Web. The Deep Web is all that content (more than 99% of the web) that Google can’t find by crawling, right? It’s all that stuff that lives inside databases and can only be found by filling out forms, right? The main value add of Deep Web search engines is that they find only Deep Web documents, right? Not all that long ago I would have answered “yes” to all these questions. Today I’m confused.

Today I was chatting with Darcy from (blog sponsor) Deep Web Technologies’ marketing department about the white paper. I’ll refer to her as Deep Web Darcy. Well, Deep Web Darcy is asking me some rather “deep” questions about the Deep Web. We discussed harvesting, crawling, indexing, Deep Web searching, and so much more. If someone’s Deep Web content finds its way to Google has that content become surfaced and does that content no longer qualify as buried treasure? If one’s Deep Web content can be harvested, is it not really Deep Web content? If someone is browsing that content in the forest, with only one hand on the keyboard, does that content make a sound? So many koans. So little time. My brain hurts.

Read the rest of this entry »


I’m new to the term “data federation.” How about you?

Michael Bergman, federated search luminary, just wrote on the subject, preferring the term “data mixing.” He explains the concept:

What is Data Mixing and Why is it So Hard?

As a new term there is no “official” definition of data mixing. However, I think we can consider it as generally equivalent to the older data federation concept.

Data federation is the bringing together of data from heterogeneous and often physically distributed data sources into a single, coherent view. Sometimes this is the result of searching across multiple sources, in which case it is called federated search. But it is not limited to search. Data federation is a key concept in business intelligence and data warehousing and a driver behind master data management (MDM).
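Bergman’s “single, coherent view” can be made concrete with a small sketch: two sources with different schemas are each mapped into one common schema. The source records, field names, and mapping functions here are all invented for illustration.

```python
# Two heterogeneous sources, each with its own schema.
source_a = [{"title": "Deep Web Primer", "yr": 2009}]
source_b = [{"name": "Search UIs", "published": "2009-09-01"}]

def normalize_a(rec):
    # Source A already uses "title"; rename "yr" to "year".
    return {"title": rec["title"], "year": rec["yr"]}

def normalize_b(rec):
    # Source B calls the title "name" and stores an ISO date string.
    return {"title": rec["name"], "year": int(rec["published"][:4])}

# The federated view: one coherent schema across all sources.
federated_view = ([normalize_a(r) for r in source_a]
                  + [normalize_b(r) for r in source_b])

print(federated_view)
```

The hard part Bergman alludes to is hidden in the normalize functions: in real systems the mappings are many, ambiguous, and in constant need of maintenance, which is why data federation underpins whole disciplines like MDM rather than being a one-off script.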

Read the rest of this entry »


My fur was raised when I saw Serials Solutions’ claim that their discovery service was an evolutionary step beyond federated search. I raised my concerns a couple of times: here and here. My beef isn’t with Serials Solutions as a business; it’s with their position that it’s fine not to search content that they don’t provide access to. There’s no room (yet) in their discovery service model to include access to quality content that can only be searched live, i.e. via federated search. Carl Grant joined the conversation and various people commented, making the topic a very lively one.

My concern was, and is, that libraries and research organizations would consider giving away their responsibility to select quality sources for their patrons, for what I imagine to be two primary reasons: (1) library patrons don’t like to wait 30 seconds for federated search results, and (2) (possibly) cost savings. I don’t have a lot of sympathy for the Google generation. Even though I’m an American and my culture has taught me that immediate gratification is a good thing, I think 30 seconds is a small price to pay to see better results. Cost I can’t speak to, as I don’t have any figures.

Read the rest of this entry »