Jan
This is the fourth part in a series of articles about standards for accessing content from document sources, particularly in the context of federated search. You can access the first three parts of the series from the following links:
- Content access basics - Part I - screen scraping
- Content access basics - Part II - XML
- Content access basics - Part III - OpenSearch
SRU, SRW, and Z39.50 can be very confusing acronyms to the uninitiated, but there are only a few concepts that you need to understand about these standards to have an intelligent conversation about them.
Let’s begin with an introduction to Z39.50, since it is the oldest of the three standards and certainly the best known in the library community. Wikipedia has a good overview of Z39.50. Here are the key points to understand about Z39.50.
- Z39.50 is a standard that defines how a client (typically a search application) can communicate with a database of content to perform information retrieval, i.e. to search for and retrieve documents.
- Z39.50 is an old standard, several decades old. It was developed before Web 2.0, XML, and web services were created.
- Despite its age, Z39.50 is still very widely used in library environments, especially in cataloging and bibliographic reference systems. Z39.50 reminds me of the programming language COBOL. COBOL is older than Z39.50, and many people complain about its arcane nature and its failings, yet there is a very large base of very large COBOL applications that is not going away any time soon.
- Z39.50 lends itself to federated search because once a federated search application developer creates one connector to access a Z39.50 content source, subsequent connectors are quick to build and deploy, since they all follow the same standard for search and retrieval.
Now, let’s look at SRU and SRW. SRU stands for “Search/Retrieve via URL” and SRW stands for “Search/Retrieve Web Service.” A good summary of SRU/SRW is available at TechEssence.Info. A more in-depth summary of SRU/SRW is in this article.
Here are the basics of SRU and SRW.
- SRU and SRW, like Z39.50, are standards for information retrieval.
- SRU and SRW are essentially the same protocol. The only difference is that SRU accesses a content source by specifying search commands via a URL while SRW uses modern Web Services to make the same request. Web Services are not an easy concept to understand for the uninitiated and good non-technical introductions are scarce. Here is a reasonable introduction.
- SRU/SRW is a modern version of Z39.50.
- SRU/SRW makes use of HTTP, XML, Web Services, and other modern web-based technologies.
- SRU/SRW is “stateless” and thus simpler than Z39.50. A Z39.50 client (search application) opens a persistent TCP connection to the Z39.50 server and carries on a short dialog with it to submit a search request and retrieve the results, so the server and client both need to manage the “state” of that connection. SRU/SRW, being stateless, has no such dialog: the client submits a single request using either a URL or Web Services, and receives a single XML document in response.
- SRU/SRW uses an intuitive and very expressive query language called Common Query Language (CQL).
- SRU is simple enough to be used with modern browsers. A user who learns the syntax and semantics of the SRU query language (CQL), and who knows what fields the remote server supports, can submit a search request via a browser and view the XML results as formatted by that browser.
- Like Z39.50, there are numerous SRU/SRW content sources. Building the first connector is the hard part. Subsequent ones are very straightforward and relatively quick to develop.
- Because SRU/SRW are built on Web Services, they benefit from a mechanism (the Explain operation in SRU/SRW) that allows an application to query the content provider to determine what fields it provides, plus other information about how to search the remote database. Thus, a federated search engine’s SRU/SRW connector can easily get metadata about a source, with no human intervention required to build the connector.
- SRU/SRW is related to OpenSearch, but the two differ in their query languages and in their response formats: both return search results as XML, but OpenSearch does so in the form of an RSS document.
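As a rough illustration of the points above, the sketch below builds an SRU request URL from a CQL query, builds the corresponding Explain URL, and reads the titles out of a response. The endpoint `http://sru.example.org/catalog`, the helper names, and the hand-written sample response are assumptions made for illustration. The parameter names follow SRU version 1.1; a real server may support different indexes and record schemas.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; substitute the base URL of a real SRU server.
BASE_URL = "http://sru.example.org/catalog"

# Namespaces used in SRU 1.1 responses carrying Dublin Core records.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_search_url(base_url, cql_query, start_record=1, maximum_records=10):
    """Return a stateless SRU searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,  # urlencode escapes spaces, quotes, and "="
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


def build_explain_url(base_url):
    """Return the URL that asks a server to describe itself."""
    return base_url + "?" + urlencode({"operation": "explain", "version": "1.1"})


def parse_response(xml_text):
    """Return (total hit count, list of Dublin Core titles)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    titles = [t.text for t in root.iterfind(".//dc:title", NS)]
    return total, titles


# Hand-written sample of what a response might look like (not real data).
SAMPLE_RESPONSE = """
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/"
                            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>2</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordData><dc:title>Federated Search Basics</dc:title></srw:recordData>
    </srw:record>
    <srw:record>
      <srw:recordData><dc:title>Searching Many Sources</dc:title></srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
"""

if __name__ == "__main__":
    print(build_search_url(BASE_URL, 'dc.title = "federated search"'))
    print(build_explain_url(BASE_URL))
    print(parse_response(SAMPLE_RESPONSE))
```

The URL printed by `build_search_url` is the kind of address a user could paste into a browser, which is what makes SRU approachable without any client software.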
Now that you have a sense of what Z39.50, SRU, and SRW are about, let’s consider the relevance of these standards to federated search. Here are the key points:
- A huge number of information sources speak Z39.50, and many of them are of great interest to the library community that federated search vendors want to serve. Just as it’s not wise to ignore screen scraping as an approach to accessing content, it’s also not wise to ignore the large number of Z39.50 sources, arcane as the interface may be.
- There are also many SRU/SRW content sources in existence. Just a few days ago I wrote about the OpenTranslators announcement that WebFeat was providing access to 10,000 Z39.50/SRU/SRW sources. While I’m not sure how WebFeat is counting these connectors, the point is that many of them are available, and I believe that all federated search vendors should be supporting content sources that utilize these standards.
- While it can be awkward for existing federated search applications to access Z39.50/SRU/SRW content due to the architecture of their search mechanisms, there are a number of applications and development kits that facilitate the development of Z39.50/SRU/SRW clients and servers. Many are free. Index Data, in particular, distributes YAZ tools and applications, which allow for the creation of Z39.50 proxy servers and of gateways that expose Z39.50 sources through SRU/SRW. So, federated search engine developers and content providers have a number of powerful tools to assist them in building interfaces to these popular standards.
- In his year-in-review post Abe wrote that Microsoft has announced that it’s giving away Microsoft Search Server 2008 Express, which provides federated search capability using OpenSearch.
- As I wrote in my review of the Education Institute’s federated search web conference, trends in the federated search industry include progress on standards and XML syndication of content, both of which are relevant to SRU/SRW.
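The claim that one connector can serve many standards-based sources can be sketched as follows. This is a minimal, offline illustration rather than a real federated search engine: the class names, the source URLs, and the `fake_fetch` stub are all hypothetical, and a production connector would perform an HTTP request and parse the XML response where the stub returns canned text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlencode


@dataclass(frozen=True)
class SruSource:
    """Everything that differs between two SRU sources in this sketch."""
    name: str
    base_url: str


class SruConnector:
    """A single connector reused for every source that speaks SRU."""

    def __init__(self, fetch: Callable[[str], List[str]]):
        # fetch takes a request URL and returns a list of result titles.
        self._fetch = fetch

    def search(self, source: SruSource, cql_query: str) -> List[str]:
        params = {"operation": "searchRetrieve", "version": "1.1", "query": cql_query}
        return self._fetch(source.base_url + "?" + urlencode(params))


def federated_search(connector: SruConnector,
                     sources: List[SruSource],
                     cql_query: str) -> Dict[str, List[str]]:
    """Send the same query to every source and group results by source name."""
    return {source.name: connector.search(source, cql_query) for source in sources}


def fake_fetch(url: str) -> List[str]:
    """Stand-in for an HTTP call (assumption: no network access here)."""
    return ["result from " + url.split("?")[0]]


# Hypothetical sources; adding another is one more line of configuration.
SOURCES = [
    SruSource("Library A", "http://a.example.org/sru"),
    SruSource("Library B", "http://b.example.org/sru"),
]

if __name__ == "__main__":
    results = federated_search(SruConnector(fake_fetch), SOURCES, 'dc.title = "search"')
    for name, titles in results.items():
        print(name, titles)
```

The design choice worth noting is that the protocol logic lives in one class, while each source is reduced to a small piece of configuration.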
In summary, Z39.50/SRU/SRW are prominent standards used by a large number of information providers, especially in the markets in which federated search vendors sell. As the federated search industry evolves to become the provider of the “search everywhere” application, vendors will need to provide access to a wider variety of content from a single search page, even if proxies and gateways are used behind the scenes to access that content. “Seamless” is what the users want to see and what the vendors will need to provide.
Tags: federated search, sru, sru/srw, srw, z39.50
2 Responses so far to "Content access basics - Part IV - SRU/SRW/Z39.50"
March 26th, 2008 at 10:39 am
So what standard works best for the federated search tools? How can one define best? Which standard is likely to provide the easiest results for the fed. search vendor to parse?
May 13th, 2008 at 4:26 am
…and which interface should a content provider implement who is new to this field?
I have a website written in ASP.NET which needs to support Federated Search. SRU/SRW looks like the most sensible to me at the moment.