Discovering discovery services | Federated Search Blog


19
Jul

Discovery services have begun to spring up. This article is my attempt to catalog and characterize them. Consider this article to be an introduction that sets the stage for future analysis articles.

What is a discovery service?

A discovery service is a search interface to pre-indexed metadata and/or full-text documents. Discovery services differ from federated search applications in that discovery services don’t search live sources. Because they search pre-indexed data, discovery services return search results very quickly. Discovery services are touted as an evolution beyond federated search and in some ways they are. Some discovery services either provide integration with federated search or provide an API for others to do the integration. I believe that hybrid “federated discovery” services are likely to prevail over pure discovery services and I will dedicate an article to them.


It’s useful to note that discovery services aren’t new. IngentaConnect makes 4.5 million documents from over 13,000 publishers searchable. Infotrieve provides a document search and delivery service. And there’s Thomson Reuters’ Web of Science. These are just three examples of discovery services that have existed for a long time. What is new about the recently introduced discovery services is the focus on integration with other content, typically the library’s OPAC. I’ll discuss integration in a separate article.

What is a unified search index?

The terms “unified index” and “unified search index” are associated with discovery services. As the terms imply, a discovery service searches content from all the sources it has access to through a single index. To produce that unified search index, the discovery service must reconcile differences in the structure of metadata (e.g. names and contents of fields) across sources.
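
To make the field-mapping problem concrete, here is a minimal Python sketch of how records from two sources with different field names might be normalized into one schema before indexing. The source names, the field names, and the FIELD_MAPS table are all hypothetical, invented for illustration; they do not describe how any particular discovery service works.

```python
# Hypothetical per-source mappings from native field names to a unified schema.
FIELD_MAPS = {
    "source_a": {"ti": "title", "au": "authors", "py": "year"},
    "source_b": {"Title": "title", "Creator": "authors", "Date": "year"},
}

def normalize(source, record):
    """Map one native record onto the unified schema."""
    mapping = FIELD_MAPS[source]
    unified = {"source": source}
    for native_field, value in record.items():
        field = mapping.get(native_field)
        if field is None:
            continue  # drop fields the unified schema does not know about
        unified[field] = value
    # Contents differ too, not just names: make authors a list and year an int.
    authors = unified.get("authors", [])
    if isinstance(authors, str):
        authors = [a.strip() for a in authors.split(";") if a.strip()]
    unified["authors"] = authors
    if "year" in unified:
        unified["year"] = int(str(unified["year"])[:4])
    return unified

def build_index(records):
    """Build a tiny inverted index: title word -> set of record positions."""
    index = {}
    for pos, rec in enumerate(records):
        for word in rec.get("title", "").lower().split():
            index.setdefault(word, set()).add(pos)
    return index

unified_records = [
    normalize("source_a", {"ti": "Federated search basics",
                           "au": "Smith, J.; Lee, K.", "py": "2008"}),
    normalize("source_b", {"Title": "Discovery search tools",
                           "Creator": ["Jones, A."], "Date": "2009-07-19",
                           "Extent": "12 p."}),
]
index = build_index(unified_records)
print(sorted(index["search"]))  # positions of records whose title contains "search"
```

Once every record shares the same field names and value types, a single index can serve queries across all sources, which is the point of the unified index.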

What is the motivation for discovery services?

In a word, speed. It’s no surprise that users don’t like to wait tens of seconds for their search results. In terms of response time, live searching can’t compete with index searching. A second factor driving the creation of discovery services is the willingness of publishers and content aggregators to form partnerships with developers of the services. Given the pressure to deliver search results in “Google time,” publishers have an incentive to cooperate with one another and with discovery service providers.

Some people say that a third driving factor is cost. While it’s possible that libraries could save money by accessing sources via discovery services rather than via federated search, cost figures are very difficult to come by for either, so cost may or may not, in reality, be a factor.

Another reason for the big interest in discovery services is that the onerous task of building, monitoring, and repairing connectors disappears since there are no connectors.

Unified indexes provide benefits due to their “homogenization” of metadata. Duplicates should be much easier to remove via discovery services than by federated search engines. And discovery services will produce more “complete” results, i.e. results with titles, authors, publication dates and other fields of interest that federated search can’t reliably get. With better fielded results it will be easier to cluster and otherwise organize search results.
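
As a rough illustration of why homogenized metadata makes duplicates easier to remove, the Python sketch below derives a match key from the normalized title, the first author's surname, and the year, and keeps one record per key. The key recipe and the sample records are assumptions made for this example; real services likely use more robust matching, such as standard identifiers or fuzzy comparison.

```python
import re

def match_key(record):
    """Build a crude duplicate-detection key from unified fields (an assumed recipe)."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    title = " ".join(title.split())  # collapse repeated whitespace
    if record["authors"]:
        first_author = record["authors"][0].split(",")[0].strip().lower()
    else:
        first_author = ""
    return (title, first_author, record.get("year"))

def deduplicate(records):
    """Keep the first record seen for each key, remembering every source that held it."""
    merged = {}
    for rec in records:
        key = match_key(rec)
        if key in merged:
            merged[key]["sources"].append(rec["source"])
        else:
            merged[key] = {**rec, "sources": [rec["source"]]}
    return list(merged.values())

# Invented sample records, already mapped to a shared schema.
records = [
    {"source": "source_a", "title": "Search Indexing Basics",
     "authors": ["Smith, J."], "year": 2009},
    {"source": "source_b", "title": "Search indexing basics.",
     "authors": ["Smith, John"], "year": 2009},
    {"source": "source_b", "title": "Clustering Search Results",
     "authors": ["Lee, K."], "year": 2009},
]
unique = deduplicate(records)
print(len(unique), unique[0]["sources"])
```

A federated search engine would have to attempt the same matching at query time, on whatever fields each live source happened to return, which is why the task is harder there.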

A potential benefit, but also a potential concern, is relevance ranking. It may be better or worse with discovery services depending on how search is performed. See the next section for further discussion.

Are there downsides to discovery services?

Yes - source lock-in. I’ve written, perhaps ad nauseam, about my concern that discovery services, if not integrated with federated search, force organizations that want a single search tool to choose one service or the other. Federated search is very important for organizations that have particular sources they want to search that are not available from one of the discovery services.

Even if an organization is happy with the set of sources provided through a discovery service, the availability of sources is dependent on the relationship with the publishers (and/or aggregators). Discovery services are too new for anyone to know how publisher relationships will evolve, especially given the competition.

It’s also not clear how discovery services perform search. Let’s say that a particular discovery service has an index that’s built from the metadata of its documents and not from their full text. In that case searching the index won’t produce results that are as relevant as results obtained by searching the native source, assuming the native source provides full-text search capability.
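
The following toy Python example illustrates the concern: a term that appears only in a document's body is findable through a full-text index but invisible to an index built from metadata alone. The documents, the field names, and the simple word-level index are invented for illustration and say nothing about how any real discovery service indexes content.

```python
def build_index(docs, fields):
    """Inverted index over the chosen fields: word -> set of document ids."""
    index = {}
    for doc_id, doc in docs.items():
        for field in fields:
            for word in doc.get(field, "").lower().split():
                index.setdefault(word.strip(".,;:"), set()).add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query word (AND semantics)."""
    hits = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*hits) if hits else set()

# Invented documents: "harvesting" is in d1's body only, but in d2's metadata.
docs = {
    "d1": {"title": "Library search tools",
           "abstract": "A survey of catalogs.",
           "body": "We compare connectors, crawlers and harvesting protocols."},
    "d2": {"title": "Harvesting metadata",
           "abstract": "Protocols for harvesting.",
           "body": "An overview of record exchange."},
}

metadata_index = build_index(docs, ["title", "abstract"])
fulltext_index = build_index(docs, ["title", "abstract", "body"])

print(search(metadata_index, "harvesting"))  # finds d2 only
print(search(fulltext_index, "harvesting"))  # finds d1 as well
```

The metadata-only index silently misses d1, so a user of that index would never learn the document is relevant.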

Another concern with discovery services is how current their indexes are. When one searches a source via federated search, the content is current because it is searched live. It’s not clear how frequently the discovery service indexes are updated.

The Oregon State University (OSU) Libraries evaluated WorldCat Local and other discovery services and recommended further evaluation and testing. See the OSU Libraries report and the “New Discovery Tools” article for more information. Links are in the references section.

Who is providing discovery services?

Company/Organization: EBSCO
Product: EBSCOhost Discovery Service
Product Web-Site: http://ebscohost.com/thisTopic.php?marketID=1&topicID=1245
Comments: See Library Journal article. See also EBSCOhost Integrated Search.

Company/Organization: Ex Libris
Product: Primo Central
Product Web-Site: http://www.exlibrisgroup.com/category/PrimoCentral
Comments: See press release.

Company/Organization: Innovative Interfaces
Product: Encore Discovery
Product Web-Site: http://encoreforlibraries.com/products
Comments: See Library Technology Guides article.

Company/Organization: University of Virginia Library
Product: Blacklight
Product Web-Site: http://www.lib.virginia.edu/digital/resndev/blacklight.html
Comments: Uses Solr to index and search text and/or metadata, and it has a highly configurable Ruby on Rails front-end.

Company/Organization: OCLC
Product: WorldCat Local
Product Web-Site: http://www.oclc.org/us/en/worldcatlocal/default.htm
Comments: Partnered with EBSCO so that “whether a search begins in OCLC’s WorldCat Local or EBSCOhost Integrated Search, users will have access to the resources of the entire library since catalog data will be available alongside journal information.” See press release.

Company/Organization: Oregon State University Libraries
Product: LibraryFind
Product Web-Site: http://libraryfind.org/home
Comments: Built with Ruby on Rails. See requirements.

Company/Organization: Serials Solutions
Product: Summon
Product Web-Site: http://www.serialssolutions.com/summon/
Comments: See Information Today article.

Demos of Discovery Services

If you know of other demos I’ll happily add them here.

References

  1. Beyond federated search?
  2. Beyond federated search? The conversation continues
  3. Beyond Federated Search – Winning the Battle and Losing the War?
  4. Extensible Catalog Project
  5. New Discovery Tools for Online Resources From OCLC and EBSCO
  6. Oregon State University Libraries WorldCat Local Task Force Report to LAMP
  7. ProQuest Proposes Pathway to New Platform
  8. SLA2009: Unified Discovery Services
  9. The difference between federated search and discovery services
  10. Top Technology Trends: July 2009
  11. Unified Discovery Services


This entry was posted on Sunday, July 19th, 2009 at 8:06 pm and is filed under discovery service, technology.

4 Responses so far to "Discovering discovery services"

  1. Mads Villadsen
    July 20th, 2009 at 3:25 pm

    After reading this post I felt the need to place the products you mention into two different categories.

    1) The ones where the unified search index is hosted at the local institution

    2) The ones where the unified search index is hosted at a single service provider and the web interface (possibly) runs at the local institution

    It may not seem like an important distinction but it does impact how much material can be made available.

    It seems to me that the publishers are more likely to make deals about “handing over” their metadata (and fulltext) to the big players.

    That is why the Summon sites search across 400 million records (much of it at the article level) and the Blacklight and LibraryFind sites only appear to search across a few million records.

    I think one of the major problems integrated search has to overcome in the next few years to be successful is how local institutions can get hold of the metadata and fulltext needed to create their own unified search index.

    You are absolutely correct when you talk about the problems of source lock-in, and I agree with you when you say that federating discovery services can help overcome some of these issues. But doing so will not give you all the advantages of a unified index, and for that reason I think it is important to keep working on improving the content of local integrated search solutions.

    When a library signs up for a subscription at a given database or publisher they should try hard to get an agreement where they get access to the metadata and the fulltext - this will give them the greatest flexibility when it comes to deciding what they want to search, how they want to rank it, how they want to present it, and more.

    I do realize that currently many publishers are resisting agreements like this since they probably see them as a threat to their core business. But hopefully over the coming years more and more libraries will want to be in control over how they present the data they have bought access to, and this will help pave the way for better agreements between publishers and libraries.

    And I would of course also like to provide you with a link to a discovery service - the Search system at the State and University Library, Denmark:

    http://www.statsbiblioteket.dk/search/

    I don’t really like calling it a demo since it has been our main search interface since 2006.

    It is a locally developed web interface on top of an integrated search system called Summa. More information about Summa can be found at these sites:

    http://www.statsbiblioteket.dk/summa/
    https://wiki.statsbiblioteket.dk/summa/

    In short Summa is an open source integrated search system based on Lucene.

    Disclaimer: I am the project lead of the Summa project.

  2. Sebastian Hammer
    July 23rd, 2009 at 3:52 pm

    I’ve seen the term Discovery Service used several times over the past couple of years, but this is the first time I have seen it used to denote something exclusively distinct from the kind of federated searching you discuss in this blog. I see it more frequently used in library environments to distinguish from the other, important piece of the puzzle: Delivery. We’re building solutions at the moment which are described by our customers as discovery services although they make use of a mix of locally indexed metasearching and broadcast searching — whatever best meets the needs… and at least one of the services you mention, WorldCat Local, is moving quickly towards adding a federated search capability (using some of our technology). If you’d asked me, I would have characterized a discovery service in library-land as a service that seeks to create a unified view of an organization’s information resources for the purpose of discovery. How it’s implemented under the hood is a different issue.

    I think we need to be a little careful with these ‘fuzzy’ terms that mean different things to different people.

  3. Eric Tull
    July 30th, 2009 at 8:00 pm

    I would distinguish two types of sources to be pre-indexed by a discovery service. First are the full-text aggregators and other vendors who are in the business of directing users to their full-text materials. To do so, they would probably not see a problem with providing a discovery service with their index terms so that these terms can be incorporated into the discovery service pre-index, as the user will eventually be directed to their full-text.

    However, database vendors who only provide indexing or abstracts, such as classics like Bio Abs, Sociological Abs, or PsychLit, are unlikely to want to provide this material to discovery services if it can then be provided to users whose libraries do not have subscriptions to these products.

    To get around this problem I think discovery services will need to keep their pre-indexing separate for each database, with duplicates between databases tagged somehow. The results would then be merged on receiving a search request, with users only receiving results from the databases that their library subscribes to. The discovery services would need to keep up-to-date on a library’s subscriptions, presumably through information received from the database producers.

    I would suspect that discovery service vendors will want to do this in any case, particularly as some are offering to include your library’s own databases - something they are likely to want to restrict to the one library rather than delivering them to everyone.

    Such a system would also permit subject-specific subsets of databases, so that users could specify that the search cover all the databases their library subscribes to, but only for a specific subject.

    I am not sure if developers of discovery services will discover this need on their own, but I think it is something libraries should push on them when these products are promoted to libraries.

  4. Carolyn Smith
    March 15th, 2012 at 5:26 am

    “Another reason for the big interest in discovery services is that the onerous task of building, monitoring, and repairing connectors disappears since there are no connectors”

    Can you tell us how the data is indexed? In my experience of an Enterprise Search Platform, which fits with your description of a discovery service, the connectors still have to be written in order to pull the metadata into the index in the right fields etc. Am I missing something, not being an expert in these matters?
