New open source federated search middleware released

Author: Sol

Sesam.no has released Sesat (Sesam Search Application Toolkit) as open source software. Sesat is middleware: it sits between data sources and the search portal that users interact with. An announcement of Sesat’s release is provided here.

I had a discussion, via email, with Mick Wever about Sesat; he’s one of the active maintainers of the Sesat platform. Following are some excerpts of our dialogue.

In a nutshell, how would you describe Sesat?

Sesat is a programmer’s library/toolkit to simplify implementing a search application. It can be a search website, a J2EE portal, a web service, or a Swing (or .NET) application (currently these are not possible out-of-the-box).

Mick, I can see that we’re going to be getting somewhat technical here, and much of the content at the sesat.no website is quite technical as well. A number of my blog readers are not going to be implementing your middleware directly, but might want to understand your technology so that they can discuss it with their IT departments or consider it among other offerings. Can you provide a non-technical overview?

Here is a page that you might find useful and less technical. And, some readers might find the Product Whitepaper helpful.

Where would you say Sesat shines most?

It is best suited when searching must be carried out in multiple backends, new and/or legacy, local and/or remote. Sesat itself does not crawl or index data; instead, it takes responsibility for abstraction, federation, threading+throttling, and presentation of the searches directed towards the backends.
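To make the "threading+throttling" idea concrete, here is a minimal Java sketch of fanning one query out to several backends. It is not Sesat's API: the class `FanOutSketch`, the method `searchAll`, and the modelling of a backend as a plain function from query to result titles are all assumptions made for illustration. A fixed-size thread pool stands in for the throttle, and a timeout keeps a slow backend from holding up the rest.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical sketch, not Sesat's API: fan one query out to several backends.
final class FanOutSketch {

    // Each backend is modelled (as an assumption) as a function from query to result titles.
    static Map<String, List<String>> searchAll(
            Map<String, Function<String, List<String>>> backends,
            String query, int maxThreads, long timeoutMillis) {

        // The fixed pool size acts as the throttle: at most maxThreads searches run at once.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            Map<String, Future<List<String>>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Function<String, List<String>>> e : backends.entrySet()) {
                Function<String, List<String>> backend = e.getValue();
                Callable<List<String>> task = () -> backend.apply(query);
                pending.put(e.getKey(), pool.submit(task));
            }

            Map<String, List<String>> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<List<String>>> e : pending.entrySet()) {
                try {
                    // Wait up to timeoutMillis for this backend's answer.
                    results.put(e.getKey(), e.getValue().get(timeoutMillis, TimeUnit.MILLISECONDS));
                } catch (ExecutionException | TimeoutException ex) {
                    // A failing or slow backend contributes an empty list.
                    e.getValue().cancel(true);
                    results.put(e.getKey(), Collections.emptyList());
                } catch (InterruptedException ex) {
                    // Preserve the interrupt flag and treat the backend as empty.
                    Thread.currentThread().interrupt();
                    results.put(e.getKey(), Collections.emptyList());
                }
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Because the futures are awaited one after another, a backend later in the map effectively gets a little more time than the first one; a production toolkit would likely track a single shared deadline instead.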

Sesat is best suited for teams looking to a) focus on building/prototyping the design quickly, skipping over the federating of results, and/or b) focus on tuning the crawling and indexing for better performance and relevancy of backends.

I understand that Sesat makes it easy to build a number of related search portals with many different styles. Can you explain that?

Yes, Sesat is especially useful for search applications that need many different styles or presentations, because the presentation is written as plugins (or “skins”), and plugins can be hierarchical. For example, we host many site searches that are child plugins of our main sesam.no and sesam.se skins. Providing these site searches is a relatively easy task for us (about an hour of work), as they adopt the bulk of the implementation and configuration from either sesam.no or sesam.se.
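The idea of a child skin inheriting most of its configuration from a parent can be sketched in a few lines of Java. This is only an illustration of hierarchical lookup, not how Sesat stores skins: the class `SkinSketch` and the setting names `layout` and `backend` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Sesat's API: a skin that overrides some settings
// and inherits everything else from its parent skin.
final class SkinSketch {
    private final String name;
    private final SkinSketch parent; // null for a root skin
    private final Map<String, String> settings = new HashMap<>();

    SkinSketch(String name, SkinSketch parent) {
        this.name = name;
        this.parent = parent;
    }

    String name() {
        return name;
    }

    // Returns this skin so that settings can be chained.
    SkinSketch set(String key, String value) {
        settings.put(key, value);
        return this;
    }

    // Looks in this skin first, then walks up the parents.
    // Returns null when no skin in the chain defines the key.
    String resolve(String key) {
        for (SkinSketch skin = this; skin != null; skin = skin.parent) {
            String value = skin.settings.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

With a parent `new SkinSketch("sesam.no", null).set("layout", "default").set("backend", "web-index")` and a child `new SkinSketch("vg.sesam.no", parent).set("layout", "vg")`, the child resolves `layout` to its own value and `backend` to the parent's, which is why a new site search needs so little of its own configuration.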

How is Sesat particularly suited to federated search?

In many ways you can think of Sesat as a niche framework, like Struts, WebWork, Seam, Turbine, etc., but for search applications, especially federating search applications. Technically, the most fundamental reason not to use other frameworks for such an application is that searching is a data-fetching paradigm, while the mainstream frameworks follow more of a post-and-fetch paradigm suited to administration and wizard (step-1-2-3) applications.

Who are your competitors?

Really the only “competitor” is FAST Unity. We are not allowed to describe what FAST can and can’t do due to their very restrictive license, but I can say that they implement little of the feature matrix written up here.

How would someone build an application with Sesat?

The steps to building a solution are illustrated in the tutorial.

Let’s say I want to build a federated search application. Let’s say I have a list of a half dozen sources I want to federate. How do you do the federation?

It depends on where you want the federation to happen. Some backends can do it for you. Sesat can federate results via a RunHandler (the most advanced approach). The presentation layer can also manually federate results.

In the latter two options Sesat often uses a “Query Matching” process against the user’s query. This process evaluates metadata against each word (and word combination). The presence of this metadata can determine which searches to run, and what extra options to use to get the best results (for example, it might already be known which sources will get federated).
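A rough Java sketch of the query matching idea follows. It assumes, purely for illustration, that the metadata is a lookup table from a word or phrase to a set of tags, and that each tag maps to one search; the names `QueryMatchSketch`, `combinations`, `tagsFor`, and `searchesToRun` are hypothetical, and Sesat's real matching is certainly richer than a table lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Sesat's API: match query words and word
// combinations against metadata, then pick the searches to run.
final class QueryMatchSketch {

    // Returns every contiguous word combination in the query, single words included.
    static List<String> combinations(String query) {
        List<String> out = new ArrayList<>();
        String cleaned = query.trim().toLowerCase(Locale.ROOT);
        if (cleaned.isEmpty()) {
            return out;
        }
        String[] words = cleaned.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder phrase = new StringBuilder();
            for (int end = start; end < words.length; end++) {
                if (end > start) {
                    phrase.append(' ');
                }
                phrase.append(words[end]);
                out.add(phrase.toString());
            }
        }
        return out;
    }

    // Collects the metadata tags attached to any word or combination in the query.
    static Set<String> tagsFor(String query, Map<String, Set<String>> termTags) {
        Set<String> tags = new LinkedHashSet<>();
        for (String candidate : combinations(query)) {
            Set<String> found = termTags.get(candidate);
            if (found != null) {
                tags.addAll(found);
            }
        }
        return tags;
    }

    // Maps the tags that were found onto the names of the searches to run.
    static Set<String> searchesToRun(Set<String> tags, Map<String, String> searchByTag) {
        Set<String> searches = new LinkedHashSet<>();
        for (String tag : tags) {
            String search = searchByTag.get(tag);
            if (search != null) {
                searches.add(search);
            }
        }
        return searches;
    }
}
```

For a query such as "Oslo bil", the candidates are "oslo", "oslo bil", and "bil"; if the table tags "oslo" as a place name, a map search could be added to the set of searches for that query.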

Of course, federation at the Sesat level is restricted by the number of results you obtain from the backend, although hints can be provided as mentioned above. And there’s nothing stopping you from requesting thousands of results from each source. But the real advantage of Sesat is that sources are completely abstracted into a generic result list, so the federation need not know anything about how a source is implemented, and quite different sources (remote and local) can be federated. At sesam.no, the backend sources have changed a number of times; each change required no code or presentation changes, only changes to the configuration for that source.
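Once every source yields the same generic result type, the merge step needs no source-specific code. The Java sketch below illustrates that with a simple round-robin interleave. Round-robin is a stand-in chosen for brevity, not a description of how Sesat federates, and the names `FederationSketch`, `Item`, and `interleave` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Sesat's API: merge generic result lists
// without knowing which kind of backend produced them.
final class FederationSketch {

    // A generic result item; the source name is kept only for display.
    static final class Item {
        final String source;
        final String title;

        Item(String source, String title) {
            this.source = source;
            this.title = title;
        }
    }

    // Takes the first result of every source, then the second of every source,
    // and so on, until limit items have been collected or the lists run out.
    static List<Item> interleave(List<List<Item>> perSource, int limit) {
        int longest = 0;
        for (List<Item> results : perSource) {
            longest = Math.max(longest, results.size());
        }
        List<Item> merged = new ArrayList<>();
        for (int rank = 0; rank < longest && merged.size() < limit; rank++) {
            for (List<Item> results : perSource) {
                if (rank < results.size() && merged.size() < limit) {
                    merged.add(results.get(rank));
                }
            }
        }
        return merged;
    }
}
```

Swapping one backend for another changes only what fills a given list, which mirrors the point that source changes were handled through configuration alone.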

So, where does searching of sources happen? That’s the hard part of federated search: the connector development.

Yes, this is the key, and core, of Sesat. The interface is called SearchCommand, and the default implementation is AbstractSearchCommand.

To get a better overview of the process cycle, take a peek at this illustration.

The search command I refer to is labeled “Command Execution”. It is pipelined (like much of the platform) with transformers (pre-processing) and handlers (post-processing).
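The transformer, command, handler pipeline can be pictured with a small Java sketch. It deliberately does not reproduce the real SearchCommand or AbstractSearchCommand signatures, which are not shown in this interview; the class `PipelineSketch` and its methods are hypothetical, and queries and results are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not Sesat's API: a search command with
// query transformers before the backend call and result handlers after it.
final class PipelineSketch {
    private final List<UnaryOperator<String>> transformers = new ArrayList<>();
    private final List<UnaryOperator<List<String>>> handlers = new ArrayList<>();
    private final Function<String, List<String>> backend;

    PipelineSketch(Function<String, List<String>> backend) {
        this.backend = backend;
    }

    // Registers a pre-processing step applied to the query, in insertion order.
    PipelineSketch addTransformer(UnaryOperator<String> transformer) {
        transformers.add(transformer);
        return this;
    }

    // Registers a post-processing step applied to the results, in insertion order.
    PipelineSketch addHandler(UnaryOperator<List<String>> handler) {
        handlers.add(handler);
        return this;
    }

    // Runs transformers, then the backend search, then handlers.
    List<String> execute(String query) {
        String current = query;
        for (UnaryOperator<String> transformer : transformers) {
            current = transformer.apply(current);
        }
        List<String> results = backend.apply(current);
        for (UnaryOperator<List<String>> handler : handlers) {
            results = handler.apply(results);
        }
        return results;
    }
}
```

A transformer might normalise the query text and a handler might sort or trim the result list; because both are registered from outside the command, the backend call itself stays untouched when the processing around it changes.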

Also, Sesat makes it very easy to implement and/or configure with default implementations for these protocols: FAST, Overture PPC, PicSearch, Platefood PPC, Sensis, Yahoo IDP, Yahoo Contextual Web Service, Yahoo Media, Blinkx Video, Blocket, Finn, GeoData map, HittaMap, Hitta, HittaWeather, and Tasteline. There are no doubt more implementations now.

Can you point me to web-sites of customers who have built federated search systems from your components?

Sesam.no and sesam.se are actively developing and using Sesat.
Sesat was open sourced this year, and a number of different parties have asked questions, publicly or privately, about getting started and various tasks.

Both sesam.no and sesam.se also implement site search websites for private customers. Take a peek at these different sesam websites. All of these are just Sesat skins running in the same JVM.

   http://sesam.no/bil
   http://sesam.se/bil
   http://sesam.com/bil
   http://kart.sesam.no/search/?c=map&q=bil
   http://vg.sesam.no/tv
   http://vg.sesam.no/bil
   http://aftenposten.sesam.no/bil
   http://partner.sesam.no/nrk
   http://nettby.sesam.no/?
   http://ab.sesam.se/stockholm
   http://sesam.no/nyheter/bil
   http://sesam.no/bilde/bil
   http://sesam.no/person/bil
   http://sesam.no/katalog/bil

Mick, thank you. I think readers will find this information intriguing.


Tags: federated search, sesam, sesat

This entry was posted on Wednesday, July 9th, 2008 at 11:27 am and is filed under industry news.

One Response to "New open source federated search middleware released"

1 Magne Thyrhaug. » SESAT in the news!
July 10th, 2008 at 2:47 pm
[...] Federatedsearchblog.com have posted an interview with SESAT lead developer Mick Semb Wever. SESAT stands for Sesam Search Application Toolkit and is a search middleware tool written in Java. It has been developed inhouse at Sesam, but was open sourced earlier this year. [...]
