4 Jan

Content access basics - Part III - OpenSearch

Author: Sol

This is the third part in a series of articles that explore how federated search engines (FSEs), especially those that search the deep web, process search results from search engines. Part I looked at screen scraping of search result data from search engines that only provide HTML intended for human consumption. Part II looked at the more pleasant situation of processing XML that a growing number of search engines are returning. This article looks at the emerging OpenSearch standard and how FSEs can benefit from it.

Wikipedia summarizes OpenSearch pretty well:

OpenSearch is a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format. OpenSearch was developed by Amazon.com subsidiary A9 and the first version, OpenSearch 1.0, was unveiled by Jeff Bezos at the Web 2.0 in March, 2005. Draft versions of OpenSearch 1.1 were released during September and December 2005. The OpenSearch specification is licensed by A9 under the Creative Commons Attribution-ShareAlike 2.5 License.

The “format suitable for syndication and aggregation” mentioned above refers to two standards, RSS 2.0 and Atom 1.0, both of which present their data in XML.
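To make the idea concrete, here is a minimal sketch of how an FSE might read an OpenSearch response delivered as RSS 2.0. The sample feed, the example.com URLs, and the function name `parse_opensearch_rss` are made up for illustration; the `totalResults`, `startIndex`, and `itemsPerPage` elements and their namespace come from the OpenSearch 1.1 response elements.

```python
import xml.etree.ElementTree as ET

# Namespace used by the OpenSearch 1.1 response elements.
NS = {"opensearch": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical feed, invented for this example.
SAMPLE_RSS = """<rss version="2.0"
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Example search: deep web</title>
    <link>http://example.com/search?q=deep+web</link>
    <description>Hypothetical search results</description>
    <opensearch:totalResults>2</opensearch:totalResults>
    <opensearch:startIndex>1</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item>
      <title>First result</title>
      <link>http://example.com/doc/1</link>
    </item>
    <item>
      <title>Second result</title>
      <link>http://example.com/doc/2</link>
    </item>
  </channel>
</rss>"""


def parse_opensearch_rss(xml_text):
    """Return paging information and result items from an RSS 2.0 feed."""
    channel = ET.fromstring(xml_text).find("channel")

    def paging(name):
        # The paging elements are optional, so tolerate their absence.
        text = channel.findtext(f"opensearch:{name}", namespaces=NS)
        return int(text) if text is not None else None

    items = [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in channel.findall("item")
    ]
    return {
        "total": paging("totalResults"),
        "start": paging("startIndex"),
        "per_page": paging("itemsPerPage"),
        "items": items,
    }


if __name__ == "__main__":
    parsed = parse_opensearch_rss(SAMPLE_RSS)
    print(parsed["total"], "results;", len(parsed["items"]), "on this page")
    for entry in parsed["items"]:
        print(entry["title"], entry["link"])
```

Because every OpenSearch source uses the same element names, the same function should work against any compliant source, which is the main attraction for an FSE.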


Filed under » basics | No Comments »

30 Dec

Content access basics - Part II - XML

Author: Sol

Part I of this series on content access basics explained how many federated search engines (FSEs) that perform deep web searches use screen scraping to process search results, and described the problems associated with this approach. This article provides an introduction to how FSEs process XML-formatted search results.

FSE vendors use jargon such as “XML gateway” or “XML interface” to indicate that they can interact with a particular content source using XML. The FSE may generate XML and submit an XML query, or the remote search engine may generate search results and return them as an XML document. In this article we are going to focus on the processing of XML results.

So, what is XML? Wikipedia has a nice introduction to XML plus a few examples. Here’s a nice simple tutorial on XML. The important idea about XML is that there is no ambiguity about where to find information. XML is intended for consumption by computer programs. It is very highly structured.
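As an illustration of that lack of ambiguity, here is a minimal sketch of how a program might pull fields out of XML search results. The element names (`results`, `result`, `title`, `author`, `year`, `url`), the sample data, and the function name `extract_records` are all hypothetical; each real search engine defines its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical result document; real sources use their own element names.
SAMPLE_RESULTS = """<results query="solar energy">
  <result>
    <title>Advances in photovoltaic materials</title>
    <author>A. Example</author>
    <year>2008</year>
    <url>http://example.com/papers/1</url>
  </result>
  <result>
    <title>Solar thermal storage overview</title>
    <author>B. Sample</author>
    <url>http://example.com/papers/2</url>
  </result>
</results>"""

FIELDS = ("title", "author", "year", "url")


def extract_records(xml_text):
    """Return one dict per <result>, with an empty string for missing fields."""
    root = ET.fromstring(xml_text)
    return [
        {field: result.findtext(field, default="") for field in FIELDS}
        for result in root.iter("result")
    ]


if __name__ == "__main__":
    for record in extract_records(SAMPLE_RESULTS):
        print(record)
```

Each field is located by its tag name rather than by its position on a page, so the extraction does not depend on how the results happen to be laid out for human readers.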


Filed under » basics | 1 Comment »
