Part I of this series on content access basics explained how many federated search engines (FSEs) performing deep web searches use screen scraping to process search results, and described the problems associated with that approach. This article introduces how FSEs process XML-formatted search results.

FSEs use jargon such as “XML gateway” or “XML interface” to mean that they can interact with a particular content source using XML. The FSE may generate and submit an XML query, or the remote search engine may return its search results as an XML document. This article focuses on the processing of XML results.

So, what is XML? Wikipedia has a nice introduction to XML plus a few examples. Here’s a nice simple tutorial on XML. The important idea about XML is that there is no ambiguity about where to find information: each piece of data is wrapped in named tags, so a program can locate it by name rather than by its position on a page. XML is highly structured and is intended for consumption by computer programs.
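To make the "no ambiguity" point concrete, here is a minimal sketch in Python of how a program might pull fields out of XML search results. The response format, the element names (`results`, `result`, `title`, `url`, `author`) and the `parse_results` function are all invented for illustration; real content sources define their own schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML search response. The element names are assumptions
# made for this example, not the format of any real content source.
SAMPLE_RESPONSE = """<results total="2">
  <result>
    <title>Deep Web Search Basics</title>
    <url>http://example.com/doc1</url>
    <author>A. Writer</author>
  </result>
  <result>
    <title>Federated Search in Practice</title>
    <url>http://example.com/doc2</url>
  </result>
</results>
"""


def parse_results(xml_text):
    """Return one dict per <result> element in the response."""
    root = ET.fromstring(xml_text)
    records = []
    for result in root.findall("result"):
        records.append({
            # findtext returns None when the child element is absent,
            # so a missing field is detected rather than guessed at.
            "title": result.findtext("title"),
            "url": result.findtext("url"),
            "author": result.findtext("author"),
        })
    return records


if __name__ == "__main__":
    for record in parse_results(SAMPLE_RESPONSE):
        print(record)
```

Because every field is looked up by tag name, the second record's missing author simply comes back as `None` instead of shifting the other fields out of place, which is the kind of failure a screen scraper has to guard against.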

