27
Dec

Content access basics - Part I - screen scraping

Author: Sol

In this multi-part series we will look at a number of different approaches that federated search engines (FSEs) take to access content from remote databases.

FSEs are always at the mercy of the content provider when it comes to searching and retrieving content. FSEs perform deep web searches because the content they access lives inside databases rather than on static, crawlable pages. For background on deep web searching, read the earlier articles on crawling vs. deep web searching and the introduction to the deep web. Also read the article about connectors to understand how query processing and submission to the remote search engine work in deep web searching.

When FSEs search deep web databases, they often do so by filling out search forms much as humans do. They also process result lists (summaries of documents generated by the remote search engines) in much the same way that a human examines search results in a browser. Processing a list of search results by reading and dissecting the HTML that a search engine provides is called “screen scraping.” Wikipedia has an article about screen scraping.
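To make the idea concrete, here is a minimal sketch of both halves of the process in Python, using only the standard library. It is an illustration under stated assumptions, not how any particular FSE works: the search URL, the `q` form field, and the `<li class="result">` markup are all hypothetical, and a real connector would have to be written against the actual HTML of each source.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical result page; real sources each use their own markup.
SAMPLE_HTML = """
<html><body>
<ul id="results">
  <li class="result"><a href="/doc/1">Deep web basics</a></li>
  <li class="result"><a href="/doc/2">Federated search connectors</a></li>
</ul>
<a href="/about">About</a>
</body></html>
"""


def build_search_url(base_url, query):
    """Mimic filling out a GET search form with one field named 'q' (assumed)."""
    return base_url + "?" + urlencode({"q": query})


class ResultListScraper(HTMLParser):
    """Collect the title and link of each <li class="result"> entry."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_result = False  # inside a result list item?
        self._href = None        # link of the anchor being read, if any
        self._text = []          # text fragments of that anchor

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "result":
            self._in_result = True
        elif tag == "a" and self._in_result:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.results.append({"title": title, "url": self._href})
            self._href = None
        elif tag == "li":
            self._in_result = False


def scrape_results(html):
    """Parse a result page and return a list of {'title', 'url'} dicts."""
    parser = ResultListScraper()
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    print(build_search_url("http://example.com/search", "deep web"))
    for hit in scrape_results(SAMPLE_HTML):
        print(hit["title"], hit["url"])
```

Note that the scraper ignores the "About" link because it sits outside a result item. That dependence on page structure is also the weakness of the approach: if the content provider changes its HTML, the scraper has to be updated to match.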


