Abe has been giving me a hard time for wanting to review The Invisible Web. Chris Sherman and Gary Price wrote the book in 2001. That’s seven years ago. That’s a lifetime or three in Internet years. The book’s an antique. Its information is out of date. Why waste my time with it?
I’m moving forward with my review because I believe that the history of federated search and of the deep web is important to study for the same reasons that we study our country’s history in school. Ideally, we learn from the past and gain an appreciation of, and a perspective on, how far we’ve come over a period of time. And the invisible web is the deep web, and the deep web is the reason the federated search industry exists.
My first impression of The Invisible Web is, “Gee, this book is quite dated.” Many of the terms used in the book are not in common use today. (Check out the deep web trivia contest I posted not too long ago for a sampling of some of these terms.) And, of the 27 chapters in the book, the last 21 are incredibly dated. Many of those later chapters provide directories of websites on topics such as art and architecture, bibliographies and library catalogs, business and investing, health and medical information, U.S. and world history, and more.
The resource chapters offer an interesting window on what was important to Internet users just seven years ago. It’s quite fun to see, for example, what the authors considered the top computers and computing resources to be in 2001. Sources include ACM Digital Library, Bitpipe, Computer Science Research Paper Search Engine, McAfee World Virus Map, Network World Fusion, ResearchIndex, and The Collection of Computer Science Bibliographies. How many of these services have you heard of? How many still exist?
Beyond pure historical entertainment, the first six chapters are educational, especially for people new to the deep web. I describe the first four of them here.
Chapter 1 tells a nice history of the Internet and of the Web, including some things I didn’t know before. As a little bit of an aside, I worked at SRI International during the early ’80s. SRI had a contract for a number of years to provide administrative and technical support to users of the government’s ARPANET and MILNET networks – the precursors to today’s public Internet. I worked specifically at the Network Information Center (SRINIC) providing support to these government users. I was active in the industry when Internet history was being made, and I still learned a lot from the history chapter.
How many of you have heard of a paper about a “galactic network” of computers? How many of you know that this paper inspired the creation of the ARPANET?
The discussion about Tim Berners-Lee and his contributions to today’s Web is fascinating. Do you know how Berners-Lee came up with the term “World Wide Web”? I didn’t.
Chapter 2 discusses web directories and search engines. While some of the limitations discussed still exist, they are much less severe than when the book was published. Even so, the discussions are still educational today.
Chapter 3 covers specialized and hybrid search tools. Aside from the discussion of metasearch engines (applications that search the popular search engines) and vertical portals, the topics in this chapter are not as well known but are still relevant today. We don’t hear much about focused crawlers, yet companies such as Deep Web Technologies still use them to harvest focused content in some of their applications.
Chapter 4 introduces the invisible web (aka the deep web), and such interesting terms as the opaque web, the private web, the proprietary web, and the truly invisible web.
As you can probably tell, I enjoyed this “antique” book, but then I read hundred-year-old math books for fun. If you have an itch to connect to the “ancient” history of the deep web, you’ll probably also enjoy this book.