Michael Bergman is federated search luminary number three. Like Kate Noerr and Todd Miller, Mr. Bergman has made major contributions to federated search. You may want to read my recent article, On the history of the deep web, to gain an appreciation of the depth of his experience.
I interviewed Michael about a number of areas in which he’s done important work:
- Co-founding of BrightPlanet
- Coining of the term “Deep Web”
- The best-known white paper to quantify the size of the Deep Web
- The Semantic Web
The interview included 23 multi-part questions. Michael was generous enough with his time to provide in-depth answers. I gained an education and expect that you will too.
Here are the questions:
- I read with great interest your detailed bio. Your early career centered around the energy industry. Would you share with readers how you went from being an energy expert to co-founding a search company (BrightPlanet)?
- How do you think that your pre-BrightPlanet experiences prepared you for this new direction?
- What inspired you to dive into the Deep Web?
- What inspired you to start BrightPlanet in 2000? Is there an interesting story about how you named the company? How did you end up in South Dakota?
- What were the early days of BrightPlanet like? What were the challenges and successes?
- A recent timeline of the Deep Web makes this statement: “Shestakov (2008) cites Bergman (2001) as the source for the claim that the term deep Web was coined in 2000.” Do you agree with the statement? Did you coin the term “Deep Web”? If so, was there some other term you were considering instead? If not, who did coin the term?
- Your BrightPlanet white paper, The Deep Web: Surfacing Hidden Value, has become a classic source of quantitative information about the Deep Web. As you know, your white paper has been quoted all over the Internet. Looking back, is there anything you would change in the methodology you used to estimate the size of the Deep Web?
- Your white paper concluded that, among other things, Deep Web documents were, on average, three times better, quality-wise, than Surface Web documents. Can you elaborate on the linguistic techniques you used to compute document quality, and do you have a sense of what that figure might be today?
- For the benefit of readers, would you explain the distinction you make between the “Invisible Web” and the “Deep Web”?
- In 2001, the same year that you published your white paper, Gary Price, Chris Sherman, and Danny Sullivan wrote a book: “The Invisible Web: Uncovering Information Sources Search Engines Can’t See.” Was there some overlap between their work and yours?
- What approaches did BrightPlanet utilize over time to make content searchable?
- Did you consider real-time federated search as a content access approach for BrightPlanet? Why or why not?
- CompletePlanet boasts “over 70,000 searchable databases and specialty search engines.” In February of 2004, according to the Wayback Machine, that number was as high as 103,000. How did you build this catalog? How did you select the sites? Did you have tools to identify search engines?
- Would you explain how Deep Query Manager is able to search 70,000 databases at once?
- Here’s a hard question: How big do you think the Deep Web is now and how much larger is it than the Surface Web?
- Who do you think has the best quantitative data about the size of the Deep Web today?
- What do you see as the pros and cons of real-time federated search vs. crawling, indexing and harvesting?
- What do you see as major challenges with real-time federated search?
- What do you think the landscape of federated search will look like in ten years? This blog is sponsoring a writing contest to predict the future of federated search. Perhaps you’d like to enter?
- Tim Berners-Lee, credited with inventing the World Wide Web, has been talking about the importance and value of the Semantic Web for years, yet common folks don’t see much evidence of the Semantic Web gaining traction. Is there substance to the Semantic Web? What’s happening with it now and what does its future look like?
- Can you speak to the intersection of federated search and the Semantic Web? Is anyone aggregating semantically tagged content using a federated search approach?
- Can you explain what you’re doing at Zitgist? What problem is Zitgist addressing? What are its products and services? Who are its customers?
- Is there a next big project or venture?
I will break the interview up into several parts. I’ll publish the first installment next week.