October 24, 2008
I’m excited to be publishing this interview with federated search luminary Michael Bergman. You can read my preview of this interview here. In this first installment Michael shares his early background.
Michael is the third person I honor in this luminary series. You can read prior interviews with luminaries Kate Noerr and Todd Miller in the blog archives.
1. I read with great interest your detailed bio. Your early career centered around the energy industry. Would you share with readers how you went from being an energy expert to co-founding a search company (BrightPlanet)?
My energy background started right after graduate school at Duke, when I was hired as an assistant on, and eventually headed up, the US EPA’s Coal Technology Assessment program. That effort was to look at the 50-year future of coal in the United States as an energy resource and environmental challenge. It was the first study that I know of that took an end-use perspective, including conservation and net energy analysis. Though that was nearly 30 years ago, I’m still proud of that effort, including co-authoring the first report on global warming in the US government. I think we got much right. The sad thing is that our findings still pertain today and many were never really acted upon.
After a short stint as a research scientist at the Graduate School of Engineering at the University of Virginia, I joined the American Public Power Association in Washington, DC. I was there for nearly ten years running its research program. About two-thirds of our research efforts and funding went toward energy technologies, especially smaller, distributed ones like fuel cells or photovoltaics.
But, this was also the period (early 1980s) when the IBM PC first came out. Because our 2,200 municipal electric members were mostly small, this was the first real opportunity for many of them to get computerized. Because there was then no software suitable for our members, my research program got quite active in software development. We developed an 11-application suite called PowerManager, for example, and did a competitive bake-off followed by tailored development for a GIS system called PowerMapper. We did other cool software in energy conservation, consumer education and integrated planning as well.
I really got bitten by the bug running these software development efforts and then pretty much split my time for the next decade until the mid-1990s between energy-related software and the commercialization of small electric generating technologies.
2. How do you think that your pre-BrightPlanet experiences prepared you for this new direction?
Well, BrightPlanet was itself a spin-off of an earlier software company I had formed in 1994 called VisualMetrics. VisualMetrics was a data warehousing company focusing on structured data. (It later did bioinformatics and genome data indexing and management with support from NIH and NSF as well.)
So, I think there were really two key experiences that prepared me as a software entrepreneur, both with threads from APPA and my management experience at EPA. I learned much about software development and management from the PowerManager and related packages we produced. Then, in the five-year hiatus between APPA and VisualMetrics, I worked with three energy technology commercialization efforts in fuel cells, photovoltaics and biomass.
Each of those efforts was built around a commercialization model that I was acknowledged as the “father of”. The idea was to use electric utility need and expertise to guide commercial development, matched with federal dollars to help “buy down” high early commercialization costs, combined with competitively selected vendors willing to listen to market needs. Thus, in a way, our commercialization groups acted similarly to venture capitalists, but with a more hands-on approach and market involvement.
So, I cut my teeth on managing software projects in one set of efforts, and came to realize I was an entrepreneur at heart in the other set of efforts.
3. What inspired you to dive into the Deep Web?
I think the story touches on some interesting aspects of the early Web and shows the serendipity of some ‘Aha!’ moments.
In 1996 I was on a business trip to NYC and passing through the Holland Tunnel with some colleagues when we had an ‘Aha!’ regarding the confluence of VisualMetrics’ data warehousing strengths with the fact that the Internet was emerging as a global “data warehouse.” Upon returning home, I got our development team together and we decided to mount an internal research effort to “mine” the Web. From VisualMetrics’ perspective this had a number of potential payoffs: learning intimately about the new phenomenon of the Web; learning to index and process unstructured text data, which was a gap in our technology portfolio; and learning about HTTP and access and harvesting protocols.
This effort was a lower priority than our commercial work, so it was not until late 1997 that we had a working prototype. Reaction from clients and some universities was really positive, so we decided to make it commercial. We introduced the Mata Hari desktop “metasearch” tool in early 1998. The first version, I recall, accessed about 90 search engines.
Mata Hari eventually won a ton of awards and got more capable and sophisticated, but only got to #3 in sales at that time. Two other prominent metasearchers were Copernic and Bullseye from Intelliseek (acquired many years ago). Metasearchers were important because early search engines had only limited indexing and reach. For example, I often quote that the first search engine to go public, Lycos, did so with only 54,000 sites indexed in 1994. Serious researchers needed to metasearch multiple search engines. Also, AltaVista was arguably the search engine of choice for researchers at that time because of its great query options and precise results counts, but it too indexed only something like 30% of the Web.
A really influential paper by Steve Lawrence and Lee Giles came out in Science magazine in 1998 that quantified this lack of coverage. They presented an “overlap analysis” methodology to estimate the complete size of the indexed Web at that time (320 million documents) and showed that no search engine indexed more than 58% of it. So, all of us serious about “getting it all” were focused on efficient ways to get coverage to 100%. (BTW, another issue was delays in new content getting indexed. Today, of course, Google seems to index almost instantly and no one complains about too little coverage.)
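For readers curious about the mechanics, that overlap analysis is essentially a capture-recapture estimate: if you know how big two engines’ indexes are and how much they overlap, you can project the size of the whole. Here is a minimal sketch in Python; the counts are hypothetical, chosen only to illustrate the math, not figures from the Lawrence and Giles study (which averaged overlap across many sampled queries and engine pairs).

```python
# Minimal sketch of an "overlap analysis" (capture-recapture) estimate.
# All counts below are hypothetical, chosen only to illustrate the math.

def estimate_total(n_a: int, n_b: int, n_overlap: int) -> float:
    """Lincoln-Petersen-style estimate of the total indexed Web.

    If engine A indexes n_a documents, engine B indexes n_b, and
    n_overlap documents appear in both, then (assuming the two
    indexes are roughly independent samples of the Web):

        P(doc in B)  ~=  n_overlap / n_a
        total        ~=  n_b / P(doc in B)  =  n_a * n_b / n_overlap
    """
    return n_a * n_b / n_overlap

# Hypothetical engines: A indexes 150M documents, B indexes 160M,
# and 70M documents show up in both indexes.
total = estimate_total(150_000_000, 160_000_000, 70_000_000)
print(f"Estimated indexed Web: {total / 1e6:.0f} million documents")
print(f"Engine A coverage:     {150_000_000 / total:.0%}")
```

The independence assumption is the weak point: engines tended to index the same popular pages, which inflates the overlap, so estimates of this kind are best read as conservative lower bounds on the Web’s true size.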
Meanwhile, in the period 1998-2000 we were increasing the number of sources accessed by Mata Hari into the hundreds. This was causing us real pain in adding and maintaining these search engines manually. A big effort for us was to develop automated and intelligent ways to add new sources and keep them current.
But the other real ‘Aha!’ for me was the simple realization that the sources we were tapping into were no longer search engines as commonly understood (an Excite or AltaVista or HotBot or, now, Google, acting as a searchable look-up index to other Web documents), but searchable databases of unique content with a Web search form at the front end. By calling them search engines and thinking of them as search engines, we were missing some fundamental aspects of searchable databases, which were then beginning to explode in use on the Web.
We laughed about it at the time and shook our heads. The analogy we used was that the deep Web was everywhere and right in front of us, but so obvious everyone was missing it. It was like putting your forefinger right between and before your eyes: it was so close you never saw it!
Now, of course, the fact that many of these sites we had been metasearching were specialty databases and not search engines per se sounds so obvious and Duh! as to be ludicrous. But, truly, at that time, no one really seemed to have connected those particular dots.
Once this slightly different — but crucially important — serendipitous perspective toward dynamic, searchable databases happened, the floodgates opened to whole new sets of understandings. We rapidly got into the questions of database-backed Web servers, text indexing and query engines, characterizing and classifying the sources themselves, and means to “talk” to the variations of these sites.
Also, of course, we set out to quantify just how big this deep Web thing might be. Going into the analysis, we suspected it was of equal size if not a bit larger than the existing “surface” Web. Our results, to put it mildly, totally blew us away, as it apparently did many others.
4. What inspired you to start BrightPlanet in 2000? Is there an interesting story about how you named the company? Also, how did you end up in South Dakota?
Actually, BrightPlanet was founded in April 1999 in partnership with Paulsen Marketing Communications.
Prior to this point VisualMetrics was starting to get torn into two quite separate directions: our commercial data warehousing and bioinformatics efforts versus the Web stuff with Mata Hari and what we were learning (but not yet discussing) about the deep Web. Frankly, I was also doing a poor job handling both and was seeking to add marketing and basic business development skills. I was very seriously considering dropping our Web work altogether to regain focus.
BrightPlanet grew out of this concern and what we came to call “The Tiahrt Connection.” One of my senior developers, Tom Tiahrt, who was also the key force behind Mata Hari’s text indexing engine, understood these issues. His sister, Sara Tiahrt Steever, was the interactive media director for Paulsen Marketing Communications in Sioux Falls, one of the largest advertising agencies in the state and its only AAAA member. Tom and Sara conspired to get their bosses together, and the eventual result was a partnership to spin off BrightPlanet as its own standalone entity. Thane Paulsen continues to have a key role with BrightPlanet to this day.
Our initial naming ideas for the company sought to emphasize the entire Earth, matching the scope of the Internet. We first tried variations on the Third Rock idea (based on the TV comedy of that time) and other ways to describe the place of the Earth in the solar system and universe. As everyone knows, and it has only gotten worse, finding a name that can be trademarked with a matching Web domain name has become devilishly difficult.
The combination of “bright” (for intelligent and luminous) with “planet” (the solar system) proved to be a winner. (We possibly would have tried to grab ‘deep Web’ as a more direct hook, but all of this naming was prior to those realizations.) This solar system theme continues to this day with the BrightPlanet logo and marketing collateral.
For a time we also were looking at vertical search strategies, and so had reserved a whole bunch of “planet” domain names such as ReferencePlanet, SciencePlanet, LegalPlanet, AgPlanet, MedicinePlanet, and so forth. Only CompletePlanet was ever used as an actual Web presence.
The original South Dakota connection arose from my wife first being recruited to the state as a university professor, and then my initially starting VisualMetrics there.
The interview continues next week.
Tags: deep web, federated search, michael bergman