Apr
In what industry analysts consider a bold move, Google has decided to stop crawling the web. Google management claims that the rapidly rising costs of purchasing and maintaining tens of thousands of index servers are what’s pushing it to rethink its approach to dominating Internet searching. Industry experts claim there are other pressing concerns driving Google to abandon its crawl-and-index approach.
The Google index is growing out of control. It’s not scaling. Google is having a hard time even counting how many documents it has in its index. Industry experts believe that, without a serious “prune job”, Google will not be able to add significantly more documents to its index and will run out of “index counting space” sometime this year. It’s kind of like a Y2K problem, analysts explain. To have 10 billion or more documents in its index, Google would need more than 10 digits in the number that is used to store the size of its index. But, the Google indexing code was written many years ago, at a time when no one could have imagined that the index could grow beyond 10 digits. Recompiling the code and changing the constant that counts documents is not an option, explain Google senior programmers. That code is so old that few in the industry believe it could even be recompiled again. Fortunately, index problem aside, Google’s code hasn’t needed to be recompiled in the last ten years as Google programmers got it mostly right the first time.
Beyond difficulties with counting documents, industry pundits believe that Google is afraid of falling behind Yahoo! and Microsoft. Microsoft entered the federated search market late last year, giving away enterprise search software that includes federated search. And Yahoo! has dabbled in federated search with Yahoo! Subscription. Google doesn’t have a good story to tell about how it’s taking over the world with federated search, and that leaves Google, and its high stock price, exposed.
Most importantly, perhaps, Google is realizing that most high-quality documents aren’t to be found by crawling. They live in databases, where it takes federated search applications to find them. Industry analysts believe that Google is slowly, and painfully, realizing that it’s been dishing out the wrong kinds of documents to the public for a whole lot of years. “While Google might be great for finding out when Survivor is going to be on and who’s left, it won’t help people to cure themselves of brain cancer”, explains industry guru and Deep Web Technologies founder Abe Lederman.
How will Google make the switch to federated search? No one knows for certain, as Google is being very tight-lipped about the transition. A recent eBay auction listing for “web index, 9.9+ billion documents, includes images, and audio” didn’t get any bids, so Google will need to find other ways to divest itself of its dated approach to dominating the web search industry. It’s likely that Google will make a play for any and every company that does federated search.
“The future is bright for federated search”, explains Abe Lederman. “I’m sure many of us have always wanted to work for Google and we just couldn’t pass their stupid tests. Now we won’t have to pass their tests.”
Happy April Fools’ Day, everyone!
Tags: federated search

4 Responses so far to "Google to stop crawling the web: will federate it instead"
April 10th, 2008 at 7:38 pm
Hope this is just an April Fools’ joke, not the truth. I would rather be April-fooled than see this happen in reality.
April 10th, 2008 at 8:02 pm
Gita - Yes, it’s an April Fools’ joke.
April 28th, 2008 at 5:28 am
Hi Sol and thanks for the story. No, I’m not slow to get the joke (a month later) - I just got here. Honest. 
 
I didn’t know you were so heavily into search. Thanks, I learned the technical term for deep Web searching by coming here.
April 29th, 2008 at 8:30 am
Zac,
Welcome. Yup, I’m heavily into search.
Sol