Archive for the "technology" Category


[Editor's Note: I received this email from Azhar Jassal. I like what he's up to, so I thought I'd give him a plug by republishing his letter, with his permission.]


I wanted to make you aware of a new search engine that I have spent the last 15 months building.

This is a new breed of search engine: a “structured search” engine. This type of search engine queries both the document web and the semantic web harmoniously. I have developed a simple query language that allows a user to move between these two worlds.

Its purpose is to complete a user’s overall information retrieval task in as short a time as possible by providing the most informative entity-centric result. It does this in one of two ways: by accepting an unstructured query (just as mainstream search engines are used) and applying conceptual awareness, or by accepting structured queries. Structured queries are something all current mainstream search engines are incapable of handling, as they concern themselves only with the document web and not the semantic web, and in my opinion they add a whole new dimension to information retrieval systems.

Read the rest of this entry »


[ This article was originally published in the Deep Web Technologies Blog. ]

The highly regarded Charleston Advisor, known for its “Critical reviews of Web products for Information Professionals,” has given Deep Web Technologies 4 3/8 of 5 possible stars for its Explorit federated search product. The individual scores forming the composite were:

  • Content: 4 1/2 stars
  • User Interface/Searchability: 4 1/2 stars
  • Pricing: 4 1/2 stars
  • Contract Options: 4 stars

The scores were assigned by two reviewers who played a key role in bringing Explorit to Stanford University:

  • Grace Baysinger, Head Librarian and Bibliographer at the Swain Chemistry and Chemical Engineering Library at Stanford University
  • Tom Cramer, Chief Technology Strategist at Stanford University Libraries and Academic Information Resources



The Harvard Library Innovation Laboratory at the Harvard Law School posted a link to a 23-minute podcast interview with Sebastian Hammer. Hammer is the president of Index Data, a company in the information retrieval space, including federated search.

Update 4/3/12: A transcript of the interview is here.

Hammer was interviewed about the challenges of federated search, which he addressed in a very balanced way. The gist of Hammer’s message is that, yes, there are challenges to the technology but they’re not insurmountable. And, without using the term “discovery service,” Hammer did a fine job of explaining that large indexes are an important component of a search solution but they’re not the entire solution, especially in organizations that need access to highly specialized sources.

I was delighted to hear Hammer mention the idea of “super nodes” to allow federated search to scale to thousands of sources. Blog sponsor Deep Web Technologies has used this idea, which they call hierarchical federated search, for several years. Several of their applications search other applications which can, in turn, search other applications. In 2009, Deep Web Technologies founder and president Abe Lederman delivered a talk and presented a paper at SLA, “Science Research: Journey to Ten Thousand Sources,” detailing his company’s proven “divide-and-conquer” approach to federating federations of sources.
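To make the “federation of federations” idea concrete, here is a minimal sketch in Python. It is not Deep Web Technologies’ or Index Data’s implementation; the `FederationNode` class, the `make_source` helper, and the toy sources are all hypothetical names invented for illustration. The only point is that a node can search its own sources and then delegate to child nodes, each of which is itself a federation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A "source" is any function that takes a query and returns result titles.
Source = Callable[[str], List[str]]


@dataclass
class FederationNode:
    """Hypothetical node that searches its own sources, then its child nodes."""
    name: str
    sources: List[Source] = field(default_factory=list)
    children: List["FederationNode"] = field(default_factory=list)

    def search(self, query: str) -> List[str]:
        results: List[str] = []
        for source in self.sources:
            results.extend(source(query))
        # Each child is itself a federation, so the fan-out is divided
        # among the children instead of being handled by a single node.
        for child in self.children:
            results.extend(child.search(query))
        return results


def make_source(label: str, titles: List[str]) -> Source:
    """Build a toy in-memory source using case-insensitive substring matching."""
    def run(query: str) -> List[str]:
        return [f"{label}: {t}" for t in titles if query.lower() in t.lower()]
    return run


physics = FederationNode(
    "physics",
    sources=[make_source("physics-db", ["Solar cell physics", "Quantum dots"])],
)
chemistry = FederationNode(
    "chemistry",
    sources=[make_source("chem-db", ["Solar fuel catalysts"])],
)
root = FederationNode("root", children=[physics, chemistry])

print(root.search("solar"))
```

A real deployment would query the children in parallel and rank the merged results; this sketch simply concatenates them in order.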

I was also happy to hear Hammer speak to the importance of hybrid solutions. Federation is appropriate for gaining access to some content, and maintaining a local index works for other content. Neither alone is a complete solution. Deep Web Technologies figured this out some years ago. A good example of hybrid search technology is the E-print Network, a product of the U.S. Department of Energy’s Office of Scientific and Technical Information (OSTI). Deep Web Technologies built the search technology, which combines information about millions of documents crawled from over 30,000 sites with federated content. I have been involved with the crawl piece of the E-print Network for a number of years and can testify to the power of the right hybrid solution. In 2008 I wrote a three-part series of articles at OSTI’s blog explaining the technology behind the E-print Network. Part One is here.

In conclusion, I highly recommend the podcast for a good reminder that federated search isn’t dead and that it’s an important part of search.


Multilingual federated search, the ability to search and to view results from foreign language sources in your own language, may be just an interesting idea to some, but the technology has strategic value. Consider this article published by the BBC in March of 2011: China ‘to overtake US on science’ in two years. If the prediction of the UK’s national science academy, the Royal Society, proves true, then sometime next year China will produce scientific research papers at a faster rate than the current leader, the U.S.

Researchers in the English-speaking world have mostly been restricted to searching only English language sources, since the tools for simultaneously searching foreign language sources and for performing the translations haven’t existed until recently. Thus, opportunities to search scholarly journals in Chinese, Japanese, Portuguese and other languages associated with countries producing a great volume of science output are being missed. In an economic climate where performing research and getting products to market quickly leads to greater profits, being able to scour the research Web quickly, effectively, and on an ongoing basis is critical to developing and maintaining a competitive edge.

Blog sponsor Deep Web Technologies has developed a patent-pending multilingual search version of its Explorit federated search application that integrates the search and translation technologies, making for a seamless and productive research environment for scientists, engineers, and researchers in business, science, and technology.



Last Friday blog sponsor Deep Web Technologies released its beta version of multilingual federated search. Deep Web Technologies and several government agencies key to the effort acknowledged the accomplishment via press releases.

Deep Web Technologies

HELSINKI, June 11 /PRNewswire/ - Deep Web Technologies unveiled multilingual translation capability today for the WorldWideScience Alliance using its federated search application. The international science portal is the first application to be deployed with this unique capability. Abe Lederman, President and CTO of Deep Web Technologies, demonstrated the new technology at the International Council for Scientific and Technical Information’s (ICSTI) 2010 Summer Conference in Helsinki. ICSTI is a primary sponsor of the Alliance, whose purpose is to provide “a geographically diverse, governance structure to promote and build upon the original vision of a global science gateway.”

Multilingual federated search translates a user’s search query into the native languages of the collections being searched, aggregates and ranks these results according to relevance, and translates result titles and snippets back to the user’s original language. The translation, powered by Microsoft, makes it simple to search collections in multiple languages from a single search box in the user’s native language. The Conference will include a keynote address by Tony Hey, Corporate Vice President of the External Research Division of Microsoft Research, as well as a presentation by Dr. Walter Warnick, Director of the Office of Scientific and Technical Information of the U.S. Department of Energy Office of Science. (More)
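The flow described in the release above (translate the query, search each collection in its native language, rank the combined results, translate the titles back) can be sketched roughly as follows. This is a toy illustration and not the Explorit or Microsoft Translator implementation: the word-for-word dictionaries, the `COLLECTIONS` data, and the term-count scoring are all assumptions made up for the example.

```python
from typing import Dict, List, Tuple

# Toy word-level dictionaries stand in for a real machine translation service.
EN_TO_FR: Dict[str, str] = {"solar": "solaire", "energy": "énergie"}
FR_TO_EN: Dict[str, str] = {fr: en for en, fr in EN_TO_FR.items()}

# Hypothetical collections, keyed by the language they are written in.
COLLECTIONS: Dict[str, List[str]] = {
    "en": ["solar energy storage", "wind turbine design"],
    "fr": ["énergie solaire thermique", "chimie organique"],
}
TO_NATIVE: Dict[str, Dict[str, str]] = {"en": {}, "fr": EN_TO_FR}
TO_USER: Dict[str, Dict[str, str]] = {"en": {}, "fr": FR_TO_EN}


def translate(text: str, table: Dict[str, str]) -> str:
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(table.get(word, word) for word in text.lower().split())


def score(terms: List[str], title: str) -> int:
    """Crude relevance: how many query terms appear in the title."""
    words = title.split()
    return sum(1 for term in terms if term in words)


def multilingual_search(query: str) -> List[Tuple[int, str, str]]:
    hits: List[Tuple[int, str, str]] = []
    for lang, titles in COLLECTIONS.items():
        # 1. Translate the query into the collection's language.
        terms = translate(query, TO_NATIVE[lang]).split()
        # 2. Search the collection in its native language.
        for title in titles:
            relevance = score(terms, title)
            if relevance > 0:
                # 3. Translate the matching title back for the user.
                hits.append((relevance, lang, translate(title, TO_USER[lang])))
    # 4. Rank the aggregated results by relevance.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


for relevance, lang, title in multilingual_search("solar energy"):
    print(relevance, lang, title)
```

The back-translated French title comes out partly untranslated because the toy dictionary is so small; a production system would hand both translation steps to a real translation service.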

US Department of Energy Office of Science

Washington, D.C.—Scientific language barriers were broken today in Helsinki with the launch of Multilingual While a large share of scientific literature is published in English, vast quantities of high-quality science are not, and the pace of non-English scientific publishing is increasing. will now enable the first-ever real-time searching and translation across globally-dispersed, multilingual scientific literature using complex translations technology.

“In an increasingly interconnected world, resolving the global challenges of science requires rapid communication of scientific knowledge,” said Dr. William F. Brinkman, Director of the Office of Science, U.S. Department of Energy. “Breaking the language barrier through will help erode borders and build research networks across DOE, the nation, and around the globe.” (More)

DOE Office of Scientific and Technical Information

OAK RIDGE, TN - Now you can find non-English scientific literature from databases in China, Russia, France, and several Latin American countries and have your search results translated into one of nine languages. With the beta launch today (view the Office of Science announcement) of Multilingual, real-time searching and translation of globally-dispersed collections of scientific literature is possible. This new capability is the result of an international public-private partnership between the Alliance and Microsoft Research, whose translation technology has been paired with the federated searching technology of Deep Web Technologies.

Microsoft Research Corporate Vice-President Tony Hey said, “We are extremely pleased to have our Microsoft Translator technology used with WorldWideScience. Built at Microsoft Research, this translation technology already provides translations to millions of users. Partnering with WorldWideScience is an opportunity to advance science across language barriers and improve scientific discovery.” (More)

British Library

World Wide Science Alliance broadens access to global research with the launch of a new multilingual tool, enabling scientists to simultaneously search and translate over 400 million pages of scientific research published in 65 countries from around the world.

Although most scientific literature continues to be published in English, the pace of non-English scientific publishing is increasing rapidly, with vast quantities of high-quality science now being produced every year. Launched today at the International Council for Scientific and Technical Information (ICSTI) annual conference in Helsinki, Finland, a new beta version of will enable scientists to break down the language barrier, facilitating greater global cooperation with regards to the pursuit of scientific research. (More)


Earlier this month ReadWriteWeb reported on a mechanism Google is creating for real-time indexing:

Google is developing a system that will enable web publishers of any size to automatically submit new content to Google for indexing within seconds of that content being published. Search industry analyst Danny Sullivan told us today that this could be “the next chapter” for Google.

And, here’s an interesting comment.

Last Fall we were told by Google’s Brett Slatkin, lead developer on the PubSubHubbub (PuSH) real time syndication protocol, that he hoped Google would some day use PuSH for indexing the web instead of the crawling of links that has been the way search engines have indexed the web for years.

If PuSH is as widely used as Google hopes it will be then this is a major paradigm shift for the search giant. No, Google won’t stop crawling the Web but if a critical mass of Web publishers get Google (and presumably other search engines) to index their content very quickly then the real-time Web will take a giant leap forward.
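For readers wondering what “pushing” content looks like in practice, here is a small sketch of the publisher side. As I understand the PubSubHubbub draft, a publisher tells its hub about new content with a form-encoded POST carrying `hub.mode=publish` and `hub.url` set to the updated feed. The hub and feed URLs below are placeholders I made up, and nothing here reflects how Google’s indexing system actually consumes the notification. The request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URLs for illustration only; substitute a real hub and feed.
HUB_URL = "https://hub.example.org/"
FEED_URL = "https://blog.example.org/feed.atom"


def build_publish_ping(hub_url: str, topic_url: str) -> Request:
    """Build (but do not send) a 'new content' notification for a hub."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,  # supplying data makes urllib issue a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


ping = build_publish_ping(HUB_URL, FEED_URL)
print(ping.get_method(), ping.full_url)
print(ping.data.decode("ascii"))
```

Passing `ping` to `urllib.request.urlopen` would send it. The hub then fetches the feed and forwards the new entries to subscribers, which is what lets a subscriber learn about content within seconds instead of waiting for the next crawl.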

It will be interesting to see how PuSH impacts the federated search community. Clearly the real-time Web can move scientific information very quickly. Perhaps this new technology and paradigm will augment nicely the flow of scientific papers found by federated search applications in the deep Web.


Multilingual federated search is a big deal for a couple of reasons. First, no one has done it up to now. Yes, Google just added translation into its universal search. And, no doubt Bing will follow suit. But, being able to search the quality scientific and technical information that sometimes is only available via federated search, and doing it in foreign languages, is important.

The second reason that multilingual federated search is so important is that China, Japan, Russia, and other nations produce large volumes of research output. As the world shrinks we can’t afford to ignore the non-English literature. A blog article noted that Thomson Reuters highlighted the importance of China’s research output on the basis of sheer volume:

According to citation analysis based on data from Web of Science, China is ranked second in the world by number of scientific papers published in 2007. Scientific’s World IP Today Report on Global Patent Activity 2007 reported that China almost doubled its volume of patents from 2003 to 2007, and looks set to become a strong rival to Japan and the United States in years to come.

The bottom line: federated search is about research and research is global.



Discovery services have begun to spring up. This article is my attempt to catalog and characterize them. Consider this article to be an introduction that sets the stage for future analysis articles.

What is a discovery service?

A discovery service is a search interface to pre-indexed metadata and/or full text documents. Discovery services differ from federated search applications in that discovery services don’t search live sources. By searching pre-indexed data, discovery services return search results very quickly. Discovery services are touted as an evolution beyond federated search and in some ways they are. Some discovery services either provide integration with federated search or provide an API for others to do the integration. I believe that hybrid “federated discovery” services are likely to prevail over pure discovery services and I will dedicate an article to them.
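The distinction is easy to show in a few lines of Python. In this sketch, which is only an illustration and not any vendor’s design, the “discovery” side answers from an index built ahead of time, the “federated” side calls live sources at query time, and a hybrid merges the two. All function names, document IDs, and the toy live source are hypothetical.

```python
from typing import Callable, Dict, List, Set

LiveSource = Callable[[str], List[str]]


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Pre-index documents: map each word to the IDs of documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def discovery_search(index: Dict[str, Set[str]], term: str) -> List[str]:
    """Answer from the pre-built index only; no live source is contacted."""
    return sorted(index.get(term.lower(), set()))


def federated_search(sources: List[LiveSource], term: str) -> List[str]:
    """Ask every live source at query time and concatenate the answers."""
    results: List[str] = []
    for source in sources:
        results.extend(source(term))
    return results


def hybrid_search(index: Dict[str, Set[str]], sources: List[LiveSource], term: str) -> List[str]:
    """Indexed hits first, then any live hits the index did not already have."""
    merged = discovery_search(index, term)
    for hit in federated_search(sources, term):
        if hit not in merged:
            merged.append(hit)
    return merged


INDEX = build_index({
    "doc-1": "federated search basics",
    "doc-2": "discovery service overview",
})


def live_source(term: str) -> List[str]:
    """Toy stand-in for a remote database that is queried live."""
    catalog = {"search": ["live-7"], "discovery": ["doc-2", "live-9"]}
    return catalog.get(term.lower(), [])


print(hybrid_search(INDEX, [live_source], "discovery"))
```

The indexed lookup is a dictionary access, which is why discovery services feel fast. The live call is where a federated search spends its time, and also where it picks up content that was never indexed.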



What is an API?

Author: Sol

Andy Wibbels, a product manager and blogger-training lead, wrote an outstanding blog article, “What is an API?” As a programmer I know what an API is, but I have a hard time explaining the concept to non-programmers. Now, Andy has done the explaining for me.

Andy’s article does a nice job of explaining without overwhelming, and his short introduction skillfully avoids going into more detail than most people want. If you’ve ever wanted to explain (or understand) the connection between mashups and APIs, or how Twitter’s massive and rapid success can be attributed to embracing APIs, then this is the article for you.



Last November I wrote about Matt, a software developer and graduate student in computer science. Matt had blogged about a deep web crawler he was building. Five months later, I’m curious to know how you’re doing, Matt. Please let us know if you’ve done more work on your crawler since your last post mentioning the subject on November 21.
