Archive for the "technology" Category

13
Jun

Last Friday, blog sponsor Deep Web Technologies released a beta version of its multilingual federated search, available at WorldWideScience.org. Deep Web Technologies and several government agencies key to the effort marked the accomplishment with press releases.

Deep Web Technologies

HELSINKI, June 11 /PRNewswire/ — Deep Web Technologies unveiled multilingual translation capability today for the WorldWideScience Alliance using its federated search application. WorldWideScience.org, the international science portal, is the first application to be deployed with this unique capability. Abe Lederman, President and CTO of Deep Web Technologies, demonstrated the new technology at the International Council for Scientific and Technical Information’s (ICSTI) 2010 Summer Conference in Helsinki. ICSTI is a primary sponsor of the WorldWideScience.org Alliance, whose purpose is to provide “a geographically diverse, governance structure to promote and build upon the original vision of a global science gateway.”

Multilingual federated search translates a user’s search query into the native languages of the collections being searched, aggregates and ranks these results according to relevance, and translates result titles and snippets back to the user’s original language. The translation, powered by Microsoft, makes it simple to search collections in multiple languages from a single search box in the user’s native language. The Conference will include a keynote address by Tony Hey, Corporate Vice President of the External Research Division of Microsoft Research, as well as a presentation by Dr. Walter Warnick, Director of the Office of Scientific and Technical Information of the U.S. Department of Energy Office of Science. (More)
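The three steps described in the release (translate the query, merge and rank the results, translate the results back) can be sketched in a few lines of Python. This is only a toy illustration and not Deep Web Technologies' implementation: the `translate` function is a hypothetical lookup table standing in for a real translation service, the `Collection` class and its word-count scoring are invented for the example, and the sample titles are made up.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a machine translation service: a tiny lookup table.
TOY_DICTIONARY = {
    ("en", "fr", "solar energy"): "énergie solaire",
    ("fr", "en", "Progrès en énergie solaire"): "Advances in solar energy",
    ("fr", "en", "Énergie solaire thermique"): "Solar thermal energy",
}


def translate(text, source_lang, target_lang):
    """Return the toy translation; fall back to the text unchanged."""
    if source_lang == target_lang:
        return text
    return TOY_DICTIONARY.get((source_lang, target_lang, text), text)


@dataclass
class Collection:
    """An invented in-memory collection that holds titles in one language."""
    name: str
    language: str
    titles: list

    def search(self, query):
        """Return (title, score) pairs; score counts query words found in the title."""
        words = query.lower().split()
        hits = []
        for title in self.titles:
            score = sum(1 for word in words if word in title.lower())
            if score:
                hits.append((title, score))
        return hits


def multilingual_search(query, user_lang, collections):
    merged = []
    for collection in collections:
        # Step 1: translate the query into the collection's native language.
        native_query = translate(query, user_lang, collection.language)
        for title, score in collection.search(native_query):
            merged.append((score, collection.name, collection.language, title))
    # Step 2: aggregate and rank by relevance (ties broken by source, then title).
    merged.sort(key=lambda row: (-row[0], row[1], row[3]))
    # Step 3: translate result titles back into the user's language.
    return [
        {"source": name, "title": translate(title, lang, user_lang), "score": score}
        for score, name, lang, title in merged
    ]


COLLECTIONS = [
    Collection("english-db", "en", ["Solar energy policy", "Wind turbines"]),
    Collection("french-db", "fr", ["Progrès en énergie solaire",
                                   "Énergie solaire thermique",
                                   "Chimie organique"]),
]

results = multilingual_search("solar energy", "en", COLLECTIONS)
for result in results:
    print(result["source"], "|", result["title"])
```

The user types one English query and gets back English titles from both the English and the French collection, which is the "single search box" experience the release describes.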

US Department of Energy Office of Science

Washington, D.C.—Scientific language barriers were broken today in Helsinki with the launch of Multilingual WorldWideScience.org. While a large share of scientific literature is published in English, vast quantities of high-quality science are not, and the pace of non-English scientific publishing is increasing. WorldWideScience.org will now enable the first-ever real-time searching and translation across globally-dispersed, multilingual scientific literature using complex translations technology.

“In an increasingly interconnected world, resolving the global challenges of science requires rapid communication of scientific knowledge,” said Dr. William F. Brinkman, Director of the Office of Science, U.S. Department of Energy. “Breaking the language barrier through WorldWideScience.org will help erode borders and build research networks across DOE, the nation, and around the globe.” (More)

DOE Office of Scientific and Technical Information

OAK RIDGE, TN - Now you can find non-English scientific literature from databases in China, Russia, France, and several Latin American countries and have your search results translated into one of nine languages. With the beta launch today (view the Office of Science announcement) of Multilingual WorldWideScience.org, real-time searching and translation of globally-dispersed collections of scientific literature is possible. This new capability is the result of an international public-private partnership between the WorldWideScience.org Alliance and Microsoft Research, whose translation technology has been paired with the federated searching technology of Deep Web Technologies.

Microsoft Research Corporate Vice-President Tony Hey said, “We are extremely pleased to have our Microsoft Translator technology used with WorldWideScience. Built at Microsoft Research, this translation technology already provides translations to millions of users. Partnering with WorldWideScience is an opportunity to advance science across language barriers and improve scientific discovery.” (More)

British Library

World Wide Science Alliance broadens access to global research with the launch of a new multilingual tool, enabling scientists to simultaneously search and translate over 400 million pages of scientific research published in 65 countries from around the world.

Although most scientific literature continues to be published in English, the pace of non-English scientific publishing is increasing rapidly, with vast quantities of high-quality science now being produced every year. Launched today at the International Council for Scientific and Technical Information (ICSTI) annual conference in Helsinki, Finland, a new beta version of WorldWideScience.org will enable scientists to break down the language barrier, facilitating greater global cooperation with regards to the pursuit of scientific research. (More)

If you enjoyed this post, make sure you subscribe to the RSS feed!

26
Mar

Earlier this month ReadWriteWeb reported on a mechanism Google is creating for real-time indexing:

Google is developing a system that will enable web publishers of any size to automatically submit new content to Google for indexing within seconds of that content being published. Search industry analyst Danny Sullivan told us today that this could be “the next chapter” for Google.

And here’s an interesting comment:

Last Fall we were told by Google’s Brett Slatkin, lead developer on the PubSubHubbub (PuSH) real time syndication protocol, that he hoped Google would some day use PuSH for indexing the web instead of the crawling of links that has been the way search engines have indexed the web for years.

If PuSH is as widely used as Google hopes it will be then this is a major paradigm shift for the search giant. No, Google won’t stop crawling the Web but if a critical mass of Web publishers get Google (and presumably other search engines) to index their content very quickly then the real-time Web will take a giant leap forward.

It will be interesting to see how PuSH impacts the federated search community. Clearly the real-time Web can move scientific information very quickly. Perhaps this new technology and paradigm will nicely augment the flow of scientific papers found by federated search applications in the deep Web.
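To make the push-versus-crawl distinction concrete, here is a minimal in-process sketch of the publish/subscribe idea behind PuSH. It is an assumption-laden toy: the real protocol delivers updates over HTTP callbacks between separate servers, while the `Hub` and `SearchIndex` classes and the feed URL below are hypothetical names I made up for illustration.

```python
class Hub:
    """Toy hub: remembers who subscribed to which topic and pushes new entries."""

    def __init__(self):
        self.subscribers = {}  # topic URL -> list of callback functions

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, entry):
        # Push the new entry to every subscriber as soon as it is published.
        for callback in self.subscribers.get(topic, []):
            callback(topic, entry)


class SearchIndex:
    """Toy search engine that indexes whatever the hub pushes to it."""

    def __init__(self):
        self.documents = []

    def on_new_content(self, topic, entry):
        self.documents.append((topic, entry))


hub = Hub()
index = SearchIndex()
feed = "http://example.org/feed"  # hypothetical publisher feed

# The search engine subscribes once instead of crawling the site repeatedly.
hub.subscribe(feed, index.on_new_content)

# The publisher notifies the hub at publication time; the index updates immediately.
hub.publish(feed, "New paper on fusion")
print(index.documents)
```

With crawling, the index would only learn about the new paper on its next scheduled visit; with push, the delay is whatever it takes the hub to call the subscriber.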


23
Dec

Multilingual federated search is a big deal for a couple of reasons. First, no one has done it up to now. Yes, Google just added translation to its universal search, and no doubt Bing will follow suit. But being able to search the quality scientific and technical information that sometimes is only available via federated search, and doing it in foreign languages, is important.

The second reason that multilingual federated search is so important is that China, Japan, Russia, and other nations produce large volumes of research output. As the world shrinks we can’t afford to ignore the non-English literature. In a blog article the author noted that Thomson Reuters highlighted the importance of China’s research output on the basis of sheer volume:

According to citation analysis based on data from Web of Science, China is ranked second in the world by number of scientific papers published in 2007. Scientific’s World IP Today Report on Global Patent Activity 2007 reported that China almost doubled its volume of patents from 2003 to 2007, and looks set to become a strong rival to Japan and the United States in years to come.

The bottom line: federated search is about research and research is global.

Read the rest of this entry »


19
Jul

Discovery services have begun to spring up. This article is my attempt to catalog and characterize them. Consider this article to be an introduction that sets the stage for future analysis articles.

What is a discovery service?

A discovery service is a search interface to pre-indexed metadata and/or full-text documents. Discovery services differ from federated search applications in that discovery services don’t search live sources. By searching pre-indexed data, discovery services return search results very quickly. Discovery services are touted as an evolution beyond federated search and in some ways they are. Some discovery services either provide integration with federated search or provide an API for others to do the integration. I believe that hybrid “federated discovery” services are likely to prevail over pure discovery services and I will dedicate an article to them.
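The difference between the two approaches can be sketched in Python. This is a simplified illustration under my own assumptions: the source names and documents are invented, the "live sources" are just in-memory lists, and real products add harvesting schedules, ranking, and deduplication that are left out here.

```python
from collections import defaultdict

# Invented sources; in practice these would be remote databases.
SOURCES = {
    "physics-db": ["Quantum computing survey", "Dark matter detection"],
    "biology-db": ["Protein folding methods"],
}


def federated_search(term, sources):
    """Federated search: query every source live at search time."""
    term = term.lower()
    return sorted(
        (name, doc)
        for name, docs in sources.items()
        for doc in docs
        if term in doc.lower()
    )


def build_index(sources):
    """Discovery service: harvest every source ahead of time into one index."""
    index = defaultdict(set)
    for name, docs in sources.items():
        for doc in docs:
            for word in doc.lower().split():
                index[word].add((name, doc))
    return index


def discovery_search(term, index):
    """Answer from the pre-built index without touching the sources."""
    return sorted(index.get(term.lower(), set()))


index = build_index(SOURCES)

# A source adds a document after the index was built.
SOURCES["biology-db"].append("Quantum biology primer")

live = federated_search("quantum", SOURCES)
indexed = discovery_search("quantum", index)
print("federated:", live)
print("discovery:", indexed)
```

The discovery lookup is a single dictionary access, which is why it is fast, but it misses the newly added document until the index is rebuilt. The federated search pays the cost of visiting every source and sees the new document immediately. A hybrid service would try to get the benefits of both.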

Read the rest of this entry »


11
Jun

What is an API?

Author: Sol

Andy Wibbels, product manager for Blogs.com and lead for their blogger training, wrote an outstanding blog article, “What is an API?” As a programmer, I know what an API is, but I have a hard time explaining the concept to non-programmers. Now Andy has done the explaining for me.

Andy’s article does a nice job of explaining without overwhelming, and his short introduction skillfully avoids going into more detail than most people want. If you’ve ever wanted to explain (or understand) the connection between mashups and APIs, or how Twitter’s massive and rapid success can be attributed to embracing APIs, then this is the article for you.

Read the rest of this entry »


22
Apr

Last November I wrote about Matt, a software developer and graduate student in computer science. Matt had blogged about a deep web crawler he was building. Five months later, I’m curious to know how you’re doing, Matt. Please let us know if you’ve done more work on your crawler since your last post mentioning the subject on November 21.

Read the rest of this entry »


12
Jan

Web 2.0 is a fascination of mine. I’m very community oriented and I’ve watched the computer industry evolve over nearly the past thirty years. I’m very excited about the potential for people and computers to change the world and to help solve our most pressing problems.

Lorcan Dempsey took a look at O’Reilly’s “Programming Collective Intelligence” and he inspired me to look at the book as well. I blogged about Lorcan’s blog article and was able to get a review copy of the book.

Read the rest of this entry »


21
Oct

I’m about to add a little bit of AJAX to some software I’m writing, so I’m reading up on the technology. When my Google alert for “federated search” turned up “5 things you didn’t know about AJAX,” I was curious to read the article, which appeared in the blog for the Task Force on Social Networking Software for the Medical Library Association. Read my previous article, “The interplay between AJAX and federated search,” for background material on the intersection of the two subjects. That article introduces AJAX:

AJAX stands for Asynchronous JavaScript and XML. Asynchronous refers to the fact that a program using AJAX can request an update to bits of a web page without having to reload the entire web page. JavaScript provides the mechanism that the web page uses to communicate with the HTTP (web) server. XML is the standard that is sometimes, but certainly not always, used to encode the data given to the web server. AJAX is basically a set of standards and techniques that a web programmer can use to create HTML-based web applications that are browser-independent where parts of the page refresh smoothly without requiring entire page reloads.

Read the rest of this entry »


24
Sep

[ Note: Two very large prime numbers were discovered recently, one last month and the other early this month. These primes, known as Mersenne primes, were discovered via a "divide and conquer" approach, validating the distributed search approach for fields as distant as federated search :)]
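For the curious, the "divide and conquer" idea can be sketched in Python. The Lucas-Lehmer test used below is the standard primality test for Mersenne numbers (numbers of the form 2^p - 1); everything else is my own toy assumption. In particular, `split_work` and `search_chunk` are hypothetical helpers, the chunks run one after another here rather than on separate machines, and the way a real distributed project hands out work is not shown.

```python
def is_prime(n):
    """Simple trial division, good enough for small exponents."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for a prime exponent p."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs p > 2
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def split_work(candidates, workers):
    """Divide: deal the candidate exponents out to the workers round-robin."""
    return [candidates[i::workers] for i in range(workers)]


def search_chunk(chunk):
    """Conquer: each worker tests only its own share of exponents."""
    return [p for p in chunk if lucas_lehmer(p)]


candidates = [p for p in range(2, 130) if is_prime(p)]
chunks = split_work(candidates, 4)

# Each chunk is independent, so each could be sent to a different machine.
found = sorted(p for chunk in chunks for p in search_chunk(chunk))
print(found)
```

The parallel to federated search is that each worker, like each source, handles its piece of the problem independently, and only the merged results matter to the person asking.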

Jonathan Rochkind and Stephan Schmid left comments in response to my article on federating large numbers of sources. I’d like to respond to a part of Jonathan’s comment as well as to a piece of Stephan’s comment.

Read the rest of this entry »


19
Sep

This week the Science.gov Alliance released Science.gov 5.0. The release got a good amount of press from a couple of press releases (from the US Department of Energy and Deep Web Technologies) and from a number of bloggers, including Valerie Allen (Product Manager for Science.gov) on OSTI’s own blog and the SLA Government Information Division blog. When I worked for Deep Web, I supported the application and have enjoyed watching it evolve and grow.

Read the rest of this entry »
