Uncategorized | Federated Search Blog

Archive for the "Uncategorized" Category

27
Jan

On January 8, 2015, Microsoft published a new Customer Solution Case Study about Deep Web Technologies’ innovative search technology, developed in collaboration with the WorldWideScience Alliance. Using the Microsoft Translation services, the search application WorldWideScience.org allows users to search in their native language, find results from sources around the world, and read the results translated back into their language. In light of the enormous strides made each year in the global scientific community, where timely dissemination of the vast published knowledge is critical, WorldWideScience.org increases access to many important databases and encourages international collaboration.

The WorldWideScience Alliance turned to Abe Lederman, Chief Executive Officer and Chief Technology Officer of Deep Web Technologies, to realize its vision of a better, more automated solution with multilingual support. “We wanted to create an application that would make scholarly material more accessible worldwide to both English and non-English speakers,” he says. “For instance, we wanted a French-speaking user to be able to type in a query and find documents written in any language.”
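The workflow described here (translate the query into each source's language, search, then translate the results back) can be sketched roughly as follows. This is only an illustration under stated assumptions: the phrase table stands in for a real machine translation service, and the source names, titles, and function names are hypothetical, not taken from WorldWideScience.org.

```python
# Toy phrase table standing in for a real machine translation service (assumption).
PHRASE_TABLE = {
    ("fr", "en"): {"énergie solaire": "solar energy"},
    ("fr", "de"): {"énergie solaire": "Solarenergie"},
    ("en", "fr"): {"Advances in solar energy storage":
                   "Progrès dans le stockage de l'énergie solaire"},
    ("de", "fr"): {"Speicherung von Solarenergie":
                   "Stockage de l'énergie solaire"},
}

# Hypothetical sources, each holding titles in its own language.
SOURCES = {
    "english_db": {"lang": "en",
                   "titles": ["Advances in solar energy storage",
                              "Wind turbine design"]},
    "german_db": {"lang": "de",
                  "titles": ["Speicherung von Solarenergie"]},
}


def translate(text, src, dst):
    """Look the text up in the toy table; return it unchanged if unknown."""
    if src == dst:
        return text
    return PHRASE_TABLE.get((src, dst), {}).get(text, text)


def multilingual_search(query, user_lang):
    """Translate the query per source, search, and translate hits back."""
    results = []
    for name, source in SOURCES.items():
        local_query = translate(query, user_lang, source["lang"]).lower()
        for title in source["titles"]:
            if local_query in title.lower():
                results.append({
                    "source": name,
                    "original": title,
                    "translated": translate(title, source["lang"], user_lang),
                })
    return results


if __name__ == "__main__":
    for hit in multilingual_search("énergie solaire", "fr"):
        print(hit["source"], "|", hit["original"], "|", hit["translated"])
```

A French query therefore reaches both the English and the German source, and each hit is shown with its original title alongside a French rendering.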

The Case Study, posted to the Microsoft “Customer Stories” page, comes on the heels of a 2014 WorldWideScience.org update that improved the application’s look, feel, and speed. Additionally, 2015 holds a bright future, as the study mentions: “To provide better accessibility, WorldWideScience.org also offers a mobile interface. Deep Web Technologies is launching a streamlined HTML5 version that will work with virtually any device, whether PC, phone, or tablet. Other future enhancements include a localization feature that will provide search portals in the user’s native language.”

In response to the Case Study, Olivier Fontana, Director of Product Marketing for Microsoft Translator, said, “Microsoft Translator can help customers better reach their internal and external stakeholders across languages. By building on the proven, customizable and scalable Translator API, Deep Web Technologies has developed a solution that has a direct impact on researcher’s ability to learn and exchange with their peers around the world, thereby improving their own research impact.” The Microsoft Translator Team Blog has followed up on the Case Study here.

Oh, and one more thing…WorldWideScience.org is not Deep Web Technologies’ only multilingual application. WorldWideEnergy translates energy-related content into four languages, and the United Nations Economic Commission for Africa will be rolling out a multilingual search in 2015.

View the Press Release.

20
Jul

[ Editor’s Note: This post first appeared in the Deep Web Technologies Blog. ]

Government Computer News (GCN) recently reviewed mobile apps developed by the federal government. Science.gov Mobile was among the top 10 listed.

GCN gave the Science.gov Mobile app (which runs on Android and on the mobile web) scores of 7 for usefulness, 8 for ease of use, and 8 for coolness factor.

The Science.gov website has this to say about the accolade:

Coolness? Check. Usefulness? Check. Ease of Use? Check. The Science.gov Mobile application has been named among the Top Ten in Best Federal Apps by Government Computer News (GCN). The recognition is timely, too. The Administration recently issued Digital Government: Building a 21st Century Platform to Better Serve the American People, the strategy which calls on all federal agencies to begin making mobile applications to better serve the American public. GCN called its Top Ten “ahead of the curve” with apps already in place.

I downloaded the application to my Droid X. The install was effortless and the app has a very intuitive user interface, which allows for emailing of search results for later review.

While we didn’t have any involvement in creating the mobile app, we did develop the search technology that powers Science.gov as well as the web services API that enables searches by Science.gov Mobile.

We’re quite delighted to see Science.gov serve the mobile web.

26
Apr

The International Journal of Software Engineering & Applications has published the article: “A Federated Search Approach to Facilitate Systematic Literature Review in Software Engineering.” Here’s the abstract:

To impact industry, researchers developing technologies in academia need to provide tangible evidence of the advantages of using them. Nowadays, Systematic Literature Review (SLR) has become a prominent methodology in evidence-based researches. Although adopting SLR in software engineering does not go far in practice, it has been resulted in valuable researches and is going to be more common. However, digital libraries and scientific databases as the best research resources do not provide enough mechanism for SLRs especially in software engineering. On the other hand, any loss of data may change the SLR results and leads to research bias. Accordingly, the search process and evidence collection in SLR is a critical point. This paper provides some tips to enhance the SLR process. The main contribution of this work is presenting a federated search tool which provides an automatic integrated search mechanism in well known Software Engineering databases. Results of case study show that this approach not only reduces required time to do SLR and facilitate its search process, but also improves its reliability and results in the increasing trend to use SLRs.

The article makes a good case for automating the search process to minimize the chance of missing important information in a literature review. The authors’ work in building a customized federated search engine has had three positive results:

1- It considerably reduces required time as one of the most concerns in SLR. It also improves the search process by including synonyms which are provided by an expert domain, automating the search process rather than manually search in every database for every search criteria, and finally integrating multiple databases search results.

2- Its crawler-enabled feature, facilitate search process and automatically save results in a database. After doing some researches, this database will contain thousands of records which not only could be used locally, but also would be so beneficial as a knowledge base for ongoing researches.

3- It facilitates both the qualitative or quantitative analysis on search results while they are integrated in a database. For example, classifying results based on their meta-data fields e.g. authors, may help the researcher to identify duplicated papers.
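The three points above (synonym expansion, automated search across several databases, and integration of results with duplicate detection on metadata) can be illustrated with a minimal sketch. This is not the authors' tool: the in-memory "databases", the synonym table, the sample records, and all function names are hypothetical stand-ins for real digital library queries.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "databases"; a real tool would query remote libraries.
SOURCES = {
    "library_a": [
        {"title": "A Systematic Literature Review of Testing Tools",
         "authors": "Author A"},
        {"title": "Agile Methods in Practice", "authors": "Author B"},
    ],
    "library_b": [
        {"title": "A Systematic Literature Review of Testing Tools",
         "authors": "Author A"},
        {"title": "SLR Guidelines for Practitioners", "authors": "Author C"},
    ],
}

# Synonyms that a domain expert might supply (assumed sample data).
SYNONYMS = {"slr": ["systematic literature review"]}


def expand(term):
    """Return the lowercased term plus any expert-provided synonyms."""
    key = term.lower()
    return [key] + SYNONYMS.get(key, [])


def search_source(name, terms):
    """Match any of the terms against the titles held by one source."""
    return [
        dict(record, source=name)
        for record in SOURCES[name]
        if any(t in record["title"].lower() for t in terms)
    ]


def federated_search(term):
    """Query every source in parallel, then merge and de-duplicate."""
    terms = expand(term)
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda n: search_source(n, terms), SOURCES))
    merged = {}
    for batch in batches:
        for record in batch:
            # Duplicates are detected on metadata fields (title and authors).
            key = (record["title"].lower(), record["authors"].lower())
            merged.setdefault(key, record)
    return list(merged.values())


if __name__ == "__main__":
    for record in federated_search("SLR"):
        print(record["source"], "|", record["title"])
```

Keying the merged results on title and authors is the simplest way to show how an integrated result set helps spot duplicated papers; a production tool would likely need fuzzier matching.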

All in all, a nice article on a nice twist to federated search.

2
Apr

I produced this podcast because I was curious about intelligent web agents and noticed this new edition of Michael Schrenk’s Webbots, Spiders, and Screen Scrapers.

In this podcast, Michael Schrenk and I discuss webbots, spiders, and screen scrapers. These are the tools that allow developers to crawl the web, to mash up content from multiple websites, to monitor sites for activity, and to create intelligent agents to make purchases on their behalf. Of particular interest are the stories Mr. Schrenk shares of the intelligent webbots he has built.

Click to listen to or download podcast


Why read Webbots, Spiders and Screen Scrapers?

  1. Gain a bottom-up understanding of what webbots are, how they’re developed, and things to watch out for.
  2. Understand the difference in mindset between traditional web development and webbot development.
  3. Learn how to get ideas for great webbot projects.
  4. Discover how PHP/CURL facilitates advanced file downloads, cookie management and more.
  5. Reinforce what you learn with projects and example scripts.
  6. Learn how to leverage WebbotsSpidersScreenScraper_Libraries, the common set of libraries that the book uses to make writing webbots easy.
  7. Learn from the author’s 11-year career of writing webbots and spiders.

About the author

Michael Schrenk has developed webbots for over 17 years, working just about everywhere from Silicon Valley to Moscow, for clients like the BBC, foreign governments, and many Fortune 500 companies. He’s a frequent Defcon speaker and lives in Las Vegas, Nevada.

29
Mar

[ Note: This article was first published in the Deep Web Technologies Blog. ]

Here’s a paper worth reading: “A study of the information search behaviour of the millennial generation.” No, not because there are any earth-shattering conclusions, but you may want to read the article to confirm that what you already suspect to be true really is true. Here’s the introduction from the paper’s abstract:

Introduction. Members of the millennial generation (born after 1982) have come of age in a society infused with technology and information. It is unclear how they determine the validity of information gathered, or whether or not validity is even a concern. Previous information search models based on mediated searches with different age groups may not adequately describe the search behaviours of this generation.

Here’s the conclusion:

Conclusions. These findings indicate that the search behaviour of millennial generation searchers may be problematic. Existing search models are appropriate; it is the execution of the model by the searcher within the context of the search environment that is at issue.

Beyond telling us what we already know, the paper gives insights into how librarians can help students become more sophisticated researchers. Areas in which librarians can add value include:

  1. Verification of quality of Web information sources
  2. A shift of focus from filtering content to first verifying its quality and then filtering
  3. Developing an orderly methodology for performing research

The paper might provide insights that search engine developers could someday roll into their offerings targeted at students.

14
Feb

Deep Web Technologies president, founder, and CTO Abe Lederman shares some thoughts on discovery services at the Deep Web Technologies Blog.

15
Jun

[ Editor’s note: This article was first published in the Deep Web Technologies Blog. ]

WorldWideScience is a global science gateway that combines national and international scientific databases into a search engine. From a single search form, a scientist, researcher, or curious citizen can search over fifty databases in English and now 22 multilingual sources (with translation to the searcher’s native language) and seven multimedia sources. WorldWideScience is the brainchild of the director of the DOE Office of Scientific and Technical Information (OSTI), Dr. Walt Warnick. The gateway is maintained and hosted by OSTI and governed by the WorldWideScience Alliance.

Deep Web Technologies is proud to have developed the federated search technology behind WorldWideScience. And, with the cooperation of the Microsoft Translation services team, Deep Web Technologies also implemented the multilingual technology. It was a major undertaking but a worthwhile one for the science community, whose members can now greatly expand their reach to scientific papers in languages beyond their own.

Dr. Warnick was invited to deliver a presentation at the 14th session of the United Nations’ Commission on Science and Technology (CSTD). In a post at the OSTI Blog, Dr. Warnick shares the warm reception that WorldWideScience received.

I wish more of my OSTI colleagues could have been in Geneva to share the warm response from the attendees. Several country representatives offered up new sources for WorldWideScience (WWS). Another member of the audience searched mobile WWS for his own name and remarked that he found many of his papers. I received enthusiastic comments, so many that I couldn’t address all of them because of time constraints. Significantly, the Chair of CSTD volunteered to pay the costs of becoming a member of the WorldWideScience Alliance. There was great excitement about the possibilities for its use within the home countries of the attendees and how WWS advances the goals of CSTD.

The paper “Breaking down language barriers through multilingual federated search,” co-authored by Abe Lederman (founder and president of Deep Web Technologies) and by Dr. Warnick, Brian Hitson, and Lorrie Johnson from OSTI, explains the importance of the gateway:

“WorldWideScience.org (WWS) is a global science gateway developed by the US Department of Energy Office of Scientific and Technical Information (OSTI) in partnership with federated search vendor Deep Web Technologies. WWS provides a simultaneous live search of 69 databases from government and government-sanctioned organizations from 66 participating nations. The WWS portal plays a leading role in bringing together the world’s scientists to accelerate the discoveries needed to solve the planet’s most pressing problems. In this paper we present a brief history of the development of WWS and discuss how a new technology, multilingual federated search, greatly increases WWS’ ability to facilitate the advancement of science.”

Deep Web Technologies is delighted to be working with OSTI and other organizations to push the envelope of search technology and to make the world a smaller place.

1
Apr

This post might seem a bit off topic but it’s not really.

Google is getting on the Kinect bandwagon with the introduction of spatial tracking technology into Gmail.

How it works

Gmail Motion uses your computer’s built-in webcam and Google’s patented spatial tracking technology to detect your movements and translate them into meaningful characters and commands. Movements are designed to be simple and intuitive for people of all skill levels.

More information is at this URL and in this video.

Wouldn’t it be cool if federated search apps could integrate movement technology so seamlessly!?

20
Feb

I recently discovered an article, 5 Reasons Not to Use Google First, that sings my song. The article addresses this question:

Google is fast, clean and returns more results than any other search engine, but does it really find the information students need for quality academic research? The answer is often ‘no’. “While simply typing words into Google will work for many tasks, academic research demands more.” (Searching for and finding new information – tools, strategies and techniques)

The next paragraph gave me a chuckle.

As far back as 2004, James Morris, Dean of the School of Computer Science at Carnegie Mellon University, coined the term “infobesity,” to describe “the outcome of Google-izing research: a junk-information diet, consisting of overwhelming amounts of low-quality material that is hard to digest and leads to research papers of equally low quality.” (Is Google enough? Comparison of an internet search engine with academic library resources.)

The article continues with its list of five good reasons to not use Google first.

Note that the recommendation isn’t to skip Google altogether. There’s a balance that’s needed to get the best value when performing research. The findings in the “Is Google enough?” article summarize this point really well:

Google is superior for coverage and accessibility. Library systems are superior for quality of results. Precision is similar for both systems. Good coverage requires use of both, as both have many unique items. Improving the skills of the searcher is likely to give better results from the library systems, but not from Google.

31
Jan

[ This is a republication of the article, “Deep Web Tech in the News: Image Search” that was published in the Deep Web Technologies Blog. Note that Deep Web Technologies sponsors the Federated Search Blog and that I consult for the organization, OSTI, that stewards Science.gov. ]

Deep Web Tech in the News: Image Search

One small step for Science.gov, one giant leap for Federated Search.

“Science.gov is a gateway to more than 42 scientific databases and 200 million pages of science information with just one query, and is a gateway to more than 2,000 scientific websites from 18 organizations within 14 federal science agencies. These agencies represent 97% of the federal R&D budget. Science.gov is the USA.gov portal to science and the U.S. contribution to WorldWideScience.org. Science.gov is hosted by the Department of Energy Office of Scientific and Technical Information, within the Office of Science, and is supported by CENDI, an interagency working group of senior scientific and technical information managers.”

Science.gov received a pretty large upgrade in December. The image search is located under “special collections” and works just like Science.gov, except that the results have thumbnails (www.science.gov/scigovimage/). A search query now quickly pulls back related images from multiple sources as thumbnail-sized results. This is one of very few publicly available science image search portals. Cheryl LaGuardia, an industry critic, wrote:

For a free service this works mighty well: my test search for “tornedo” got the reply, “Did you mean “tornado”? with 151 results for the corrected spelling (a test, mind you, or perhaps I’m easing back into work slowly and may have inadvertently misspelled… no matter! The system works!). The resultant images are terrific, compelling enough to send Dorothy pedaling madly down the road away from them on her bicycle, with Toto in tow.

Deep Web Technologies powers the entire website, and we look forward to using this innovation on other projects in the future.