Federated Search Blog
11
May

This post was re-published with permission from the Deep Web Technologies Blog. Please view the original post here.

The Beagle Research Group Blog posted “Apple iWatch: What’s the Killer App” on March 10, including this line: “An alert might also come from a pre-formed search that could include a constant federated search and data analysis to inform the wearer of a change in the environment that the wearer and only a few others might care about, such as a buy or sell signal for a complex derivative.” While this enticing suggestion is just a snippet in a full post, we thought we’d consider the possibilities this one-liner presents. Could federated search become the next killer app?

Well no, not really. Federated search in and of itself isn’t an application; it’s more of a supporting technology. It supports real-time searching, rather than indexing, and provides current information on fluctuating data such as weather, stocks, flights, etc. And that is exactly why it’s killer: federated search finds new information of any kind, anywhere, singles out the most precise data to display, and notifies the user to take a look.

In other words, it’s a great technology for mobile apps to use. Federated search connects directly to the source of the information, whether medical, energy, academic journals, social media, weather, etc., and finds information as soon as it’s available. Rather than storing information away, federated search links a person to the data circulating that minute, passing on the newest details as soon as they appear, which makes a huge difference with need-to-know information. In addition, alerts can be set up to notify the person, researcher, or iWatch wearer of critical data such as a buy or sell signal, as The Beagle Research Group suggests.
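To make that concrete, here is a minimal sketch in Python of the pattern this post describes: fan a query out to several live sources in parallel, merge the freshest results, and notify the user when something matches a watch rule. The endpoints, the result shape, and the alert rule are all hypothetical placeholders, not Deep Web Technologies’ actual API.

```python
import concurrent.futures
import json
import urllib.parse
import urllib.request

# Hypothetical live-source endpoints, each assumed to return a JSON list of
# result objects like {"title": ..., "timestamp": ...}. Placeholders only.
SOURCES = {
    "weather": "https://example.com/weather/search?q=",
    "stocks": "https://example.com/stocks/search?q=",
    "news": "https://example.com/news/search?q=",
}

def search_source(name, base_url, query):
    """Query one source live -- no local index -- and tag results with the source."""
    with urllib.request.urlopen(base_url + urllib.parse.quote(query), timeout=5) as resp:
        return [dict(r, source=name) for r in json.load(resp)]

def federated_search(query):
    """Fan the query out to every source at once and merge what comes back."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, n, u, query) for n, u in SOURCES.items()]
        results = []
        for f in concurrent.futures.as_completed(futures):
            results.extend(f.result())
    # Newest first: the whole point of searching live is currency.
    return sorted(results, key=lambda r: r.get("timestamp", ""), reverse=True)

def check_alerts(results, matches, notify):
    """Push a notification for any result matching the user's watch rule."""
    for r in results:
        if matches(r):
            notify(r)
```

An app would call `federated_search` on a schedule and hand the merged list to `check_alerts` with whatever rule the wearer cares about.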

Of course, there’s also the issue of real estate to keep in mind – the iWatch wraps less than 2 inches of display on a wrist. That’s not much room for a hefty list of information, much less junky results. What matters is that the single most accurate piece of information, hand-picked (so to speak) just for you, pops up on the screen. Again, federated search can make that happen quite easily...it has connections.

There is a world of possibility when it comes to using federated search technology to build applications, whether mobile or desktop. Our on-demand lifestyles require federating, analyzing, and applying all sorts of data, from health to environment to social networking. Federated search is not just for librarians finding subscription content anymore. The next-generation federated search is for everyone who needs information on the fly. Don’t worry about missing information (you won’t). Don’t worry whether information is current (it is). In fact, don’t worry at all. Relax, sit back, and get alert notifications to buy that stock, watch the weather driving home, or check out an obscure tweet mentioning one of your hobbies. Your world reports to you what you need to know. And that, really, is simply killer.

29
Apr

Editor’s Note: This post is re-published with permission from the Deep Web Technologies Blog. This is a guest article by Lisa Brownlee. The 2015 edition of her book, “Intellectual Property Due Diligence in Corporate Transactions: Investment, Risk Assessment and Management”, originally published in 2000, dives into using the Deep Web and the Dark Web for intellectual property research, emphasizing their importance and usefulness in legal due diligence.

Lisa M. Brownlee is a private consultant and has become an authority on the Deep Web and the Dark Web, particularly as they apply to legal due-diligence. She writes and blogs for Thomson Reuters.  Lisa is an internationally-recognized pioneer on the intersection between digital technologies and law.

In this blog post I will delve in some detail into the Deep Web. This expedition will focus exclusively on that part of the Deep Web that excludes the Dark Web.  I cover both Deep Web and Dark Web legal due diligence in more detail in my blog and book, Intellectual Property Due Diligence in Corporate Transactions: Investment, Risk Assessment and Management. In particular, in this article I will discuss the Deep Web as a resource of information for legal due diligence.

When Deep Web Technologies invited me to write this post, I initially intended to delve primarily into the ongoing confusion regarding Deep Web and Dark Web terminology. The misuse of the terms Deep Web and Dark Web, among other related terms, is problematic from a legal perspective if confusion about those terms spills over into licenses and other contracts, and into laws and legal decisions. The terms are so hopelessly intermingled that I decided it is not useful to even attempt untangling them here. In this post, as mentioned, I will specifically cover the Deep Web, excluding the Dark Web. The definitions I use are provided in a blog post I wrote on the topic earlier this year, entitled The Deep Web and the Dark Web – Why Lawyers Need to Be Informed.

Deep Web: a treasure trove of data and other information

The Deep Web is populated with vast amounts of data and other information that are essential to investigate during a legal due diligence on a company that is a target for possible licensing, merger, or acquisition. A Deep Web (as well as Dark Web) due diligence should be conducted to ensure that information relevant to the subject transaction and target company is not missed or misrepresented. Lawyers and financiers conducting the due diligence have essentially two options: conduct it themselves by visiting each potentially relevant database and running each search individually (potentially ad infinitum), or hire a specialized company such as Deep Web Technologies to design and set up such a search. Hiring an outside firm to conduct such a search saves time and money.

Deep Web data mining is a science that cannot be mastered by lawyers or financiers over a single transaction, or even a handful of them. Using a specialized firm such as DWT has the added benefit that the search can be replicated on demand and/or run as an ongoing, updated search. Additionally, DWT brings multilingual search capabilities to investigations—a feature that very few, if any, other data mining companies provide, and one that would most likely be deficient or entirely missing in a search conducted entirely in-house.

What information is sought in a legal due diligence?

A legal due diligence will investigate a wide and deep variety of topics, from real estate to human resources, to basic corporate finance information, industry and company pricing policies, and environmental compliance. Due diligence nearly always also investigates intellectual property rights of the target company, in a level of detail that is tailored to specific transactions, based on the nature of the company’s goods and/or services. DWT’s Next Generation Federated Search is particularly well-suited for conducting intellectual property investigations.

In sum, the goal of a legal due diligence is to identify and confirm basic information about the target company and determine whether there are any undisclosed infirmities with the target company’s assets and information as presented. In view of these goals, the investing party will require the target company to produce a checklist full of items about the various aspects of the business (and more) discussed above. An abbreviated correlation between the information typically requested in a due diligence and the information available in the Deep Web is provided in the chart attached below. Absent assistance from Deep Web Technologies, someone within the investor company or its outside counsel will need to search each of the databases listed, in addition to others, to confirm that the information provided by the target company is correct and complete. While representations and warranties are typically given by the target company as to the accuracy and completeness of the information provided, it is also typical for the investing company to confirm all or part of that information, depending on the sensitivities of the transaction and the areas in which value, and possible risks, might be uncovered.

Deep Web Legal Due-Diligence Resource List (PDF)

27
Jan

On January 8, 2015, Microsoft published a new Customer Solution Case Study about Deep Web Technologies’ innovative search technology, developed in collaboration with the WorldWideScience Alliance. Using the Microsoft Translator service, the search application WorldWideScience.org allows users to search in their native language, find results from sources around the world, and read the results translated back into their language. In light of the enormous strides made each year in the global scientific community, where timely dissemination of published knowledge is critical, WorldWideScience.org increases access to many important databases and encourages international collaboration.

The WorldWideScience Alliance turned to Abe Lederman, Chief Executive Officer and Chief Technology Officer of Deep Web Technologies, to realize its vision of a better, more automated solution with multilingual support. “We wanted to create an application that would make scholarly material more accessible worldwide to both English and non-English speakers,” he says. “For instance, we wanted a French-speaking user to be able to type in a query and find documents written in any language.”
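The round trip the case study describes (translate the query, federate the search, translate the results back) can be sketched roughly as follows against Microsoft’s Translator Text API v3. The subscription key is a placeholder and `search_all_sources` is a hypothetical stand-in for the federated step; the actual WorldWideScience.org integration is Deep Web Technologies’ own and surely differs.

```python
import requests

TRANSLATE_URL = "https://api.cognitive.microsofttranslator.com/translate"
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-key>"}  # placeholder credential

def translate(texts, to_lang, from_lang=None):
    """Translate a batch of strings with the Translator Text API v3."""
    params = {"api-version": "3.0", "to": to_lang}
    if from_lang:
        params["from"] = from_lang
    body = [{"Text": t} for t in texts]
    resp = requests.post(TRANSLATE_URL, params=params, headers=HEADERS, json=body)
    resp.raise_for_status()
    return [item["translations"][0]["text"] for item in resp.json()]

def search_all_sources(query):
    """Hypothetical stand-in for the federated search across science databases."""
    return [{"title": "Example result for: " + query}]

def multilingual_search(query, user_lang):
    # 1. Translate the user's query (e.g., French) into English for the sources.
    english_query = translate([query], to_lang="en", from_lang=user_lang)[0]
    # 2. Federate the search across the English-language sources.
    results = search_all_sources(english_query)
    # 3. Translate result titles back into the user's language.
    titles = translate([r["title"] for r in results], to_lang=user_lang)
    for r, t in zip(results, titles):
        r["translated_title"] = t
    return results
```

The design point is that translation wraps the federated search at both ends, so neither the user nor the underlying databases need to share a language.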

The Case Study, posted to the Microsoft “Customer Stories” page, comes on the heels of a 2014 WorldWideScience.org update that improved the application’s look, feel, and speed. Additionally, 2015 holds a bright future, as the study mentions: “To provide better accessibility, WorldWideScience.org also offers a mobile interface. Deep Web Technologies is launching a streamlined HTML5 version that will work with virtually any device, whether PC, phone, or tablet. Other future enhancements include a localization feature that will provide search portals in the user’s native language.”

In response to the Case Study, Olivier Fontana, Director of Product Marketing for Microsoft Translator, said, “Microsoft Translator can help customers better reach their internal and external stakeholders across languages. By building on the proven, customizable and scalable Translator API, Deep Web Technologies has developed a solution that has a direct impact on researchers’ ability to learn and exchange with their peers around the world, thereby improving their own research impact.” The Microsoft Translator Team Blog has followed up on the Case Study here.

Oh, and one more thing…WorldWideScience.org is not Deep Web Technologies’ only multilingual application. WorldWideEnergy translates energy-related content into four languages, and the United Nations Economic Commission for Africa will be rolling out a multilingual search in 2015.

View the Press Release.

20
Jul

[ Editor’s Note: This post first appeared in the Deep Web Technologies Blog. ]

Government Computer News (GCN) recently reviewed mobile apps developed by the federal government. Science.gov Mobile was among the top 10 listed.

GCN gave the Science.gov Mobile app (which runs on the Android and on the Mobile Web) scores of 7 for usefulness, 8 for ease of use, and 8 for coolness factor.

The Science.gov website has this to say about the accolade:

Coolness? Check. Usefulness? Check. Ease of Use? Check. The Science.gov Mobile application has been named among the Top Ten in Best Federal Apps by Government Computer News (GCN). The recognition is timely, too. The Administration recently issued Digital Government: Building a 21st Century Platform to Better Serve the American People, the strategy which calls on all federal agencies to begin making mobile applications to better serve the American public. GCN called its Top Ten “ahead of the curve” with apps already in place.

I downloaded the application to my Droid X. The install was effortless and the app has a very intuitive user interface, which allows for emailing of search results for later review.

While we didn’t have any involvement in creating the mobile app, we did develop the search technology that powers Science.gov as well as the web services API that enables searches by Science.gov Mobile.

We’re quite delighted to see Science.gov serve the mobile web.

28
May

Dr. Karl Kochendorfer makes a compelling case for federated search in the healthcare industry. As a family physician and a leader in the effort to connect healthcare workers to the information they need, Dr. Kochendorfer acknowledges what those of us in the federated search world already know – Google and the surface web contain very little of the critical information doctors and their staff need to support important medical decision-making.

Dr. Kochendorfer delivered a TEDx talk in April, “Seek and Ye Shall Find,” explaining the problem and solution.

Some highlights from the talk:

  1. There are 3 billion terabytes of information out there.
  2. There are 700,000 articles added to the medical literature every year.
  3. Information overload was described 140 years ago by a German surgeon: “It has become increasingly difficult to keep abreast of the reports which accumulate day after day … one suffocates through exposure to the massive body of rapidly growing information.”
  4. With better search tools, 275 million improved decisions could be made.
  5. Clinicians spend 1/3 of their time looking for information.

And the most compelling reason to get federated search into healthcare is Dr. Kochendorfer’s sobering observation that doctors are now starting to turn to Wikipedia for answers, instead of the best evidence-based sources available, simply because Wikipedia is so easy to use. Scary.

26
Apr

The International Journal of Software Engineering & Applications has published the article: “A Federated Search Approach to Facilitate Systematic Literature Review in Software Engineering.” Here’s the abstract:

To impact industry, researchers developing technologies in academia need to provide tangible evidence of the advantages of using them. Nowadays, Systematic Literature Review (SLR) has become a prominent methodology in evidence-based researches. Although adopting SLR in software engineering does not go far in practice, it has been resulted in valuable researches and is going to be more common. However, digital libraries and scientific databases as the best research resources do not provide enough mechanism for SLRs especially in software engineering. On the other hand, any loss of data may change the SLR results and leads to research bias. Accordingly, the search process and evidence collection in SLR is a critical point. This paper provides some tips to enhance the SLR process. The main contribution of this work is presenting a federated search tool which provides an automatic integrated search mechanism in well known Software Engineering databases. Results of case study show that this approach not only reduces required time to do SLR and facilitate its search process, but also improves its reliability and results in the increasing trend to use SLRs.

The article makes a good case for automating the search process to minimize the chance of missing important information in a literature review. The authors’ work in building a customized federated search engine has had three positive results:

1. It considerably reduces the time required, one of the chief concerns in an SLR. It also improves the search process by including synonyms supplied by a domain expert, automating the search rather than manually searching every database for every criterion, and integrating the search results from multiple databases.

2. Its crawler-enabled feature facilitates the search process and automatically saves results in a database. After a few studies, this database will contain thousands of records, which can not only be used locally but also serve as a knowledge base for ongoing research.

3. It facilitates both qualitative and quantitative analysis of search results once they are integrated in a database. For example, classifying results by metadata fields such as authors may help the researcher identify duplicate papers.
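To give a rough feel for those mechanics, here is a small Python sketch of the same three ideas: synonym expansion, integrated local storage, and metadata-based duplicate detection. The synonym table and record shape are illustrative assumptions, not the authors’ actual tool.

```python
import sqlite3

# Expert-supplied synonym table (illustrative only).
SYNONYMS = {"SLR": ["systematic literature review", "systematic review"]}

def expand_query(term):
    """Build one OR-query covering the term and its known synonyms."""
    return " OR ".join(f'"{t}"' for t in [term] + SYNONYMS.get(term, []))

def dedupe_key(record):
    """Normalize title + authors so the same paper from two databases collides."""
    return (record["title"].lower().strip(), record["authors"].lower().strip())

def collect(records, db_path="slr_results.db"):
    """Persist integrated results locally so later analysis can reuse them."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS papers "
        "(title TEXT, authors TEXT, source TEXT, UNIQUE(title, authors))"
    )
    for r in records:
        title, authors = dedupe_key(r)
        # UNIQUE constraint + INSERT OR IGNORE silently drops duplicates.
        con.execute("INSERT OR IGNORE INTO papers VALUES (?, ?, ?)",
                    (title, authors, r["source"]))
    con.commit()
    con.close()
```

Running `collect` over the merged output of several database searches leaves one deduplicated local table, which is exactly the kind of reusable knowledge base the second result describes.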

All in all, a nice article on a nice twist to federated search.

16
Apr

Abe Lederman, founder and CEO of blog sponsor Deep Web Technologies, recently got a couple of exposures at MobileGroove, a site which provides analysis and commentary on mobile search, mobile advertising, and social media. The two MobileGroove articles cover Deep Web Technologies’ Biznar mobile federated search app.

More at the Deep Web Technologies Blog.

2
Apr

I produced this podcast because I was curious about intelligent web agents and noticed this new edition of Michael Schrenk’s Webbots, Spiders, and Screen Scrapers.

In this podcast, Michael Schrenk and I discuss webbots, spiders, and screen scrapers: the tools that allow developers to crawl the web, mash up content from multiple websites, monitor sites for activity, and create intelligent agents that make purchases on a user’s behalf. Of particular interest are the stories Mr. Schrenk shares about the intelligent webbots he has built.
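For a taste of what such a webbot looks like (the book itself works in PHP/CURL), here is a minimal site-monitoring bot in Python: fetch a page, fingerprint it, and report when the content changes between runs. The URL and state file are arbitrary examples.

```python
import hashlib
import pathlib
import urllib.request

URL = "https://example.com/page-to-watch"   # arbitrary example target
STATE = pathlib.Path("last_hash.txt")       # where the last fingerprint lives

def fetch_hash(url):
    """Download the page and reduce it to a content fingerprint."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

def check_for_change():
    """Compare this run's fingerprint against the previous run's."""
    current = fetch_hash(URL)
    previous = STATE.read_text().strip() if STATE.exists() else None
    STATE.write_text(current)
    if previous and previous != current:
        print("Page changed -- time for the webbot to act.")
    else:
        print("No change detected.")

if __name__ == "__main__":
    check_for_change()
```

Scheduled with cron, a loop like this is the skeleton of the monitoring bots discussed in the interview; the interesting work is in what the bot does once a change is detected.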


Click to listen to or download podcast


Why read Webbots, Spiders and Screen Scrapers?

  1. Gain a bottom-up understanding of what webbots are, how they’re developed, and things to watch out for.
  2. Understand the difference in mindset between traditional web development and webbot development.
  3. Learn how to get ideas for great webbot projects.
  4. Discover how PHP/CURL facilitates advanced file downloads, cookie management, and more.
  5. Reinforce what you learn with projects and example scripts.
  6. Learn how to leverage WebbotsSpidersScreenScraper_Libraries, the common set of libraries the book uses to make writing webbots easy.
  7. Learn from the author’s 11-year career of writing webbots and spiders.


About the author

Michael Schrenk has developed webbots for over 17 years, working just about everywhere from Silicon Valley to Moscow, for clients like the BBC, foreign governments, and many Fortune 500 companies. He’s a frequent Defcon speaker and lives in Las Vegas, Nevada.

29
Mar

[ Note: This article was first published in the Deep Web Technologies Blog. ]

Here’s a paper worth reading: “A study of the information search behaviour of the millennial generation.” No, not because there are any earth-shattering conclusions, but you may want to read the article to confirm that what you already suspect to be true really is true. Here’s the introduction from the paper’s abstract:

Introduction. Members of the millennial generation (born after 1982) have come of age in a society infused with technology and information. It is unclear how they determine the validity of information gathered, or whether or not validity is even a concern. Previous information search models based on mediated searches with different age groups may not adequately describe the search behaviours of this generation.

Here’s the conclusion:

Conclusions. These findings indicate that the search behaviour of millennial generation searchers may be problematic. Existing search models are appropriate; it is the execution of the model by the searcher within the context of the search environment that is at issue.

Beyond telling us what we already know, the paper gives insights into how librarians can help students become more sophisticated researchers. Areas in which librarians can add value include:

  1. Verification of quality of Web information sources
  2. A shift of focus from filtering content to first verifying its quality and then filtering
  3. Developing an orderly methodology for performing research

The paper might provide insights that search engine developers could someday roll into their offerings targeted at students.

21
Mar

[Editor’s Note: I received this email from Azhar Jassal at sehrch.com. I like what he’s up to so I thought I’d give him a plug by republishing his letter, with Azhar’s permission.]


Hi

I wanted to make you aware of a new search engine that I have spent the last 15 months building: sehrch.com

This is a new breed of search engine: a “structured search” engine. This type of search engine queries both the document web and the semantic web harmoniously. I have developed a simple query language that allows a user to move between both of these worlds.

The purpose of Sehrch.com is to complete a user’s overall information retrieval task in as short a time as possible by providing the most informative, entity-centric result. This is accomplished either by accepting an unstructured query (just how mainstream search engines are used) and applying conceptual awareness, or by making structured queries, something no current mainstream search engine is capable of (as they concern themselves only with the document web, not the semantic web), which in my opinion adds a whole new dimension to information retrieval systems.
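To illustrate the idea only (this is not how sehrch.com itself is built), the semantic-web half of such a query can be approximated in Python with a SPARQL lookup against a public endpoint such as DBpedia, whose structured answer would then be merged with ordinary keyword results from the document web.

```python
import requests

def semantic_lookup(name):
    """Ask DBpedia (the semantic web) for a structured fact about an entity."""
    query = f"""
        SELECT ?abstract WHERE {{
          ?s rdfs:label "{name}"@en ;
             dbo:abstract ?abstract .
          FILTER (lang(?abstract) = "en")
        }} LIMIT 1
    """
    resp = requests.get(
        "https://dbpedia.org/sparql",
        params={"query": query, "format": "application/sparql-results+json"},
        timeout=10,
    )
    rows = resp.json()["results"]["bindings"]
    # A structured-search engine would merge this entity data with keyword
    # results from the document web to build one entity-centric answer.
    return rows[0]["abstract"]["value"] if rows else None
```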
