viewpoints | Federated Search Blog

Archive for the "viewpoints" Category

11
May

This post was re-published with permission from the Deep Web Technologies Blog. Please view the original post here.

The Beagle Research Group Blog posted “Apple iWatch: What’s the Killer App” on March 10, including this line: “An alert might also come from a pre-formed search that could include a constant federated search and data analysis to inform the wearer of a change in the environment that the wearer and only a few others might care about, such as a buy or sell signal for a complex derivative.” While this enticing suggestion is just a snippet in a full post, we thought we’d consider the possibilities this one-liner presents. Could federated search become the next killer app?

Well, no, not really. Federated search in and of itself isn’t an application; it’s more of a supporting technology. It searches in real time rather than relying on an index, so it can deliver current data on fluctuating information such as weather, stocks, and flights. And that is exactly why it’s killer: federated search finds new information of any kind, anywhere, singles out the most precise data to display, and notifies the user to take a look.

In other words, it’s a great technology for mobile apps to use. Federated search connects directly to the source of the information, whether medical, energy, academic journal, social media, or weather data, and finds information as soon as it’s available. Rather than storing information away, federated search links a person to the data circulating that minute, passing on the newest details as soon as they appear, which makes a huge difference with need-to-know information. In addition, alerts can be set up to notify the person, researcher, or iWatch wearer of critical data, such as a buy or sell signal, as The Beagle Research Group suggests.
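To make the “it has connections” point concrete, here is a minimal sketch in Python of how an app might fan a query out to several live sources and push an alert when a fresh hit satisfies a rule. The source URLs, response format, and `matches_alert` callback are all hypothetical; this is not Deep Web Technologies’ actual API, just an illustration of the fan-out-and-notify pattern.

```python
import concurrent.futures

import requests  # assumes each source exposes a simple HTTP/JSON search endpoint

# Hypothetical live sources; real connectors would be source-specific.
SOURCES = {
    "weather": "https://example.com/weather/search",
    "stocks":  "https://example.com/stocks/search",
    "flights": "https://example.com/flights/search",
}

def search_source(name, url, query):
    """Query one source in real time; nothing is indexed or stored locally."""
    resp = requests.get(url, params={"q": query}, timeout=5)
    resp.raise_for_status()
    return [dict(hit, source=name) for hit in resp.json().get("results", [])]

def federated_search(query):
    """Fan the query out to every source concurrently and merge the hits."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, name, url, query)
                   for name, url in SOURCES.items()]
        hits = [hit for f in concurrent.futures.as_completed(futures)
                for hit in f.result()]
    # Rank so that only the single best hit needs to reach a small screen.
    return sorted(hits, key=lambda h: h.get("score", 0), reverse=True)

def check_alert(query, matches_alert, notify):
    """Run the pre-formed search and notify only when a hit meets the rule."""
    for hit in federated_search(query):
        if matches_alert(hit):   # e.g. a price dropping below a threshold
            notify(hit)          # push the top matching hit to the watch face
            break
```

The point of the pattern is that nothing is stored or indexed in advance: every alert check queries the sources live, so the wearer only ever sees what is current.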

Of course, there’s also the issue of screen real estate to keep in mind – the iWatch wraps less than two inches of display around a wrist. That’s not much room for a hefty list of information, much less junky results. What matters is that the single most accurate piece of information, hand-picked (so to speak) just for you, pops up on the screen. Again, federated search can make that happen quite easily...it has connections.

There is a world of possibility when it comes to using federated search technology to build applications, whether mobile or desktop. Our on-demand lifestyles require federating, analyzing, and applying all sorts of data, from health to environment to social networking. Federated search is not just for librarians finding subscription content anymore. The next-generation federated search is for everyone who needs information on the fly. Don’t worry about missing information (you won’t). Don’t worry about whether information is current (it is). In fact, don’t worry at all. Relax, sit back, and get alert notifications to buy that stock, watch the weather driving home, or check out an obscure tweet mentioning one of your hobbies. Your world reports to you what you need to know. And that, really, is simply killer.

29
Apr

Editor’s Note: This post is re-published with permission from the Deep Web Technologies Blog. This is a guest article by Lisa Brownlee. The 2015 edition of her book, “Intellectual Property Due Diligence in Corporate Transactions: Investment, Risk Assessment and Management”, originally published in 2000, will dive into using the Deep Web and the Dark Web for intellectual property research, emphasizing their importance and usefulness when performing legal due diligence.

Lisa M. Brownlee is a private consultant and has become an authority on the Deep Web and the Dark Web, particularly as they apply to legal due diligence. She writes and blogs for Thomson Reuters. Lisa is an internationally recognized pioneer on the intersection between digital technologies and law.

In this blog post I will delve in some detail into the Deep Web. This expedition will focus exclusively on the part of the Deep Web that excludes the Dark Web. I cover both Deep Web and Dark Web legal due diligence in more detail in my blog and book, Intellectual Property Due Diligence in Corporate Transactions: Investment, Risk Assessment and Management. In particular, in this article I will discuss the Deep Web as a source of information for legal due diligence.

When Deep Web Technologies invited me to write this post, I initially intended to delve primarily into the ongoing confusion regarding Deep Web and Dark Web terminology. The misuse of the terms Deep Web and Dark Web, among other related terms, is problematic from a legal perspective if confusion about those terms spills over into licenses and other contracts and into laws and legal decisions. The terms are so hopelessly intermingled that I decided it is not useful to even attempt untangling them here. In this post, as mentioned, I will specifically cover the Deep Web, excluding the Dark Web. The definitions I use are provided in a blog post I wrote on the topic earlier this year, entitled The Deep Web and the Dark Web – Why Lawyers Need to Be Informed.

Deep Web: a treasure trove of data and other information

The Deep Web is populated with vast amounts of data and other information that are essential to investigate during a legal due diligence in order to find information about a company that is a target for possible licensing, merger, or acquisition. A Deep Web (as well as Dark Web) due diligence should be conducted to ensure that information relevant to the subject transaction and target company is not missed or misrepresented. Lawyers and financiers conducting the due diligence have essentially two options: conduct the due diligence themselves, visiting each potentially relevant database and running each search individually (potentially ad infinitum), or hire a specialized company such as Deep Web Technologies to design and set up such a search. Hiring an outside firm to conduct such a search saves time and money.

Deep Web data mining is a science that cannot be mastered by lawyers or financiers in a single transaction or even a handful of them. Using a specialized firm such as DWT has the added benefit that the search can be replicated on demand and/or run as ongoing, updated searches. Additionally, DWT can bring multilingual search capabilities to investigations, a feature that very few, if any, other data mining companies provide and that would most likely be deficient or entirely missing in a search conducted entirely in-house.

What information is sought in a legal due diligence?

A legal due diligence will investigate a wide and deep variety of topics, from real estate and human resources to basic corporate finance information, industry and company pricing policies, and environmental compliance. Due diligence nearly always also investigates the intellectual property rights of the target company, at a level of detail tailored to the specific transaction and based on the nature of the company’s goods and/or services. DWT’s Next Generation Federated Search is particularly well suited to conducting intellectual property investigations.

In sum, the goal of a legal due diligence is to identify and confirm basic information about the target company and determine whether there are any undisclosed infirmities in the target company’s assets and information as presented. In view of these goals, the investing party will require the target company to produce a checklist full of items about the various aspects of the business (and more) discussed above. An abbreviated correlation between the information typically requested in a due diligence and the information available in the Deep Web is provided in the chart attached below. In the absence of assistance from Deep Web Technologies with the due diligence, either someone within the investor company or its outside counsel will need to search each of the databases listed, in addition to others, in order to confirm that the information provided by the target company is correct and complete. While representations and warranties are typically given by the target company as to the accuracy and completeness of the information provided, it is also typical for the investing company to confirm all or part of that information, depending on the sensitivities of the transaction and the areas in which value, and possible risks, might be uncovered.

Deep Web Legal Due-Diligence Resource List (PDF)

16
Apr

Abe Lederman, founder and CEO of blog sponsor Deep Web Technologies, recently got a couple of exposures at MobileGroove, a site which provides analysis and commentary on mobile search, mobile advertising, and social media. The two MobileGroove articles cover Deep Web Technologies’ Biznar mobile federated search app.

More at the Deep Web Technologies Blog.

3
Mar

[ This article was originally published in the Deep Web Technologies Blog. ]

The highly regarded Charleston Advisor, known for its “Critical reviews of Web products for Information Professionals,” has given Deep Web Technologies 4 3/8 of 5 possible stars for its Explorit federated search product. The individual scores forming the composite were:

  • Content: 4 1/2 stars
  • User Interface/Searchability: 4 1/2 stars
  • Pricing: 4 1/2 stars
  • Contract Options: 4 stars

The scores were assigned by two reviewers who played a key role in bringing Explorit to Stanford University:

  • Grace Baysinger, Head Librarian and Bibliographer at the Swain Chemistry and Chemical Engineering Library at Stanford University
  • Tom Cramer, Chief Technology Strategist at Stanford University Libraries and Academic Information Resources

Read the rest of this entry »

18
Feb

The Harvard Library Innovation Laboratory at the Harvard Law School posted a link to a 23-minute podcast interview with Sebastian Hammer. Hammer is the president of Index Data, a company in the information retrieval space, including federated search.

Update 4/3/12: A transcript of the interview is here.

Hammer was interviewed about the challenges of federated search, which he addressed in a very balanced way. The gist of Hammer’s message is that, yes, there are challenges to the technology, but they’re not insurmountable. And, without using the term “discovery service,” Hammer did a fine job of explaining that large indexes are an important component of a search solution but not the entire solution, especially in organizations that need access to highly specialized sources.

I was delighted to hear Hammer mention the idea of “super nodes” to allow federated search to scale to thousands of sources. Blog sponsor Deep Web Technologies has used this idea, which they call hierarchical federated search, for several years. Several of their applications search other applications which can, in turn, search other applications. In 2009, Deep Web Technologies founder and president Abe Lederman delivered a talk and presented a paper at SLA, Science Research: Journey to Ten Thousand Sources, detailing his company’s proven “divide-and-conquer” approach to federating federations of sources.
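As a rough illustration of the “super node” idea, here is a short Python sketch in which a federation node treats other federation nodes exactly like leaf sources, so each tier only fans out to a manageable number of children. The class names and structure are my own invention for illustration, not Index Data’s or Deep Web Technologies’ actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

class Source:
    """A leaf connector that searches one database directly."""
    def __init__(self, name, search_fn):
        self.name = name
        self._search = search_fn  # callable(query) -> list of hits

    def search(self, query):
        return self._search(query)

class FederationNode:
    """A "super node": federates children that may themselves be federators."""
    def __init__(self, children):
        self.children = children  # any mix of Source and FederationNode objects

    def search(self, query):
        # Fan out to children in parallel; each child may fan out again.
        with ThreadPoolExecutor() as pool:
            result_lists = pool.map(lambda child: child.search(query), self.children)
        return [hit for results in result_lists for hit in results]

# Two tiers of roughly 100 children each already reach ~10,000 leaf sources.
```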

I was also happy to hear Hammer speak to the importance of hybrid solutions. Federation is appropriate for gaining access to some content, while maintaining a local index works for other content. Neither alone is a complete solution. Deep Web Technologies figured this out some years ago. A good example of hybrid search technology is the E-print Network, a product of the U.S. Department of Energy’s Office of Scientific and Technical Information (OSTI). Deep Web Technologies built the search technology, which combines information about millions of documents crawled from over 30,000 sites with federated content. I have been involved with the crawl piece of the E-print Network for a number of years and can testify to the power of the right hybrid solution. In 2008 I wrote a three-part series of articles at OSTI’s blog explaining the technology behind the E-print Network. Part One is here.
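In the same spirit, a hybrid setup can be sketched as merging hits from a locally crawled index with hits from live federated sources. Both callables below are placeholders standing in for the two halves of such a system, not the E-print Network’s real components:

```python
def hybrid_search(query, local_index_search, federated_search):
    """Merge hits from a crawled local index with hits from live federated sources.

    local_index_search(query) stands in for a search over a crawler-built index
    (fast and broad, but only as fresh as the last crawl); federated_search(query)
    stands in for live queries against federated sources (slower, but current).
    """
    merged, seen = [], set()
    for hit in list(local_index_search(query)) + list(federated_search(query)):
        key = hit.get("url") or hit.get("id")   # deduplicate across the two halves
        if key not in seen:
            seen.add(key)
            merged.append(hit)
    # A single ranked list hides the index/federation split from the user.
    return sorted(merged, key=lambda h: h.get("score", 0), reverse=True)
```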

In conclusion, I highly recommend the podcast for a good reminder that federated search isn’t dead and that it’s an important part of search.

1
Jun

On search neutrality

Author: Sol

Abe Lederman, founder and president of Deep Web Technologies and sponsor of this blog, wrote an article at the Deep Web Technologies blog: Preparing for ALA Panel and Federated Search Neutrality. Abe discovered an article at beerbrarian about the problem of net neutrality in federated search.

For those of you not familiar with net neutrality, Wikipedia explains it:

Network neutrality (also net neutrality, Internet neutrality) is a principle which advocates no restrictions by Internet service providers or governments on consumers’ access to networks that participate in the internet. Specifically, network neutrality would prevent restrictions on content, sites, platforms, the kinds of equipment that may be attached, or the modes of communication.
. . .
Neutrality proponents claim that telecom companies seek to impose a tiered service model in order to control the pipeline and thereby remove competition, create artificial scarcity, and oblige subscribers to buy their otherwise uncompetitive services. Many believe net neutrality to be primarily important as a preservation of current freedoms. Vinton Cerf, considered a “father of the Internet” and co-inventor of the Internet Protocol, Tim Berners-Lee, creator of the Web, and many others have spoken out in favor of network neutrality.

In the net neutrality battle, consumers worry about telecom companies unfairly biasing the delivery of some content (that which they have business interest in biasing) over the content of others. Add search to the equation and what you get are concerns over whether your search results are sorted by relevance or by the business needs of the search engine company.

Read the rest of this entry »

20
May

Amusing anecdote

Author: Sol

Miles Kehoe at New Idea Engineering’s Enterprise Search Blog tells an entertaining anecdote.

The folks from Booz & Company, a spinoff of Booz Allen Hamilton, gave a presentation on their experience comparing two well-respected mainstream search products. They reported that, at one point, one of the presenters was looking for a woman she knew named Sarah, but she was having trouble remembering Sarah’s last name. The presenter told of searching one of the engines under evaluation and finding that most of the top 60 people returned from the search were… men. None were named ‘Sue’, and apparently none were named Sarah either. The other engine returned records for a number of women named Sarah and, as it turns out, for a few men as well.

After some frustration, they finally got to the root of the problem. It turns out that all of the Booz & Company employees have their resumes indexed as part of their profiles. Would you like to guess the name of the person who authored the original resume template? Yep – Sarah.

This is a great example of “garbage in, garbage out!” Metadata is only as good as the humans who curate it (or the machines that try to guess at it). Thanks for the Friday chuckle, Miles!

5
May

I’ve always thought of personalization as a good thing. If Google knows something about me then it can provide results that I’ll find more relevant, right?

Watch this TED talk by Eli Pariser and, like me, you might start having second thoughts.

Pariser is a former executive director of MoveOn and is now a senior fellow at the Roosevelt Institute. His book The Filter Bubble is set for release May 12, 2011. In it, he asks how modern search tools, the filter through which many of us see the wider world, are getting better and better at screening that world from us by returning only the search results they “think” we want to see.

Here’s the very thought-provoking first paragraph of the talk:

Mark Zuckerberg, a journalist was asking him a question about the news feed. And the journalist was asking him, “Why is this so important?” And Zuckerberg said, “A squirrel dying in your front yard may be more relevant to your interests right now than people dying in Africa.” And I want to talk about what a Web based on that idea of relevance might look like.

Read the rest of this entry »

20
Apr

Here’s a chunk of an interesting article from TechEYE.net: Kids go cold turkey when you take their technology away — Like quitting heroin:

Boffins have found that taking a kid’s computer technology away for a day gives them similar symptoms as going cold turkey.

The study was carried out by the University of Maryland. It found that 79 percent of students subjected to a complete media blackout for just one day reported adverse reactions ranging from distress to confusion and isolation.

One of the things the kids spoke about was having overwhelming cravings while others reported symptoms such as ‘itching’.

The study focused on students aged between 17 and 23 in ten countries. Researchers banned them from using phones, social networking sites, the internet and TV for 24 hours.

The kids could use landline phones or read books and were asked to keep a diary.

One in five reported feelings of withdrawal like an addiction while 11 percent said they were confused. Over 19 percent said they were distressed and 11 percent felt isolated. Some students even reported stress from simply not being able to touch their phone.

I wonder what would happen if all the search engines were turned off for a day.

Hat tip to Stephen Arnold.

20
Feb

I recently discovered an article, 5 Reasons Not to Use Google First, that sings my song. The article addresses this question:

Google is fast, clean and returns more results than any other search engine, but does it really find the information students need for quality academic research? The answer is often ‘no’. “While simply typing words into Google will work for many tasks, academic research demands more.” (Searching for and finding new information – tools, strategies and techniques)

The next paragraph gave me a chuckle.

As far back as 2004, James Morris, Dean of the School of Computer Science at Carnegie Mellon University, coined the term “infobesity,” to describe “the outcome of Google-izing research: a junk-information diet, consisting of overwhelming amounts of low-quality material that is hard to digest and leads to research papers of equally low quality.” (Is Google enough? Comparison of an internet search engine with academic library resources.)

The article continues with its list of five good reasons to not use Google first.

Note that the recommendation isn’t to skip Google altogether. There’s a balance needed to get the best value when performing research. The findings in the “Is Google enough?” article summarize this point really well:

Google is superior for coverage and accessibility. Library systems are superior for quality of results. Precision is similar for both systems. Good coverage requires use of both, as both have many unique items. Improving the skills of the searcher is likely to give better results from the library systems, but not from Google.