31
Aug

In May, search consultant Avi Rappoport delivered a presentation, “Federated vs. Aggregated Search Architectures,” at the Enterprise Search Summit.

Avi Rappoport is an enterprise search consultant, helping companies improve search engine functionality for websites and intranets. She has a degree from UC Berkeley’s (then) School of Library and Information Science and spent 10 years in software development before becoming a search consultant. She is the editor of SearchTools.com and a frequent speaker and author, providing a strong focus on search usability in the broadest sense and sharing her conviction that search engines can always be better.

Avi created a web page with a summary of and links to a couple of versions of her presentation.

I greatly appreciate Avi’s consideration of the pluses and minuses of federation and aggregation (i.e., discovery services) in a world that is often polarized about one approach being better in all cases.

My research for this presentation indicated that each is useful in specific circumstances (I know, no surprise there). Many data sources are obviously best accessed by one or the other, but it’s the corner cases that are tricky. Aspects to consider include:

  • size of the content in the source
  • how often your users need that content
  • content change rate
  • importance of real-time access control permissions changes
  • content licensing rules
  • available tools for indexing / querying
  • difficulty of extracting and indexing
  • quality of the internal search engine
  • difficulty of sending queries and receiving results

The final slide has some sage advice:

Be open-minded, analyze the benefits of each approach for each data source.

One size does NOT fit all.

If you enjoyed this post, make sure you subscribe to the RSS feed!

26
Aug

[ Editor's Note: This is a very touching article by Nena Moss first published in the OSTI Blog. My dad suffered with Alzheimer's for a number of years before he died so I can relate to Nena's experience. Disclaimer: I have been paid to support OSTI in a number of capacities for the past eight years. ]

My mother died in March 2010 after a 15-year battle with Alzheimer’s, so I pay particular attention to news about this dreadful disease. A recent New York Times article caught my eye: “Sharing of Data Leads to Progress on Alzheimer’s.”

How did sharing data lead to progress on Alzheimer’s? A collaborative effort, the Alzheimer’s Disease Neuroimaging Initiative, was formed to find the biological markers that show the progression of Alzheimer’s disease in the human brain. The key was to share all the data, making every finding public immediately – “available to anyone with a computer anywhere in the world.”



21
Aug

On federated fetching

Author: Sol

“Federated fetching” is a new term to me. I discovered it at Srinivas Reddy’s Weblog, referencing the O’Reilly book, Beautiful Data:

When we deal with web scale data ‘discoverability’ of information is key. While ‘web search’ provides a lot of value today what we really need is to enable ‘data find data’. I like the differentiation in the book between ‘federated search’ and ‘federated fetch’. The latter needs adaptive systems that can discover new data correlations based on user context and new data collected.

This reference got me curious. Was the Web buzzing with discussion of federated search vs. federated fetch? Not exactly, according to Google: there are 740 references to the phrase, but only 24 of them are considered unique enough for Google to display. Interestingly enough, the first reference is to Jeff Jonas’s “When Federated Search Bites” article, which I wrote about a month ago.

Once a directory reveals a pointer, you can go fetch it. Federated fetch does scale.

Google Books provides the term in the context of the Beautiful Data book:

So, federated fetch is the “end game,” if I understand the concept correctly. It’s what you get when, for example, a link resolver gets you to the full text copy of a book you can actually read.
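To make that reading concrete, here is a minimal Python sketch of the idea as I understand it: a directory lookup returns a pointer, and the fetch then goes straight to the one place the pointer names. Everything in it (the `DIRECTORY` and `STORES` dictionaries, the keys, the function names) is a hypothetical stand-in for illustration, not something taken from Beautiful Data or from Jeff Jonas’s article.

```python
# Hypothetical data: a directory of pointers and the stores they point into.
DIRECTORY = {"isbn:123": ("library_a", "shelf-7")}
STORES = {
    "library_a": {"shelf-7": "Full text of the book"},
    "library_b": {"shelf-2": "Another book"},
}


def lookup(key):
    """Ask the directory where an item lives; returns a pointer or None."""
    return DIRECTORY.get(key)


def federated_fetch(key):
    """Resolve the pointer first, then fetch from only the store it names."""
    pointer = lookup(key)
    if pointer is None:
        return None
    store, location = pointer
    return STORES[store][location]


book = federated_fetch("isbn:123")
missing = federated_fetch("isbn:999")
```

The point of the sketch is that only one store is ever contacted per request, which is presumably why the fetch step scales where a broadcast would not.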

There you have it, a new phrase I learned today.


9
Aug


Early this year O’Reilly published Search Patterns, by Peter Morville and Jeffery Callender. This is Morville’s fourth information/search-related book. Search Patterns addresses the intersection of user interface and search.

Search Patterns is an absolutely outstanding book. I don’t get excited about search-related books very often but this one totally captivated me. O’Reilly sent me a review copy some months ago. It sat in a pile until I started seeing reviews and references to the book on the Web. The press prompted me to open the book.

The first thing I noticed in flipping through the book was the many high-quality color screen shots and illustrations. Plus, Search Patterns is printed on glossy paper to enhance the visual elements of the book.

At 173 pages (plus index) and a nice balance of text and images, Search Patterns is, at the surface, a quick read. But, there are numerous gems throughout the book so allow yourself plenty of time to read (and reread) sections that draw you.



30
Jul


The U.S. Census Bureau has a federated search tool in development, DataFerrett.

The (Beta)DataFerrett helps you locate and retrieve the data you need across the Internet to your desktop or system, regardless of where the data resides.

DataFerrett is a unique data mining and extraction tool. (Beta)DataFerrett allows you to select a databasket full of variables and then recode those variables as you need. You can then develop and customize tables. Selecting your results in your table you can create a chart or graph for a visual presentation into an html page. Save your data in the databasket and save your table for continued reuse.

I have no idea how useful the tool is but their mascot sure is cute!


26
Jul

Huh?

Author: Sol

Jeff Jonas recently published an article, “When Federated Search Bites.” If this article is meant to be link bait, I’m not biting. You can get a link from Google.

I certainly don’t know everything about federated search but I know enough to recognize what’s not federated search, at least not what most of us think to be federated search.

The article, really a rant, starts off reasonably enough:

Federated search: conducting a search against “n” source systems via a broadcast mechanism without the benefit or guidance of an index.

I am speaking specifically about environments where the systems in the federation are heterogeneous, are physically dispersed, were not engineered for federation a priori, and are not managed by a common command and control system.

Here’s another reasonable statement:

Most organizations have some obligation to make sense of what they know. For example, the airline should know if the person added to the watch list is already an employee or already has a flight reservation. Ideally, the moment such facts become knowable, someone or some system should be notified. Think of this as “the data speaks to itself.” I call this data finds data.

Yes, having new data trigger analysis is a good idea. But, IT’S NOT FEDERATED SEARCH.

So, the entire basis of the rant is this: federated search is not the advanced analysis system I want, therefore it sucks. That’s like saying that my oven doesn’t analyze the food I put in it and automatically cook it perfectly, therefore my oven “bites.”

There may be a discussion about the challenges of analyzing federated data vs. indexed data but that has nothing to do with what federated search does.
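For reference, here is a minimal Python sketch of what federated search does under the broadcast definition quoted above: send the same query to every source, let each source run its own search, and merge the answers, with no central index involved. The `SOURCES` dictionary and the substring matching are hypothetical stand-ins; real sources would be remote search systems with their own query interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "sources"; real ones would be remote search systems.
SOURCES = {
    "catalog": ["federated search basics", "index design"],
    "reports": ["federated search at scale", "annual report"],
    "wiki": ["cooking tips"],
}


def search_source(name, query):
    """Let one source answer the query using its own (here, trivial) matching."""
    return [(name, doc) for doc in SOURCES[name] if query.lower() in doc.lower()]


def federated_search(query):
    """Broadcast the query to every source in parallel and merge the results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, name, query) for name in SOURCES]
        results = []
        for future in futures:
            results.extend(future.result())
    return results


hits = federated_search("federated search")
```

Nothing here analyzes new data as it arrives or notifies anyone; the sketch only answers the query it is given, which is the distinction the post is drawing.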

What do you think? Does the article make sense to you?


23
Jul

“Federated search as a transformational technology enabling knowledge discovery: the role of WorldWideScience.org” is by far the best historical paper I’ve read about DOE’s Office of Scientific and Technical Information (OSTI), and I consult for the agency.

OSTI has created a number of search portals (WorldWideScience.org, Science.gov, DOE ScienceAccelerator, DOE Energy Citations Database, and DOE Information Bridge, to name a few), but few know about the history of the agency that created them.

OSTI grew out of the post-World War II initiative to make the scientific research of the Manhattan Project as freely available to the public as possible. On November 17, 1944, President Roosevelt wrote Vannevar Bush, then the Director of the Office of Scientific Research and Development, to request his counsel on how to capitalize on the experience of the United States’ R&D war efforts — most of which was done in utter secrecy — in the days of peace to come.

OSTI Director Dr. Walter Warnick tells the story of the development of OSTI, its role in advancing science, and how federated search serves that role in ways that Google can’t.

The paper, at 23 pages, covers the subject with a good deal of depth.



19
Jul

Google just announced that they would buy ITA Software, regulators permitting. Here’s another Google purchase that would take Google deeper into smarter searching.

Semantic processing is taking a big step forward.

From Mashable:

Google Acquires Metaweb to Improve Search

Google has acquired semantic web and real world database company Metaweb, a move the company says will help them “improve search and make the web richer and more meaningful for everyone.”

We wrote about Metaweb back in 2008 when they received a significant chunk of funding to the tune of $42 million, on top of their first round of $15 million back in 2006. Since then the company has built its Freebase open database into a collection of over 12 million items from entertainment (movies, books, TV shows) to locations, celebrities, companies and other “real world” objects. Google says the plan is to preserve and further develop the database and hope to enlist other companies to make use of and contribute to the data.

In addition to fleshing out Freebase, Google also hopes to leverage Metaweb to enhance its efforts with features like rich snippets and search answers, both of which aim to give back “smarter” and more immediate results to specific queries. Right now, simpler requests like “Barack Obama birthday” and “events in San Jose” can spawn relevant answers right at the top of the search results page, but Google hopes to take this initiative further by feeding in more facts about the real world from Metaweb’s data repository.

Resource Shelf has some very insightful thoughts on the acquisition.

Here’s a good video on what Metaweb is about:

Here’s Google’s announcement of the acquisition.


15
Jul


Hope Leman is one of my favorite people. I know of very few individuals who are as passionate about anything as is Hope. Hope won second place in our second Federated Search Blog contest and I commented on her passionate review of WorldWideScience.org in 2008.

Hope wrote again about WorldWideScience.org. Her article is at her blog, Significant Science. Hope is a research information technologist for a health network in Oregon. She is also Web administrator of the free online grants and scholarship listing service, ScanGrants, and of the free online search platform, ResearchRaven. From several conversations with Hope I know that ScanGrants is a labor of love and a good demonstration of Hope’s passion about helping researchers.

In Multilingual WorldWideScience: Accelerating Scientific Research, Empowering Researchers Hope reminds us of the key role that search plays in research especially in the world of free science and foreign language science.

Hope’s message is personal, and I love that:

As someone who grew up in a family that housed students who had left home and family in China, Japan, Iran, Korea and other countries to study engineering, chemistry, physics, biochemistry and so on at Oregon State University here in my hometown of Corvallis, Oregon I know what brilliant people there are in many countries who have so much to offer and what a boon it will be that the work of researchers worldwide will become useable to each of them and benefit the rest of us.

This update on Hope’s friend who suffered from ALS is even more touching:

I have recently lost a friend to amyotrophic lateral sclerosis and I would often sadly reflect as I bicycled home from her house about the glacial pace of progress on research on that disease and others like it. That is why I find Dr. Warnick’s enthusiasm and practical accomplishments so very admirable and the best possible case for paying one’s taxes with a minimal amount of grumbling. He is putting federal funds to exemplary use.

Dr. Warnick, Director of OSTI, conceived WorldWideScience and his agency hosts and manages the search portal.

Databases and search engines aren’t about getting one’s job done. At the noblest level, they’re about solving important problems, and saving lives when we can.

[ Disclaimer: OSTI is one of my consulting clients. Deep Web Technologies, who built the single and multiple language search engines behind WorldWideScience.org and who sponsors this blog, is another of my clients. ]


12
Jul

I don’t write about metasearch engines very often but I think that Google’s proposed purchase of ITA Software is worth commenting on. Here’s some info from Google:

On July 1, 2010, Google announced an agreement to acquire ITA Software, a Cambridge, Massachusetts flight information software company, for $700 million, subject to adjustments.

Google’s acquisition of ITA Software will create a new, easier way for users to find better flight information online, which should encourage more users to make their flight purchases online.

The acquisition will benefit passengers, airlines and online travel agencies by making it easier for users to comparison shop for flights and airfares and by driving more potential customers to airlines’ and online travel agencies’ websites. Google won’t be setting airfare prices and has no plans to sell airline tickets to consumers.

Because Google doesn’t currently compete against ITA Software, the deal will not change existing market shares. We are very excited about ITA Software’s QPX business, and we’re looking forward to working with current and future customers. Google will honor all existing agreements, and we’re also enthusiastic about adding new partners.

