federated search | Federated Search Blog
18
Feb

The Harvard Library Innovation Laboratory at the Harvard Law School posted a link to a 23-minute podcast interview with Sebastian Hammer. Hammer is the president of Index Data, a company in the information retrieval space, including federated search.

Update 4/3/12: A transcript of the interview is here.

Hammer was interviewed about the challenges of federated search, which he addressed in a very balanced way. The gist of Hammer’s message is that, yes, there are challenges to the technology but they’re not insurmountable. And, without using the term “discovery service,” Hammer did a fine job of explaining that large indexes are an important component of a search solution but they’re not the entire solution, especially in organizations that have highly specialized sources they need access to.

I was delighted to hear Hammer mention the idea of “super nodes” to allow federated search to scale to thousands of sources. Blog sponsor Deep Web Technologies has used this idea, which they call hierarchical federated search, for several years. Several of their applications search other applications which can, in turn, search other applications. In 2009, Deep Web Technologies founder and president Abe Lederman delivered a talk and presented a paper at SLA, Science Research: Journey to Ten Thousand Sources, detailing his company’s proven “divide-and-conquer” approach to federating federations of sources.
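
For readers who think in code, here is a minimal sketch of the "applications searching applications" idea. It is not Deep Web Technologies' implementation or anything Hammer described in detail; the names `Source`, `SuperNode`, and `make_source`, the toy documents, and the scores are all made up for illustration. The only point is that a super node exposes the same `search` call as a leaf source, so federators can be nested to any depth.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Result = Tuple[str, float]  # (title, relevance score); a hypothetical result shape


@dataclass
class Source:
    """A leaf source: anything that can answer a query directly."""
    name: str
    search_fn: Callable[[str], List[Result]]

    def search(self, query: str) -> List[Result]:
        return self.search_fn(query)


@dataclass
class SuperNode:
    """A federator whose children are leaf sources or other federators."""
    name: str
    children: list = field(default_factory=list)

    def search(self, query: str, limit: int = 10) -> List[Result]:
        merged: List[Result] = []
        for child in self.children:
            # A child may itself be a SuperNode, which makes the search recursive.
            merged.extend(child.search(query))
        # Rank the combined results by score, best first, and keep the top few.
        merged.sort(key=lambda r: r[1], reverse=True)
        return merged[:limit]


def make_source(name: str, docs: List[Result]) -> Source:
    """Build a toy source that matches the query against document titles."""
    def search_fn(query: str) -> List[Result]:
        q = query.lower()
        return [(title, score) for title, score in docs if q in title.lower()]
    return Source(name, search_fn)


# Toy data, invented for this example.
physics = make_source("physics", [("Solar cell physics", 0.9), ("Quantum dots", 0.4)])
chemistry = make_source("chemistry", [("Solar fuel chemistry", 0.7)])
energy = make_source("energy", [("Solar policy review", 0.5)])

science_node = SuperNode("science", [physics, chemistry])
top_node = SuperNode("top", [science_node, energy])

results = top_node.search("solar")
```

A real system would query children in parallel, handle timeouts, and normalize scores across sources; this sketch skips all of that to show only the nesting.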

I was also happy to hear Hammer speak to the importance of hybrid solutions. Federation is appropriate for gaining access to some content and maintaining a local index works for other content. Neither alone is a complete solution. Deep Web Technologies figured this out some years ago. A good example of hybrid search technology is the E-print Network, a product of the U.S. Department of Energy’s Office of Scientific and Technical Information (OSTI). Deep Web Technologies built the search technology, which combines information about millions of documents crawled from over 30,000 sites with federated content. I have been involved with the crawl piece of the E-print Network for a number of years and can testify to the power of the right hybrid solution. In 2008 I wrote a three-part series of articles at OSTI’s blog explaining the technology behind the E-print Network. Part One is here.

In conclusion, I highly recommend the podcast for a good reminder that federated search isn’t dead and that it’s an important part of search.

1
Jul

[ Editor’s note: Blog sponsor Deep Web Technologies has announced important enhancements to its federated search technology that allow its Explorit Research Accelerator product to go deeper into the deep Web than ever before. ]

Researchers can now search text, audio, video and images in multiple languages

SANTA FE, N.M., June 21, 2011 /PRNewswire/ -- Deep Web Technologies, the leader in federated search of the Deep Web, today announced full integration of multilingual and multimedia search into the company’s market-leading Explorit Research Accelerator. The patent-pending multilingual search capability is the first such feature ever offered for Deep Web search.

Multilingual federated search, unveiled June 11, 2011 in Helsinki at the International Council for Scientific and Technical Information’s Summer Conference and originally only available as a beta release to users of the WorldWideScience.org gateway to global science, is now available to all Deep Web Technologies customers who require seamless access to foreign language documents. Explorit’s multilingual search capability translates a user’s search query into the native languages of the collections being searched, aggregates and ranks these results according to relevance, and translates result titles and snippets back to the user’s original language. The multilingual translation functionality, powered by Microsoft, makes it simple to search collections in multiple languages from a single search box in the user’s native language.
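
[ Editor’s note: The translate, search, rank, translate-back flow described above can be sketched in a few lines. This is only an illustration of the steps as the announcement states them, not Explorit’s code. The `translate` function is a hypothetical stand-in backed by a toy phrase table rather than a real translation service, and the collections, titles, and scores are invented. ]

```python
from typing import Callable, Dict, List, Tuple

# Toy phrase table standing in for a real translation service (an assumption).
# Keys are (source language, target language, text).
PHRASES: Dict[Tuple[str, str, str], str] = {
    ("en", "de", "energy"): "Energie",
    ("de", "en", "Solarenergie heute"): "Solar energy today",
}


def translate(text: str, src: str, dst: str) -> str:
    """Hypothetical translator: look the text up, or return it unchanged."""
    if src == dst:
        return text
    return PHRASES.get((src, dst, text), text)


Hit = Tuple[str, float]  # (title, relevance score)
Collection = Tuple[str, Callable[[str], List[Hit]]]  # (language, search function)


def multilingual_search(query: str, user_lang: str,
                        collections: List[Collection]) -> List[Hit]:
    raw: List[Tuple[str, float, str]] = []
    for lang, search_fn in collections:
        # Step 1: translate the query into the collection's native language.
        native_query = translate(query, user_lang, lang)
        for title, score in search_fn(native_query):
            raw.append((title, score, lang))
    # Step 2: aggregate and rank all results by relevance, best first.
    raw.sort(key=lambda hit: hit[1], reverse=True)
    # Step 3: translate result titles back to the user's language.
    return [(translate(title, lang, user_lang), score) for title, score, lang in raw]


def german_collection(query: str) -> List[Hit]:
    docs = [("Solarenergie heute", 0.8)]
    return [d for d in docs if query.lower() in d[0].lower()]


def english_collection(query: str) -> List[Hit]:
    docs = [("Energy storage", 0.6)]
    return [d for d in docs if query.lower() in d[0].lower()]


hits = multilingual_search(
    "energy", "en", [("en", english_collection), ("de", german_collection)]
)
```

Comparing relevance scores across collections in different languages is a hard problem in its own right; the sketch simply assumes the scores are comparable.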

Multimedia federated search, first introduced in the WorldWideScience.org and ScienceAccelerator.gov portals, allows for seamless integration of audio, video, and image content sources into Explorit. WorldWideScience.org searches seven multimedia sources: CDC Podcasts, CERN Multimedia, Medline Plus, NASA, NSF, NBII LIFE, and ScienceCinema. ScienceCinema is an exciting example of the ability to search speech-indexed multimedia content. The DOE Office of Scientific and Technical Information (OSTI) developed ScienceCinema in partnership with Microsoft. When multimedia sources are included in an Explorit search, images and links to multimedia content can be presented alongside text results or in a separate results tab.


1
Jun

On search neutrality

Author: Sol

Abe Lederman, founder and president of Deep Web Technologies and sponsor of this blog, wrote an article at the Deep Web Technologies blog: Preparing for ALA Panel and Federated Search Neutrality. Abe discovered this article at beerbrarian about the problem of net neutrality in federated search.

For those of you not familiar with net neutrality, Wikipedia explains it:

Network neutrality (also net neutrality, Internet neutrality) is a principle which advocates no restrictions by Internet service providers or governments on consumers’ access to networks that participate in the internet. Specifically, network neutrality would prevent restrictions on content, sites, platforms, the kinds of equipment that may be attached, or the modes of communication.
. . .
Neutrality proponents claim that telecom companies seek to impose a tiered service model in order to control the pipeline and thereby remove competition, create artificial scarcity, and oblige subscribers to buy their otherwise uncompetitive services. Many believe net neutrality to be primarily important as a preservation of current freedoms. Vinton Cerf, considered a “father of the Internet” and co-inventor of the Internet Protocol, Tim Berners-Lee, creator of the Web, and many others have spoken out in favor of network neutrality.

In the net neutrality battle, consumers worry about telecom companies unfairly biasing the delivery of some content (that which they have business interest in biasing) over the content of others. Add search to the equation and what you get are concerns over whether your search results are sorted by relevance or by the business needs of the search engine company.


20
May

Amusing anecdote

Author: Sol

Miles Kehoe at New Idea Engineering’s Enterprise Search Blog tells an entertaining anecdote.

The folks from Booz & Company, a spinoff from Booz Allen Hamilton, did a presentation on their experience comparing two well respected mainstream search products. They report that, at one point, one of the presenters was looking for a woman she knew named Sarah – but she was having trouble remembering Sarah’s last name. The presenter told of searching one of the engines under evaluation and finding that most of the top 60 people returned from the search were… men. None were named ‘Sue’; and apparently none were named Sarah either. The other engine returned records for a number of women named Sarah; and, as it turns out, for a few men as well.

After some frustration, they finally got to the root of the problem. It turns out that all of the Booz & Company employees have their resumes indexed as part of their profiles. Would you like to guess the name of the person who authored the original resume template? Yep – Sarah.

This is a great example of “garbage in, garbage out!” Metadata is only as good as the humans who curate it (or the machines that try to guess at it). Thanks for the Friday chuckle, Miles!

5
May

I’ve always thought of personalization as a good thing. If Google knows something about me then it can provide results that I’ll find more relevant, right?

Watch this TED talk by Eli Pariser and, like me, you might start having second thoughts.

Pariser is former executive director of MoveOn and is now a senior fellow at the Roosevelt Institute. His book The Filter Bubble is set for release May 12, 2011. In it, he asks how modern search tools, the filter by which many of us see the wider world, are getting better and better at screening the wider world from us by returning only the search results they “think” we want to see.

Here’s the very thought-provoking first paragraph of the talk:

Mark Zuckerberg, a journalist was asking him a question about the news feed. And the journalist was asking him, “Why is this so important?” And Zuckerberg said, “A squirrel dying in your front yard may be more relevant to your interests right now than people dying in Africa.” And I want to talk about what a Web based on that idea of relevance might look like.


20
Apr

Here’s a chunk of an interesting article from TechEYE.net: Kids go cold turkey when you take their technology away — Like quitting heroin:

Boffins have found that taking a kid’s computer technology away for a day gives them similar symptoms as going cold turkey.

The study was carried out by the University of Maryland. It found that 79 percent of students subjected to a complete media blackout for just one day reported adverse reactions ranging from distress to confusion and isolation.

One of the things the kids spoke about was having overwhelming cravings while others reported symptoms such as ‘itching’.

The study focused on students aged between 17 and 23 in ten countries. Researchers banned them from using phones, social networking sites, the internet and TV for 24 hours.
The kids could use landline phones or read books and were asked to keep a diary.
One in five reported feelings of withdrawal like an addiction while 11 percent said they were confused. Over 19 percent said they were distressed and 11 percent felt isolated. Some students even reported stress from simply not being able to touch their phone.

I wonder what would happen if all the search engines were turned off for a day.

Hat tip to Stephen Arnold.

22
Mar

Articles on Discovery lists a number of categorized resources about discovery services. Categories include:

  • Basics
  • Historical
  • Presentations by vendors
  • Debates
  • Library experiences, evaluations & case studies
  • Misc comments on other issues
  • Wikis/rough notes
  • Hacking
  • Webcasts (not free)

This great resource page includes more than fifty links to articles on the subject.

I highly recommend that libraries interested in discovery services give extra attention to the articles in the debates and library experiences sections so that they can learn about the technology with their eyes wide open.

You can access my writings about discovery services here.

28
Feb

I recently got this question:

I’m new to federated search. You’ve written lots of articles (too many) about the subject. Can you give me a half dozen articles to read that would get me oriented? Oh, and if you would tell me what order to read them in that would be great too!

I took this request to heart and came up with my ordered list of basic articles. The list has 15 articles. Yes, the request was for six. My only defense is that many of the articles are a quick read and, since my list is in order, you can read just the first six and you’ll know a lot more than when you started.

I organize my list into three sections: federated search/deep web, discovery services, and federated search in the enterprise. I think everyone new to federated search needs to have an awareness of all three areas.

Here’s my list:

Federated Search and the Deep Web

Discovery Services

Federated Search in the Enterprise

20
Feb

I recently discovered an article, 5 Reasons Not to Use Google First, that sings my song. The article addresses this question:

Google is fast, clean and returns more results than any other search engine, but does it really find the information students need for quality academic research? The answer is often ‘no’. “While simply typing words into Google will work for many tasks, academic research demands more.” (Searching for and finding new information – tools, strategies and techniques)

The next paragraph gave me a chuckle.

As far back as 2004, James Morris, Dean of the School of Computer Science at Carnegie Mellon University, coined the term “infobesity,” to describe “the outcome of Google-izing research: a junk-information diet, consisting of overwhelming amounts of low-quality material that is hard to digest and leads to research papers of equally low quality.” (Is Google enough? Comparison of an internet search engine with academic library resources.)

The article continues with its list of five good reasons to not use Google first.

Note that the recommendation isn’t to skip Google altogether. There’s a balance that’s needed to get the best value when performing research. The findings in the “Is Google enough?” article summarize this point really well:

Google is superior for coverage and accessibility. Library systems are superior for quality of results. Precision is similar for both systems. Good coverage requires use of both, as both have many unique items. Improving the skills of the searcher is likely to give better results from the library systems, but not from Google.

15
Feb

I learned, from Roy Tennant, about work that Microsoft and others are doing with natural user interfaces (NUIs). What’s an NUI? Here’s a piece of a Microsoft Blog article that gives you the gist:

One product that has gotten a lot of attention recently is our Kinect for Xbox 360, which incorporates facial recognition along with gesture-based and voice control. The device knows who you are, understands your voice or the wave of your hand and is changing the face of gaming as we know it. …

By combining sensory inputs with the knowledge of what you’re trying to do (contextual awareness), where you are and what is around you (environmental awareness), 3D simulation and anticipatory learning, we can foresee a future where technology becomes almost invisible. Imagine a world where interacting with technology becomes as easy as having a conversation with a friend.

I can’t quite fathom what search would look like in a world of NUI but I’m looking forward to it.