Archive for the "Uncategorized" Category

31 Aug

In May, search consultant Avi Rappoport delivered a presentation at the Enterprise Search Summit: Federated vs. Aggregated Search Architectures.

Avi Rappoport is an enterprise search consultant, helping companies improve search engine functionality for websites and intranets. She has a degree from UC Berkeley’s (then) School of Library and Information Science and spent 10 years in software development before becoming a search consultant. She is the editor of SearchTools.com and a frequent speaker and author, providing a strong focus on search usability in the broadest sense and sharing her conviction that search engines can always be better.

Avi created a web page with a summary of, and links to, a couple of versions of her presentation.

I greatly appreciate Avi’s consideration of the pluses and minuses of federation and aggregation (i.e., discovery services) in a world that is often polarized over whether one approach is better in all cases.

My research for this presentation indicated that each is useful in specific circumstances (I know, no surprise there). Many data sources are obviously best accessed by one or the other, but it’s the corner cases that are tricky. Aspects to consider include:

  • size of the content in the source
  • how often your users need that content
  • content change rate
  • importance of real-time access control permissions changes
  • content licensing rules
  • available tools for indexing / querying
  • difficulty of extracting and indexing
  • quality of the internal search engine
  • difficulty of sending queries and receiving results
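One way to act on this checklist is to record a handful of these aspects for each source, apply the hard constraints first (licensing, availability of a query interface), and only then weigh the softer signals. The Python sketch below is purely illustrative: the `SourceProfile` fields, the `recommend` function, and its voting rules are hypothetical and are not taken from Avi’s presentation.

```python
from dataclasses import dataclass


@dataclass
class SourceProfile:
    """Hypothetical profile of one data source, using aspects from the list above."""
    name: str
    license_allows_indexing: bool = True   # content licensing rules
    has_query_interface: bool = True       # can we send queries and receive results?
    extraction_is_easy: bool = True        # difficulty of extracting and indexing
    heavily_used: bool = False             # how often users need the content
    changes_often: bool = False            # content change rate
    realtime_permissions: bool = False     # real-time access-control changes matter
    good_native_search: bool = False       # quality of the source's own search engine


def recommend(src: SourceProfile) -> str:
    """Suggest 'federate', 'aggregate', 'either', or 'unreachable' (illustrative rules)."""
    # Hard constraint: if we may not index the content, it can only be searched live.
    if not src.license_allows_indexing:
        return "federate" if src.has_query_interface else "unreachable"
    # Hard constraint: with no way to send queries, indexing is the only option.
    if not src.has_query_interface:
        return "aggregate"
    # Soft signals: tally the aspects that lean one way or the other.
    federate = sum([src.changes_often, src.realtime_permissions, src.good_native_search])
    aggregate = sum([src.extraction_is_easy, src.heavily_used, not src.good_native_search])
    if federate > aggregate:
        return "federate"
    if aggregate > federate:
        return "aggregate"
    return "either"  # a corner case: analyze this source by hand


if __name__ == "__main__":
    for s in [SourceProfile("licensed journal", license_allows_indexing=False),
              SourceProfile("intranet wiki", heavily_used=True),
              SourceProfile("news wire", changes_often=True, realtime_permissions=True)]:
        print(s.name, "->", recommend(s))
```

The "either" result is the interesting one: it flags exactly the tricky corner cases that deserve a closer, source-by-source look rather than a blanket rule.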

The final slide has some sage advice:

Be open-minded, analyze the benefits of each approach for each data source.

One size does NOT fit all.

If you enjoyed this post, make sure you subscribe to the RSS feed!

26 Aug

[ Editor's Note: This is a very touching article by Nena Moss, first published in the OSTI Blog. My dad suffered from Alzheimer's for a number of years before he died, so I can relate to Nena's experience. Disclaimer: I have been paid to support OSTI in a number of capacities for the past eight years. ]

My mother died in March 2010 after a 15-year battle with Alzheimer’s, so I pay particular attention to news about this dreadful disease. A recent New York Times article caught my eye: “Sharing of Data Leads to Progress on Alzheimer’s.”

How did sharing data lead to progress on Alzheimer’s? A collaborative effort, the Alzheimer’s Disease Neuroimaging Initiative, was formed to find the biological markers that show the progression of Alzheimer’s disease in the human brain. The key was to share all the data, making every finding public immediately – “available to anyone with a computer anywhere in the world.”

9 Aug


Early this year O’Reilly published Search Patterns, by Peter Morville and Jeffery Callender. This is Morville’s fourth information/search-related book. Search Patterns addresses the intersection of user interface and search.

Search Patterns is an absolutely outstanding book. I don’t get excited about search-related books very often, but this one totally captivated me. O’Reilly sent me a review copy some months ago. It sat in a pile until I started seeing reviews of and references to the book on the Web. The press prompted me to open it.

The first thing I noticed in flipping through the book was the many high-quality color screen shots and illustrations. Plus, Search Patterns is printed on glossy paper to enhance the visual elements of the book.

At 173 pages (plus index) and with a nice balance of text and images, Search Patterns is, on the surface, a quick read. But there are numerous gems throughout the book, so allow yourself plenty of time to read (and reread) the sections that draw you in.

12 Jul

I don’t write about metasearch engines very often but I think that Google’s proposed purchase of ITA Software is worth commenting on. Here’s some info from Google:

On July 1, 2010, Google announced an agreement to acquire ITA Software, a Cambridge, Massachusetts flight information software company, for $700 million, subject to adjustments.

Google’s acquisition of ITA Software will create a new, easier way for users to find better flight information online, which should encourage more users to make their flight purchases online.

The acquisition will benefit passengers, airlines and online travel agencies by making it easier for users to comparison shop for flights and airfares and by driving more potential customers to airlines’ and online travel agencies’ websites. Google won’t be setting airfare prices and has no plans to sell airline tickets to consumers.

Because Google doesn’t currently compete against ITA Software, the deal will not change existing market shares. We are very excited about ITA Software’s QPX business, and we’re looking forward to working with current and future customers. Google will honor all existing agreements, and we’re also enthusiastic about adding new partners.

4 Jun

Clusters that think

Author: Sol

[ Editor's Note: I'm republishing this article, by Brian DeSpain, from the Deep Web Technologies Blog. It does a great job of explaining how their clustering solution adds value to federated search. ]

One of the most interesting features of our Explorit search product is our clustering engine, which analyzes results and produces “clusters” that represent a new and powerful way to navigate search results. The true power of these clusters is often overlooked, for they superficially resemble the output generated by the keyword-based systems and fixed taxonomies of other search engines. Our clustering technology, however, is more akin to a document-discovery engine, which provides a significant improvement over the alternatives in the library world.

The Explorit engine takes a unique approach to clustering derived from Latent Semantic Analysis (LSA). We looked at some of the traditional methods of taxonomy generation (i.e., learning approaches, semantic knowledge bases, and word nets) and, after carefully examining their advantages and shortcomings, chose latent semantic analysis and a “description comes first” approach to give customers a rich result-analysis tool. LSA is a fully automatic mathematical/statistical technique for extracting and inferring relations of contextual usage of words in search results. This technology provides a concept-based approach to analyzing and clustering results from a result set. Applying the LSA approach, our clustering engine analyzes the relationships between a set of documents and the terms contained within the documents to produce a set of concepts related to the results. In other words, our search engines can generate more sophisticated and nuanced result clusters, which help cut down on the time and number of attempts it takes users to find the desired information.
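To make the LSA idea concrete, here is a toy, dependency-free Python sketch of the general technique: build a term-document matrix A, find its strongest latent concepts through the singular value decomposition (computed here by power iteration on A^T A), label each concept by its highest-loading terms, and assign each result to its dominant concept. This is not Deep Web Technologies’ Explorit engine; the function names, the stopword list, and the “dominant concept” assignment rule are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; real systems use a much larger one.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def term_document_matrix(docs):
    """Return (vocab, A) where A[t][d] counts term t in document d."""
    counts = [
        Counter(w for w in re.findall(r"[a-z]+", doc.lower()) if w not in STOPWORDS)
        for doc in docs
    ]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for c in counts] for t in vocab]


def top_eigenpairs(sym, k, iters=300):
    """Power iteration with deflation on a small symmetric positive semidefinite matrix."""
    n = len(sym)
    m = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # fixed start vector for repeatable output
        lam = 0.0
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
            lam = norm  # |M v| converges to the eigenvalue for a unit eigenvector v
        pairs.append((lam, v))
        # Deflate: subtract the pair just found so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsa_clusters(docs, k=2, n_labels=2):
    """Group documents by their dominant latent concept and label each concept."""
    vocab, a = term_document_matrix(docs)
    n = len(docs)
    # A^T A: its eigenvectors are A's right singular vectors, eigenvalues are sigma^2.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    concepts = top_eigenpairs(gram, k)
    clusters, labels = {}, {}
    for c, (lam, v) in enumerate(concepts):
        sigma = math.sqrt(lam) or 1.0  # avoid dividing by zero for an empty concept
        # Term loadings u = A v / sigma show which words describe the concept.
        loadings = [sum(row[j] * v[j] for j in range(n)) / sigma for row in a]
        ranked = sorted(range(len(vocab)), key=lambda t: -abs(loadings[t]))
        labels[c] = [vocab[t] for t in ranked[:n_labels]]
        clusters[c] = []
    for d in range(n):
        # Document d's coordinate on concept c is sigma_c * v_c[d]; keep the largest.
        best = max(range(k), key=lambda c: abs(math.sqrt(concepts[c][0]) * concepts[c][1][d]))
        clusters[best].append(d)
    return clusters, labels


if __name__ == "__main__":
    snippets = [
        "federated search sources",
        "federated search results",
        "federated sources",
        "alzheimer disease data",
        "alzheimer data",
    ]
    clusters, labels = lsa_clusters(snippets)
    for c, members in clusters.items():
        print(labels[c], members)
```

With these five snippets, the three federated-search snippets and the two Alzheimer’s snippets fall into separate concepts, even though no taxonomy was supplied. A production system would typically add term weighting such as TF-IDF and use an optimized SVD routine rather than hand-rolled power iteration.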

1 Jun

WebSearch University’s Fall conference has some workshops (September 26) and talks (September 27 and 28) that readers of this blog might appreciate:

Delving Into Deep Web Business Resources

Marydee Ojala, ONLINE Magazine

Anyone approaching business research today needs to understand the wealth of information available on the deep, invisible web. To effectively and efficiently find data on companies, industries, markets, and management, you should consult specialized as well as general search engines; exploit social media resources; choose to search directories, groups, portals, images, blogs, feeds, wikis, and statistical files; consider fee-based tools; and concentrate on effectively conceptualizing. This seminar, taught by an experienced business searcher, will concentrate on resources but will also include practical techniques for using these resources.

Government Tools & Sites

Laura Gordon-Murnane, Library, BNA

It’s no secret that the U.S. government is a prolific publisher. With a new administration comes a new attitude toward information transparency and disclosure that affects not only federal government information, but also filters down to the state and local levels. The implications for searchers are vast. If you ever thought government data was boring, dull, or lackluster, this session will open your eyes to exciting opportunities of maximizing the value of government information.

Social Networking and Real-Time Research

Marydee Ojala, ONLINE Magazine

Use of social networking sites such as Facebook, Twitter, and LinkedIn has skyrocketed in the past year. As a global phenomenon, millions of people use social media to generate content, share ideas, and keep in touch with family, friends, work colleagues, companies, associations, and causes. They can be a source and tool for research. Marydee Ojala will address the where, when, and how aspects of social networking research, including authenticity, trust, and information overload, along with some real-world caveats.

Semantic Search Engines

Tamas Doszkocs, Specialized Information Services Division, National Library of Medicine

New generations of search engines are not just on the horizon, they’re here. Semantic search engines go far beyond keywords, using a variety of signals and behavioral analysis to understand the intent of your search. This presentation, by a noted computer scientist at the National Library of Medicine, will demonstrate the basics of semantic search as they apply to an innovative federated search solution. Semantic searching is utilized at every step of the process, including automatic query enhancement, semantic search result clustering, and information mashups.

10 May

[ Editor's note: This article first appeared in the OSTI Blog. Dr. Walt Warnick, Director of the Office of Scientific and Technical Information, part of DOE, and I co-authored the article. For some important search applications there is no alternative to federated search.]

Discovery services have begun to appear in the search landscape. Discovery services provide access to documents from publishers with which they have relationships by indexing the publishers’ metadata and/or full text. Discovery services are marketed to libraries where patrons appreciate near-instantaneous search results and where library staff is willing to restrict access to sources available from the service (and, optionally, the library’s own holdings). While these services tout themselves as improvements to federated search, the reality is that there is no alternative to federated search for a number of important applications.

WorldWideScience.org is a global gateway to science. The federated search application was conceived and developed at OSTI and hosted by us. The portal performs live federated search of 70 databases from 66 countries. Participating members provide access to their national research databases. For a number of reasons this important gateway to millions of research documents does not lend itself to the discovery service model.
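The defining property of live federated search, in contrast to a discovery service, is that nothing is indexed ahead of time: each query is sent to every source and the results are merged on the fly. Below is a minimal Python asyncio sketch, assuming each source is wrapped in an async function that returns (source, title, score) tuples. The names `federated_search` and `fake_source`, the per-source timeout policy, and the simulated sources are hypothetical and do not describe how WorldWideScience.org is actually implemented.

```python
import asyncio
from typing import Awaitable, Callable

Result = tuple[str, str, float]  # (source name, title, source-specific score)
Searcher = Callable[[str], Awaitable[list[Result]]]


async def federated_search(query: str, sources: dict[str, Searcher],
                           timeout: float = 2.0) -> list[Result]:
    """Send the query to every source concurrently and merge what returns in time."""

    async def ask(search: Searcher) -> list[Result]:
        try:
            return await asyncio.wait_for(search(query), timeout)
        except (asyncio.TimeoutError, OSError):
            # One slow or unreachable database should not sink the whole search.
            return []

    batches = await asyncio.gather(*(ask(s) for s in sources.values()))
    merged = [hit for batch in batches for hit in batch]
    # Naive merge by raw score; scores from different sources are not really comparable.
    return sorted(merged, key=lambda hit: hit[2], reverse=True)


def fake_source(name: str, delay: float, hits: list[tuple[str, float]]) -> Searcher:
    """Hypothetical stand-in for a remote database connector."""

    async def search(query: str) -> list[Result]:
        await asyncio.sleep(delay)  # simulated network latency
        return [(name, f"{title} ({query})", score) for title, score in hits]

    return search


if __name__ == "__main__":
    sources = {
        "fast": fake_source("fast", 0.01, [("Report A", 0.9), ("Report B", 0.4)]),
        "medium": fake_source("medium", 0.02, [("Paper C", 0.7)]),
        "slow": fake_source("slow", 1.0, [("Paper D", 0.99)]),
    }
    for hit in asyncio.run(federated_search("neutrino", sources, timeout=0.1)):
        print(hit)
```

Because every source is queried at search time, results always reflect each database’s current content and access rules, at the cost of waiting for the slowest source you are willing to wait for. The naive sort by raw score is the weakest part of this sketch; real federated engines normalize or re-rank merged results.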

19 Apr

State Farm Insurance Librarian Adam Bennington has a fun yet serious article in this month’s Searcher Magazine. The article, A Practical Guide to Coping With Reference Anxiety Disorder, brings hope to information professionals everywhere. Bennington explains the cause of RAD:

When the searcher can’t uncover the answer, feelings of guilt, shame, and doubt in his or her professional worth can grow acute, especially in newly minted information professionals.

Bennington asks us to consider that, since “studies have been published on the pressure produced when penguins poop,” your client’s reasonable-sounding question can’t be new. There must be an answer somewhere, and you should be able to find it.

While I read and write a lot about searching, my bias is that, given a large enough pile of federated and other search tools, every “reasonable” question must have an answer somewhere on the Web. I’ve never considered the existence of an alternate reality where intelligent questions don’t have answers.

Now here’s an interesting question that really does have an answer. If you’re a research professional, or even if you’re not, how do you know when it’s time to stop searching? I won’t give anything away but I’ll tell you that Bennington, in his article, gives six signs to look for that tell you you’re thrashing. Can you guess what they are, or can you come up with some of your own?

Leave comments here and, if you have a copy of this month’s Searcher, don’t give away the answers!

15 Apr

Yes, this is an off-topic post. I’m entitled to do that occasionally.

This post in ars technica got my attention: Library of Congress: We’re archiving every tweet ever made.

Get ready for fame, tweeters of the world: the Library of Congress is archiving for posterity every public tweet made since the service went live back in 2006. Every. Single. Tweet.

The LOC announced the news, appropriately enough, on Twitter. Twitter isn’t just about being pretentious and notifying the world about the contents of your lunch (though it’s about those things too).

Wow!

9 Apr

I enjoyed this article in the “federated-search-is-not-dead” department. The article, Is ECM Going The Way Of The Dodo? Or Maybe The Way Of The Intranet?, ponders the future of Enterprise Content Management (ECM) systems. The article’s author, Sean Nicholson, cites a prediction:

1) Enterprise Content Management and Document Management will go their separate ways
ECM as a marketing and technical concept has great validity. But the idea of having a single overarching platform to manage all sources of content management only works well in those enterprises that follow a unified and services-oriented architectural approach to IT.

Nicholson argues that obstacles such as needing to pick a single vendor or to adopt a comprehensive service-oriented architecture will drive many organizations away from an ECM solution. This bodes well for federated search systems that can dig into multiple databases and information systems and bring back relevant information. Nicholson further predicts that “federated search will become crucial to organizations that choose not to implement a structured ECM architecture.” And he raises the question we should all be pondering:

Will better federated search technologies negate the need for a central repository?

I hope so.
