Daniel Tunkelang, co-founder and Chief Scientist at Endeca, has written a book on faceted search. The book should be available for purchase sometime in June. A little bit of pre-release information is available at Daniel’s blog.
Daniel was kind enough to send me a near-final draft of the manuscript to review.
If you don’t know what faceted search is or what its relevance is to federated search you may find this article to be helpful. Daniel Tunkelang is very passionate about finding ways to make information retrieval a more interactive endeavor. Faceted search is an important component of this interactivity. Here’s a salient piece of Daniel’s biography which I borrowed from his book:
Daniel is recognized as a leading advocate of Human Computer Information Retrieval, a multidisciplinary effort to bridge the gap between the more systems-oriented work in information retrieval and the more cognitively focused approach in library and information science. He has organized annual workshops on the subject. He publishes The Noisy Channel, a widely read and cited blog on the information seeking process. He also participates actively in both academic and industry conferences, recently attempting to bridge the gap between the two by organizing an Industry Track at SIGIR, the leading academic conference on information retrieval.
I was delighted to read Daniel’s book for three reasons. First and foremost, I needed some serious education on the subject. I knew that faceted search helped create “smart menus” but I didn’t know much beyond that. The second reason I enjoyed the book was that Daniel, someone I’ve had some interaction with over the past few months, is a very sharp and very scientifically oriented person, yet he wrote a book that non-scientists can understand. Yeah! The third is that I may someday write a book about federated search, and Daniel does a great job of modeling how to write a book about technology that draws the reader in without overwhelming him or her.
On to the book.
My near-final draft has 108 pages. That includes eight chapters, a glossary, and a beefy section of well over 100 references. The book is divided into three parts as follows:
PART 1: Key Concepts
1. Introduction: What are Facets?
2. Information Retrieval
3. Faceted Information Retrieval
PART 2: Research and Practice
4. Academic Research
5. Commercial Applications
PART 3: Practical Concerns
6. Back-End Concerns
7. Front-End Concerns
Part 1 introduces facets in a historical context. Part 2 discusses the research on facets and describes a number of applications. Part 3 is for developers; it dives into the challenges of implementing faceted search systems. Each chapter has a brief summary of key points.
Now, on to the chapters.
Chapter 1 – Introduction: What are Facets?
Chapter 1 answers a number of questions that help us to understand the major approaches to representing information. Here are the major questions: What are facets? What are classifications? What is a taxonomy? What are ontologies? How does the evolving Semantic Web relate to these concepts? What did Aristotle have to do with all of this?
Chapter 2 – Information Retrieval
If you’ve ever wanted a crash course in information retrieval, in fewer than 15 pages, then look no further. This chapter builds on the previous one. Once information is represented and stored in some structured way, it needs to be retrievable. The concepts of relevance, precision, and recall are covered. The set retrieval model is introduced. We learn about directory-based navigation as an alternative to free-form text searching and about the serious limitations of both approaches.
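For readers who, like me, think better with something concrete, here is a minimal Python sketch of precision and recall under set retrieval. It is my own illustration rather than anything taken from the book, and the document ids and relevance judgments are made up. Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query under set retrieval.

    retrieved: ids of the documents the system returned
    relevant:  ids of the documents judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned
    # Guard against empty sets so the function never divides by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ids, invented for this example.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant and two of the three relevant documents were found, so a system can score reasonably on one measure while doing worse on the other.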
Chapter 3 – Faceted Information Retrieval
This chapter dives into some tricky territory. It discusses parametric search (where one performs Boolean searches over a menu of fixed terms), presents faceted navigation as a way to overcome some of the problems with parametric search, and introduces faceted search as a way to combine faceted navigation with text search.
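To make the combination of text search and faceted navigation a little more tangible, here is a toy Python sketch. It is only my own rough illustration, not Daniel's method or any vendor's implementation: the catalog, the facet names (brand, color), and the helper functions are all hypothetical, and the text matching is deliberately naive.

```python
from collections import Counter

# Hypothetical catalog; every title and facet value is invented.
PRODUCTS = [
    {"title": "Compact digital camera", "brand": "Acme", "color": "black"},
    {"title": "Digital camera with zoom lens", "brand": "Acme", "color": "silver"},
    {"title": "Digital photo frame", "brand": "Globex", "color": "black"},
    {"title": "Camera tripod", "brand": "Globex", "color": "black"},
]
FACETS = ("brand", "color")


def text_search(items, query):
    """Naive text search: keep items whose title contains every query term."""
    terms = query.lower().split()
    return [item for item in items
            if all(term in item["title"].lower() for term in terms)]


def facet_counts(items):
    """Count how many of the current results carry each facet value."""
    return {facet: Counter(item[facet] for item in items) for facet in FACETS}


def refine(items, selections):
    """Faceted navigation step: keep items matching every selected value."""
    return [item for item in items
            if all(item[facet] == value for facet, value in selections.items())]


results = text_search(PRODUCTS, "digital camera")  # start with a text query
counts = facet_counts(results)                     # the "smart menu" for these results
refined = refine(results, {"color": "black"})      # the user clicks one facet value

print(counts)
print([item["title"] for item in refined])
```

The point of the sketch is that the menu of facet values is computed from the current result set, so the user is only offered refinements that lead somewhere.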
Chapter 4 – Academic Research
This chapter surveys important academic research on faceted search. The survey includes a number of fun screenshots. For me all of the material was new. I had never heard of FilmFinder, HIBROWSE, the Flamenco Project, mSpace, or Parallax. Each of these early faceted search experiments provides insights into how the field has evolved over the past decade.
Chapter 5 – Commercial Applications
More great pictures. This chapter samples a few current examples of faceted search at eBay, Amazon, and CNET Shopping, and it dispels any notion that faceted search is just a dry academic subject. Vendors such as Endeca have helped to popularize faceted search by making it a core feature of major ecommerce and search sites. It’s also worth noting that Solr (based on Lucene) and Drupal have turned faceted search into a commodity feature, one that people can implement without needing to be faceted search gurus.
Chapter 6 – Back-End Concerns
I have to admit that I only skimmed this chapter. I’m not planning to implement faceted search anytime soon, but if you are, please read this chapter carefully. While readers of Daniel’s book may do just fine rolling their own faceted search solution with Solr or Drupal, they’ll need to be aware of issues of scalability and efficiency, and of the risk of giving the user too much information, i.e. too many facets. The chapter also discusses how to enrich unstructured data so that facets can be created from it.
Chapter 7 – Front-End Concerns
This is another very important chapter for developers. It addresses the challenge of incorporating faceted search into an application in such a way as to not overwhelm or confuse the user. The chapter addresses these (and other) questions: Where and when should the application present facets? How should it organize facets and their values? How should it integrate the search box into faceted search? How should it incorporate Boolean logic into faceted searching? There are many important considerations, and without absorbing the guidelines in this chapter a developer could easily create a faceted search monster.
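On the Boolean logic question, one convention I have seen on shopping sites is to OR the values selected within a single facet and AND across different facets. I am assuming that convention here purely for illustration; it is not necessarily what the book recommends, and the items and facet names in this Python sketch are hypothetical.

```python
# Hypothetical items, invented for this example.
ITEMS = [
    {"name": "A", "brand": "Acme", "color": "black"},
    {"name": "B", "brand": "Acme", "color": "silver"},
    {"name": "C", "brand": "Globex", "color": "black"},
    {"name": "D", "brand": "Initech", "color": "red"},
]


def matches(item, selections):
    """Assumed convention: OR within a facet, AND across facets.

    selections maps a facet name to the set of values the user has ticked.
    A facet with no ticked values places no restriction on the results.
    """
    return all(
        item.get(facet) in values          # OR: any ticked value is acceptable
        for facet, values in selections.items()
        if values                          # skip facets with nothing ticked
    )                                      # AND: every restricted facet must match


# The user ticks two brands and one color.
selections = {"brand": {"Acme", "Globex"}, "color": {"black"}}
names = [item["name"] for item in ITEMS if matches(item, selections)]
print(names)
```

With these selections the items kept are the black ones from either ticked brand. Whether users actually expect this behavior is exactly the kind of front-end question the chapter raises.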
Chapter 8 – Conclusion
This is a short chapter. It briefly ponders the fascinating question of whether faceted search will ever be applied to the open web. Daniel sees the efforts of Kosmix and Cuil as important first steps in addressing the challenges in getting from here to there. This is the one chapter of the book that left me wanting more. Perhaps this is a good thing. Perhaps it will motivate me to read more about the subject. But I would still love to hear more from Mr. Tunkelang on what the future might look like.
In summary, I liked this book a lot. It taught me quite a bit in a very conversational style. The book is, of course, a very good marketing piece for Endeca. I would guess that many of those who read the book and want faceted search for their applications will quickly realize how difficult it is to do faceted search well and they will decide to hire Endeca or another firm to build it for them. I see this book as a great win-win. It provides a good education to readers and it introduces Endeca to them as a solution provider. Nice!