Monday, 28 February 2011

Book Review: Solr 1.4 Enterprise Search Server

I've been interested in Solr since I first read about it, which must have been some time in 2008, while doing research for a search-centric web page that was supposed to run on OpenCms but unfortunately was never developed. At that time I wouldn't have used it, as I hadn't heard about it before, but I liked the idea a lot. After attending the Devoxx university session on Solr by Erik Hatcher in 2009, I was completely sure that the next search system I implemented would be based on Solr. The project is nearly finished now, so it's time to recap what I took out of the book I got for learning Solr.

First of all, when learning a new technology I prefer paper books over internet research. Though there are other books available, Solr 1.4 Enterprise Search Server by David Smiley and Eric Pugh seems to be the one that is most often recommended.

The book starts off with a high-level introduction to what Solr and Lucene are, some first examples and, interestingly, how to build Solr from source. Though the book was released before Solr 1.4, the authors had the foresight that some features might still be missing and would have to be included manually. In fact, I've never seen an open source project where applying patches is as common as it seems to be for Solr.

Schema configuration and text analysis are the topics of the second chapter. It begins with an introduction to MusicBrainz, a freely available set of music data that is used as an example throughout the book. This chapter is crucial to understanding Solr, as it introduces a lot of Lucene concepts that probably not every reader is familiar with.

After quite some theory, chapter 3 starts with the practical parts, covering the indexing process. Curl, the command line HTTP client, is used to send data to Solr and retrieve it. Another option, the data import handler, which imports data directly from a database, is also introduced.
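
To give an idea of what the indexing step looks like, here is a minimal sketch in Python rather than curl. It builds a Solr XML update message and prepares the HTTP request without sending it. The URL assumes a local default installation, and the document, its field names and the helper functions are made up for illustration; they are not taken from the book.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: a local Solr instance reachable at the default update URL.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of field dictionaries."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_update_request(xml_message, url=SOLR_UPDATE_URL):
    """Prepare (but do not send) the POST request that curl would issue."""
    return urllib.request.Request(
        url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical document; the field names have to exist in your schema.
message = build_add_message([{"id": "artist-1", "name": "Example Artist"}])
request = build_update_request(message)
print(message)
```

Actually sending it would be a call to `urllib.request.urlopen(request)`, followed by a `<commit/>` message to the same URL so that the new documents become visible to searches.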

Chapters 4 to 6 walk the reader through the search process and several useful components that enhance the user's search experience, like faceting and the dismax request handler. This is the part where Solr really shines, as you can see how easy it is to integrate new features into your application that probably would have taken a long time to develop using plain Lucene.
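
To illustrate how little is needed for this, here is a small sketch that assembles a search URL combining dismax and faceting. It only uses standard Solr request parameters (`q`, `defType`, `qf`, `facet`, `facet.field`, `wt`); the base URL assumes a local default installation, and the field names and the helper function are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance reachable at the default select URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(user_query, query_fields, facet_fields,
                     base_url=SOLR_SELECT_URL):
    """Build a dismax query URL that also requests facet counts."""
    params = [
        ("q", user_query),               # raw user input, parsed by dismax
        ("defType", "dismax"),           # select the dismax query parser
        ("qf", " ".join(query_fields)),  # fields to search, with optional boosts
        ("facet", "true"),               # turn faceting on
    ]
    # One facet.field parameter per field that should be counted.
    params += [("facet.field", field) for field in facet_fields]
    params.append(("wt", "json"))        # ask for JSON instead of XML
    return base_url + "?" + urlencode(params)


# Hypothetical field names; they have to match your schema.
url = build_search_url("smashing pumpkins", ["name^2", "members"], ["type"])
print(url)
```

The response then contains the matching documents plus a count per value of the faceted field, which is all a client needs to render a drill-down navigation.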

Deploying Solr is covered in Chapter 7, with quite some useful information on configuring and monitoring a Solr instance. Chapter 8 looks at some client APIs for different programming languages, SolrJ being the most important to me. The book ends with an in-depth look at how Solr can be tuned and scaled.

I can say that this is a really excellent book, both as an introduction to Solr and as a reference while developing your application. The most common use cases are covered, and the examples make it really easy to adopt the concepts in your application. There is a lot of hands-on information that proves useful during development and deployment.

Some slight drawbacks I don't want to keep to myself: As the common message format for Solr is a custom XML dialect, there is a lot of XML in the book to digest. As XML is so commonly used with Solr, that's not necessarily a bad thing, but you might get quite dizzy looking at a lot of angle brackets. From a reader's perspective some variety would have been nice, e.g. by mixing XML with the Ruby format or JSON, or by introducing client APIs earlier. Also, while it's a good idea to use a data set that is freely available, MusicBrainz probably isn't the best fit for demoing some features. There are no large text sections or documents, which are often what a search application will be built on. And finally, not really an issue of the authors but rather of the publisher, PacktPub: When skimming through the book it's quite hard to see where a new section begins. The headlines do not contain a numbering scheme and are of a very similar size.

Nevertheless, if you have to develop an application using Solr, you should by all means buy this book. You won't regret it.