Wednesday, September 4, 2013

Developing with CoreMedia

A while ago I had the chance to attend a training on web development with CoreMedia. It's a rather enterprisey commercial Content Management System that powers large corporate websites like telekom.com as well as news sites like Bild.de (well, you can't hold CoreMedia responsible for the kind of "content" people put into their system). As I have been working with different Java-based Content Management Systems over the years, I was really looking forward to learning about a system I had heard really good things about. In this post I'll describe the basic structure of the system as well as what it feels like to develop with it.

System Architecture

As CoreMedia is built to scale to really large sites, the architecture is also built around redundant and distributed components. The part of the system the editors work on is separated from the parts that serve the content to the internet audience. A publication process copies the content from the editorial system to the live system.

The heart of CoreMedia is the Content Server. It stores all the content in a database and makes it retrievable. You rarely access it directly but only via other applications that talk to it in the background via CORBA. Editors traditionally work with CoreMedia using a Java client (formerly called the Editor, now known as the Site Manager); starting with CoreMedia 7 there is also the web-based Studio for creating and editing content. A preview application can be used to see how the site looks before being published. Workflows, which are managed by the Workflow Server, can be used to control the processes around editing as well as publication.

The live system consists of several components that are mostly laid out redundantly. There is one Master Live Server as well as 0 to n Replication Live Servers that are used for distributing the load as well as for fault tolerance. These content servers are accessed by the Content Application Engine (CAE), which contains all the delivery and additional logic for your website. One or more Solr instances provide the search services for your application.

Document Model

The document model for your application describes the content types that are available in the system. CoreMedia provides a blueprint application that contains a generic document model that can be used as a basis for your application, but you are also free to build something completely different. The document model is used throughout the whole system as it describes the way your content is stored. The model is object-oriented in nature, with documents that consist of attributes. There are six attribute types, among them String (fixed-length strings), XML (variable-length strings) and Blob (binary data), that form the basis of all your types. An XML configuration file is used to describe your specific document model. This is an example of an article that contains a title, the text and a list of related articles:

<DocType Name="Article">
  <StringProperty Name="title"/>
  <XmlProperty Grammar="coremedia-richtext-1.0" Name="text"/>
  <LinkListProperty LinkType="Article" Name="related"/>
</DocType>

Content Application Engine

Most of the code you will be writing is the delivery code that is part of the Content Application Engine, either for preview or for the live site. This is a standard Java webapp that is assembled from different Maven-based modules. CAE code is heavily based on Spring MVC, with the CoreMedia-specific View Dispatcher taking care of rendering the different documents. The document model is made available using so-called Contentbeans that can be generated from the document model. Contentbeans access the content on demand and can contain additional business logic. So they are not POJOs but rather active objects, similar to Active Record entities in the Rails world.

Our example above would translate to a Contentbean with getters for the title (a java.lang.String), the text (a com.coremedia.xml.Markup) and the related articles (a java.util.List typed to de.fhopf.Article).
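
As a rough sketch, such a Contentbean could look like this (hand-written here for illustration; the real skeletons are generated from the document model, so names and details will differ):

import java.util.List;
import com.coremedia.xml.Markup;

public interface Article {
    String getTitle();          // maps to the StringProperty "title"
    Markup getText();           // maps to the XmlProperty "text"
    List<Article> getRelated(); // maps to the LinkListProperty "related"
}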

Rendering of the Contentbeans happens in JSPs that are named according to classes or interfaces, with a specific logic to determine which JSP should be used. An object Article that resides in the package de.fhopf would then be found in the path de/fhopf/Article.jsp; if you want to add a special rendering mechanism for List, this would be in java/util/List.jsp. Different renderings of an object can be provided by using a view name. An Article that is rendered as a link would then be in de/fhopf/Article.link.jsp.

This is done using one of the custom Spring components of CoreMedia, the View Dispatcher, a View Resolver that determines the correct view to be invoked for a certain model based on the content element in the Model. The JSP that is used can then contain further includes on other elements of the content, be it documents in the sense of CoreMedia or one of the attributes that are available. Those includes are again routed through the View Dispatcher.

Let's look at an example that renders the list of related articles for an article. Say you call the CAE with a certain content id that belongs to an Article. The standard mechanism routes this request to the Article.jsp described above. It might contain the following fragment to include the related articles:

<cm:include self="${self.related}"/>

Note that we do not tell it which JSP to include. CoreMedia automatically figures out that we are including a List, for example a java.util.ArrayList. As there is no JSP available at java/util/ArrayList.jsp, CoreMedia will automatically look for any interfaces that are implemented by that class; in this case it will find java/util/List.jsp. This could then contain the following fragment:

<ul>
<c:forEach items="${self}" var="item">
  <li><cm:include self="${item}" view="link"/></li>
</c:forEach>
</ul>

As the List in our case contains Article implementations, this will then hit the Article.link.jsp that finally renders the link. This is a very flexible approach with a high degree of reusability for the fragments. The List.jsp we saw above has no connection to the Article. You can use it for any objects that should be rendered in a list structure; the View Dispatcher of CoreMedia takes care of which JSP to include for a certain type.

To minimize the load on the Content Server you can also add caching via configuration settings. Data Views, which are a layer on top of the Contentbeans, are then held in memory and contain prefilled beans that don't need to access the Content Server anymore. This object cache approach differs from the HTML fragment caching a lot of other systems do.

Summary

Though this is only a very short introduction, you should have seen that CoreMedia really is a nice system to work with. The distributed nature not only makes it scalable but also has implications when developing for it: when you are working on the CAE you are only changing code in this component. You can start the more heavyweight Content Server once and afterwards work with the lightweight CAE, which can be run using the Maven Jetty plugin. Restarts don't take long, so you have short turnaround times. The JSPs are very cleanly structured and don't need to include scriptlets (I heard that this was different in earlier versions). As most of the application is built around Spring MVC, you can use a lot of knowledge that is already around.

Wednesday, August 28, 2013

FrOSCon 8 2013 - Free and Open Source Software Conference

Last weekend I attended FrOSCon, the Free and Open Source Software Conference taking place in St. Augustin near Bonn. It's a community-organized conference with an especially low entrance fee and a relaxed vibe. The talks are a good mixture of development and system administration topics.

Some of the interesting talks I attended:

Fixing Legacy Code by Kore Nordmann and Benjamin Eberlein

Though this session was part of the PHP track, it contained a lot of valuable information related to working with legacy code in any language. Besides strategies for getting an application under test, the speakers showed some useful refactorings that can make sense to start with. Slides

Building Awesome Ruby Command Line Apps by Christian Vervoorts

Christian first showed some of the properties that make up a good command line app. You should choose sane default values but make them configurable. Help functionality is crucial for a good user experience, via a -h parameter as well as a man page. In the second part Christian introduced some Ruby gems that can be used to build command line apps. GLI seems to be the most interesting, with a nice DSL and its scaffolding functionality.

Talking People Into Creating Patches by Isabel Drost-Fromm

Isabel, who is very active in the Apache community, shared some of her findings from trying to get students, researchers and professionals to participate in Open Source. The participants were a mixture of people running open source projects and developers interested in contributing to open source. I was especially interested in this talk because I wouldn't mind having more people help with the Odftoolkit I am also working on. When working with professionals, who are the main target, it is important to respond quickly to mails or issues, as they might move on to other projects and might not be able to help later on. Also, it's nice to have some easy tasks in the bug tracker that can be tackled by newbies.

MySQL Performance Schema by Carsten Thalheimer

Performance Schema is a new feature in MySQL 5.5 and is activated by default since 5.6. It monitors a lot of the internal functionality like file access and queries, so you can later see which parts you can optimize. Performance measurements done by the MySQL developers showed that keeping it activated has a performance impact of around 5%. Though this doesn't sound that good at first, I think you can gain a lot more performance from the insight you get into the inner workings. Working with Performance Schema is supposed to be rather complex ("Take two weeks to work with it"); ps_helper is a more beginner-friendly set of helpers that can get you started with some useful metrics.

Summary

FrOSCon is one of the most relaxing conferences I know. It is my go-to place for seeing things that are not directly related to Java development. The low fee makes it a no-brainer to attend. If you are interested in any of this year's talks, they will also be made available online.

Wednesday, August 21, 2013

Getting Started with ElasticSearch: Part 2 - Querying

This is the second part of the article on things I learned while building a simple Java-based search application on top of ElasticSearch. In the first part of this article we looked at how to index data in ElasticSearch and what the mapping is. Though ElasticSearch is often called schema-free, specifying the mapping is still a crucial part of creating a search application. This time we will look at the query side and see how we can get our indexed talks out of it again.

Simple Search

Recall that our documents consist of a title, the date and the speaker of a talk. We have adjusted the mapping so that for the title we are using the German analyzer, which stems our terms so we can search on variations of words. This curl request creates a similar index:

curl -XPUT "http://localhost:9200/blog" -d'
{
    "mappings" : {
        "talk" : {
            "properties" : {
                "title" : { "type" : "string", "store" : "yes", "analyzer" : "german" }
            }
        }
    }
}'

Let's see how we can search on our content. We are indexing another document with a German title.

curl -XPOST "http://localhost:9200/blog/talk/" -d'
{
    "speaker" : "Florian Hopf",
    "date" : "2012-07-04T19:15:00",
    "title" : "Suchen und Finden mit Lucene und Solr"
}'

All searching is done via the _search endpoint that is available on the type or index level (you can also search on multiple types and indexes by separating them with a comma). As the title field uses the German analyzer, we can search on variations of the words, e.g. suche, which is stemmed to the same root as suchen: such.

curl -XGET "http://localhost:9200/blog/talk/_search?q=title:suche&pretty=true"                                                                       
{
  "took" : 14, 
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
},                                                                                                                                                                                                                             
  "hits" : {
    "total" : 1,
    "max_score" : 0.15342641,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "talk",
      "_id" : "A2Qv3fN3TkeYEhxA4zicgw",
      "_score" : 0.15342641, "_source" : {
        "speaker" : "Florian Hopf",
        "date" : "2012-07-04T19:15:00",
        "title" : "Suchen und Finden mit Lucene und Solr"
      }
    } ]
  }

The _all field

Now that this works, we might want to search on multiple fields. ElasticSearch provides the convenience functionality of copying all field content to the _all field, which is used when omitting the field name in the query. Let's try the query again:

curl -XGET "http://localhost:9200/blog/talk/_search?q=suche&pretty=true"
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

No results. Why is that? Of course we have set the analyzer correctly for the title, as we have seen above. But this doesn't mean that the content is analyzed in the same way for the _all field. As we didn't specify an analyzer for this field it still uses the StandardAnalyzer, which tokenizes the text but doesn't do any stemming. If you want consistent behavior for the title and the _all field, you need to set the analyzer for both when creating the index:

curl -XPUT "http://localhost:9200/blog/talk/_mapping" -d'
{
    "mappings" : {
        "talk" : {
            "_all" : {"analyzer" : "german"},
            "properties" : {
                "title" : { "type" : "string", "store" : "yes", "analyzer" : "german" }
            }
        }
    }
}'

Note that, as with all mapping changes, you can't change the analyzer of the _all field once the index is created. You need to delete the index, put the new mapping and reindex your data; afterwards our search will return the same results for the two queries.
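
For our example the delete step is a one-liner; the creation and indexing requests from above can then be repeated as they are:

curl -XDELETE "http://localhost:9200/blog"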

_source

You might have noticed from the example above that ElasticSearch returns the special _source field for each result. This is very convenient as you don't need to specify which fields should be stored. But be aware that this might become a problem for large fields that you don't need for each search request (the content section of articles, or images that you might store in the index). You can either disable the use of the source field and indicate in the mapping which fields should be stored for your indexed type, or you can specify in the query which fields you'd like to retrieve:

curl -XGET "http://localhost:9200/blog/talk/_search?q=suche&pretty=true&fields=speaker,title"
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.15342641,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "talk",
      "_id" : "MA2oYAqnTdqJhbjnCNq2zA",
      "_score" : 0.15342641
    }, {
      "_index" : "blog",
      "_type" : "talk",
      "_id" : "aGdDy24cSImz6DVNSQ5iwA",
      "_score" : 0.076713204,
      "fields" : {
        "speaker" : "Florian Hopf",
        "title" : "Suchen und Finden mit Lucene und Solr"
      }
    } ]
  }
}

The same can be done if you are not using the simple query parameters but the more advanced query DSL:

curl -XPOST "http://localhost:9200/blog/talk/_search" -d'
{
    "fields" : ["title", "speaker"],
    "query" : {
        "term" : { "speaker" : "florian" }
    }
}'

Querying from Java

Besides the JSON-based Query DSL you can also query ElasticSearch using Java. The default ElasticSearch Java client provides builders for creating different parts of the query that can then be combined. For example, if you'd like to query on two fields using the multi_match query, this is what it looks like using curl:

curl -XPOST "http://localhost:9200/blog/_search" -d'
{
    "query" : {
        "multi_match" : {
            "query" : "Solr",
            "fields" : [ "title", "speaker" ]
        }
    }
}'

The Java version maps quite well to this. Once you have found the builders you need, you can use the excellent documentation of the Query DSL for your Java client as well.

// assuming a static import of org.elasticsearch.index.query.QueryBuilders.multiMatchQuery
QueryBuilder multiMatch = multiMatchQuery("Solr", "title", "speaker");
SearchResponse response = esClient.prepareSearch("blog")
        .setQuery(multiMatch)
        .execute().actionGet();
assertEquals(1, response.getHits().getTotalHits());
SearchHit hit = response.getHits().getAt(0);
assertEquals("Suchen und Finden mit Lucene und Solr", hit.getSource().get("title"));

The same QueryBuilder we are constructing above can also be used in other parts of the query: for example it can be passed as a parameter to create a QueryFilterBuilder or be used to construct a QueryFacetBuilder. This composition is a very powerful way to build flexible applications. It is easier to reason about the components of the query and you can even test parts of the query on their own.
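
As a sketch of this composition with the 0.90-era Java client (the facet name solrTalks is made up; static imports of QueryBuilders.multiMatchQuery, FilterBuilders.queryFilter and FacetBuilders.queryFacet assumed):

QueryBuilder multiMatch = multiMatchQuery("Solr", "title", "speaker");
// the same builder wrapped as a filter ...
QueryFilterBuilder filter = queryFilter(multiMatch);
// ... or used to facet on the query's matches
QueryFacetBuilder solrFacet = queryFacet("solrTalks").query(multiMatch);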

Faceting

One of the most prominent features of ElasticSearch is its excellent faceting support, which is not only used for building search applications but also for doing analytics on large data sets. You can use different kinds of faceting, e.g. for certain terms, using the TermsFacet, or for queries, using the query facet. The query facet would accept the same QueryBuilder that we used above.

// assuming static imports of FacetBuilders.termsFacet and QueryBuilders.queryString
TermsFacetBuilder facet = termsFacet("speaker").field("speaker");
QueryBuilder query = queryString("solr");
SearchResponse response = esClient.prepareSearch("blog")
        .addFacet(facet)
        .setQuery(query)
        .execute().actionGet();
assertEquals(1, response.getHits().getTotalHits());
SearchHit hit = response.getHits().getAt(0);
assertEquals("Suchen und Finden mit Lucene und Solr", hit.getSource().get("title"));
TermsFacet resultFacet = response.getFacets().facet(TermsFacet.class, "speaker");
assertEquals(1, resultFacet.getEntries().size());

Conclusion

ElasticSearch has a really nice Java API, be it for indexing or for querying. You can get started with indexing and searching in no time, though you need to know some of the concepts or the results might not be what you expect.

Wednesday, August 14, 2013

The Pragmatic Programmers' Rubber Duck of the 19th Century

In their influential book "The Pragmatic Programmer", Andy Hunt and Dave Thomas describe a technique for finding solutions to hard problems you are struggling with. They recommend simply telling the problem to somebody, not to get an answer, but because explaining the problem makes you think about it differently. And if there is nobody around, get yourself a rubber duck you can talk to, hence the name of the tip.

It's obvious that this is not a new discovery made by the authors. Everybody has experienced situations where they found the solution to a problem while explaining it to someone. But I was surprised to read the same advice in the essay "Über die allmähliche Verfertigung der Gedanken beim Reden" by Heinrich von Kleist, dating from 1805/1806 (translated from German by me):

If you want to know something and you can't find it in meditation, I advise you [...] to tell it to the next acquaintance you meet. He doesn't need to be a keen thinker, and I don't mean you should ask him: no! Rather you should tell him about it in the first place.

1806. The same tip (without the duck). This is another case of us relearning things that have been discovered before, something computer science especially is prone to.

So what is the big achievement of the authors? It's not that all their ideas are new. We don't need that many new ideas. There is a lot of knowledge around that is just waiting to be applied to our work as software developers. Those old ideas need to be put into context. There is even a benefit in stating obvious things: they might trigger rethinking your habits.

Sunday, August 11, 2013

SoCraTes 2013 - Two Days of Open Space

Last week I attended SoCraTes, an international Open Space conference on Software Craftsmanship and Testing. As a few days have passed now, I'd like to recap which sessions I attended and what was discussed. Though this is only a small part of all the sessions, you should get a good idea of the kinds of topics shared during the two days.

General

The conference is located at Seminarzentrum Rückersbach, which is close to nowhere (Aschaffenburg). As there is nothing around where people would normally go during conferences (pubs), everybody stays on the premises for a full 48 hours. You sleep there, you eat there, you spend your evenings there. Generally you are around the same people from Thursday evening to Saturday evening, which leads to a lot of interesting talks and evening activities.

Open Space format

Some of the time is needed for the framework of the Open Space. On Thursday evening there is a world cafe, where you spend time at different tables with different people to discuss what you expect from the conference and what you would like to learn.

Every day starts with the planning of the day, the marketplace. Every participant can propose talks, discussions and hands-on sessions and put those on the schedule. There are several rooms available and, if the weather permits, there is also plenty of space outside for discussions. The day is then dedicated to the sessions, some of which I will describe below. In the evening there is some kind of retrospective of the day. You can also propose evening activities, which range from discussions and coding to board games.

The sessions I describe now are only a snapshot of what was available. As there are so many good things, it's sometimes hard to decide which session to attend.

Day 1

Agenda for day 1, photo taken by Robert Hostlowsky, who also blogged about SoCraTes over at the Codecentric blog.

Continuous Delivery

Sergey, who ran a lot of sessions during the event, discussed a problem he is facing when trying to implement Continuous Delivery. One of the basic concepts of CD is that you should use the same artifact for all of your stages; that means an artifact should only be built once and then deployed to the different systems for automated testing, user testing and finally production. When an artifact is promoted to one of the later stages you would like to update the version so it is obvious that it is a real release candidate. Depending on the technology you are using, it might not be that easy to update the version without building the artifact again.

Integrated Tests are a scam

This session, inspired by this article by J.B. Rainsberger, mostly revolved around the problems of testing database-heavy applications, where some business logic tends to live in the database, e.g. by means of O/R mapping configurations. During the discussion it became obvious that the term integration test is overloaded: a lot of people think of integrating external systems like databases, whereas others think of integrating some of their own components. I also learned about Hexagonal Architecture, which I didn't know as a term before.

Mapping Personal Practices

Markus Gärtner hosted a laid-back outdoor session on determining in which areas of your professional life you would like to improve. First we collected all the things we do daily on a mind map and discussed them. Next we determined which parts we would like to improve during the next months. I have done similar things for myself before, but it was interesting to see what other people are working on and what they are interested in.

Specification by Example Experience Report

Nicole introduced some of the concepts of Specification by Example using the hot dog point of sale that was also used for the architecture kata during last year's SoCraTes. She described some of the experiences they had when introducing it in their company (which doesn't sell hot dogs as far as I know). Specification by Example and BDD had been huge topics during last year's conference and I would really like to try it once and see if it really improves communication. The (German) slides she used as an introduction are also available online.

Designing Functional Programs

A few people gathered to discuss the implications functional programming has on designing an application, e.g. when to use higher-order functions. Unfortunately nobody was around who had already implemented an application in a functional way. To get some experience we tried the Mars Rover kata in JavaScript. The kata probably wasn't the ideal choice as it is rather stateful and therefore a more advanced problem.

Productivity Porn

A discussion on self-management and everything around it. People shared some of their practices and Jan showed his foldable Personal Kanban board. It's a fact that you can spend a lot of time thinking about productivity, which sometimes isn't the most productive thing to do. But Personal Kanban seems to help a lot of people, so I am planning to read the recommended book about it and try it for myself.

Day 2

Agenda for day 2, again taken by Robert.

VIM Show and Tell

Sebastian proposed a session on VIM where everybody could show their favourite plugin or feature. I am still a novice VIM user so I mainly learned some of the basics. Quite a few people there use VIM for their development work, mostly with dynamic languages I guess. This video on InfoQ was recommended, and after watching it, I am also recommending it here.

Monads

Another talk by Nicole, where she introduced Monads, starting with an example in Java and moving on to Haskell. I now have a better idea of what Monads could be, but the concept is still too abstract for me.

Async Patterns

Sergey presented three alternative approaches to building concurrent systems:

  • Actors, a core feature of Erlang and Akka
  • Reactive Extensions
  • Communicating Sequential Processes as implemented in Clojure

SOLID Principles

Another session by Sebastian, where we discussed the SOLID principles, considered to be the basics of good object-oriented design. It was interesting to see that even though you think you know the concepts, it is still difficult to define them. While looking at some examples it also became obvious that sometimes you might follow one principle while violating another. Unfortunately I couldn't stay for the second, practical part of the session.

Quit your Job

During the world cafe Daniel Temme mentioned that he had given a talk last year on quitting your job and had just quit his job again. He told me parts of the story in the evening, but I was glad that he decided to give the talk again. Though it is rather provocative, the story behind it is important and spans a lot of areas of your life: sometimes you are caught in your habits and don't really notice that the right thing to do would be something else. Daniel is currently on a journey where he visits companies and works for food and accommodation.

Last words

SoCraTes was really awesome again. Thanks to all the organizers and participants who shared so much. I'll definitely try to be there again next year.

Wednesday, July 31, 2013

GETting Results from ElasticSearch

ElasticSearch exposes most of its functionality via a RESTful API. When it comes to querying the data, you can either pass request parameters, e.g. the query string, or use the query DSL, which structures the queries as JSON objects. For a talk I gave I used this example, which executes a GET request with curl and passes the JSON query structure in the request body.

curl -XGET 'http://localhost:9200/jug/talk/_search' -d '{
    "query" : {
        "query_string" : {"query" : "suche"} 
    },
    "facets" : {
        "tags" : {
            "terms" : {"field" : "speaker"} 
        }
    }
}'

An alert listener(*) later told me that you can't use a request body with a GET request. While preparing the talk I had thought about this myself and only added the example after testing it successfully. Still, it is unusual, and some software like proxy caches might not handle such a request as intended. A lot of ElasticSearch examples I have seen use POST requests instead, but I think that semantically, requesting search results should be a GET.
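
For comparison, the POST variant you see in many examples is identical apart from the method:

curl -XPOST 'http://localhost:9200/jug/talk/_search' -d '{
    "query" : {
        "query_string" : {"query" : "suche"}
    }
}'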

The ElasticSearch docs explicitly allow the use of GET requests with a request body:

"Both HTTP GET and HTTP POST can be used to execute search with body. Since not all clients support GET with body, POST is allowed as well."

I didn't find a hint in the HTTP specification on whether this should be allowed or not. This answer on Stack Overflow goes in the same direction as my initial concern: you shouldn't do it because it might not be expected by users and might not be supported by some stacks.

Finally, in this message Roy Fielding, one of the authors of the HTTP specification, discourages the use of a request body with GET.

"... any HTTP request message is allowed to contain a message body, and thus must parse messages with that in mind. Server semantics for GET, however, are restricted such that a body, if any, has no semantic meaning to the request."

As the query DSL influences the response, it clearly has semantic meaning, which by the words of Roy Fielding shouldn't be the case for a GET body. So no consensus exists on this topic. In hindsight I am quite surprised that this works out so well with ElasticSearch and that there aren't any problems I have heard about.

(*) I wish he hadn't told me, while we talked about another talk, that he always looks for mistakes in slides when he's bored ;).

Wednesday, June 5, 2013

Book Review: Hibernate Search by Example

PacktPub kindly offered me a free review edition of Hibernate Search by Example. Though I've used Lucene and Hibernate independently on a lot of projects, I've never used Hibernate Search, which builds on both technologies. That makes me a good candidate for reviewing an introductory book on it.

The Project

Hibernate Search is a really interesting technology. It transparently indexes Hibernate entities in Lucene and provides a slick DSL for querying the index. You decide which entities and which fields to index by adding annotations at class and field level. Custom analyzer chains can be defined for the entities and referenced from fields. Each entity is written to its own index, but it can also include data from related or embedded entities. By default, Lucene is only used for querying and ranking; the result list is still populated from the database. If this is not enough for your application you can also use projection to display results from fields stored in Lucene.
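
To give an impression of the mapping style, here is a minimal sketch with a made-up entity; @Indexed and @Field are the Hibernate Search annotations mentioned above, the rest is plain JPA:

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed   // write this entity to its own Lucene index
public class Talk {

    @Id
    private Long id;

    @Field // analyze and index the title for full text search
    private String title;
}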

The Book

Steve Perkins, the author of Hibernate Search by Example, did a great job designing an example that can evolve with the book. It starts very simple, also explaining the build setup using Maven and an embedded Jetty instance with an H2 database. Each chapter builds on the results of the previous one and enhances the project with different features. Each chapter is dedicated to a certain topic that is immediately used in the application so you can see its benefit. This way you learn different aspects, from mapping entities and performing queries to advanced mapping aspects, analyzing, filtering, using facets and even setting up master-slave systems and sharding your data. Not only is the book structured in a good way, the author also writes very clearly. Combined with the practical examples this makes it very easy to read. If you're planning to implement a solution on top of Hibernate Search, you're well advised to read this book.

My Impression of the Technology

As I am building quite a lot of search applications, I'd like to add some impressions of Hibernate Search. Though it's a very interesting technology, I think you should be careful when deciding on its use. Not only will you tie yourself to Hibernate as your JPA provider, there are also some implications on the search side. Hibernate Search offers advanced features like faceting, and it can be distributed and sharded. But there might be features that you want to build later on that would be far easier with a search server like Solr or ElasticSearch. Hibernate Search uses some components of Solr, but a real integration (using Solr as a backend) is rather difficult, I guess. Solr needs its schema configured in a file, so you would need to duplicate it. ElasticSearch could be a far better candidate as its schema mapping can be created via its REST API. I am really curious whether somebody has been thinking about starting an implementation of Hibernate Search on top of ElasticSearch. With the Lucene implementation that is described in the book you can easily enhance your database-driven application with advanced search functionality. But be aware that future requirements might be harder to implement compared to integrating a search server from the beginning.