Wednesday, August 14, 2013

The Pragmatic Programmer's Rubber Duck of the 19th Century

In their influential book "The Pragmatic Programmer" Andy Hunt and Dave Thomas describe a technique for finding solutions to hard problems you are struggling with. They recommend simply telling the problem to somebody, not to get an answer, but because explaining the problem makes you think about it differently. And if there is nobody around, get yourself a rubber duck you can talk to, hence the name of the tip.

Obviously this is not a new discovery by the authors. Everybody has experienced situations where they found the solution to a problem while explaining it to someone. But I was surprised to encounter the same advice in the essay "Über die allmähliche Verfertigung der Gedanken beim Reden" by Heinrich von Kleist, dating from 1805/1806 (translated from German by me):

If you want to know something and you can't find it in meditation I advise you [...] to tell it to the next acquaintance you meet. He doesn't need to be a keen thinker, and I don't mean that you should ask him about it: no! Rather, you should tell him about it yourself first.

1806. The same tip (without the duck). This is another case of us relearning things that have been discovered before, something computer science in particular is prone to.

So what is the big achievement of the authors? It's not that they only come up with new ideas. We don't need that many new ideas. There is a lot of material around that is just waiting to be applied to our work as software developers. Those old ideas need to be put into context. There is even a benefit in stating obvious things that might trigger you to rethink your habits.

Sunday, August 11, 2013

SoCraTes 2013 - Two Days of Open Space

Last week I attended SoCraTes, an international Open Space conference on Software Craftsmanship and Testing. As a few days have passed now I'd like to recap which sessions I attended and what was discussed. Though this covers only a small part of the sessions, it should give you a good grasp of the kinds of topics shared during the two days.

General

The conference is located at Seminarzentrum Rückersbach which is close to nowhere (Aschaffenburg). As there is nothing around where people would normally go during conferences (pubs), everybody stays on the premises for the full 48 hours. You sleep there, you eat there, you spend your evenings there. Generally you are around the same people from Thursday evening to Saturday evening, which leads to a lot of interesting talks and evening activities.

Open Space format

Some of the time is needed for the framework of the Open Space. On Thursday evening there is a World Café, where you rotate between tables with different people to discuss what you expect from the conference and what you would like to learn.

Every day starts with the planning of the day, the marketplace. Every participant can propose talks, discussions and hands-on sessions and put them on the schedule. There are several rooms available and, if the weather permits, there is also plenty of space outside for discussions. The day is then dedicated to the sessions, some of which I will describe below. In the evening there is some kind of retrospective of the day. You can also propose evening activities, which range from discussions and coding to board games.

The sessions I will describe now are only a snapshot of what was available. As there are so many good options it's sometimes hard to decide which session to attend.

Day 1

Agenda for day 1, photo taken by Robert Hostlowsky, who also blogged about SoCraTes over at the Codecentric blog.

Continuous Delivery

Sergey, who hosted a lot of sessions during the event, discussed a problem he is facing when trying to implement Continuous Delivery. One of the basic concepts of CD is that you should use the same artifact for all of your stages; that means an artifact should only be built once and then deployed to the different systems for automated testing, user testing and finally production. When an artifact is promoted to one of the later stages you would like to update the version so it is obvious that it is a real release candidate. Depending on the technology you are using it might not be easy to update the version without building the artifact again.

Integrated Tests are a scam

This session, inspired by this article by J.B. Rainsberger, mostly revolved around the problems of testing database-heavy applications, where some business logic tends to live in the database, e.g. by means of OR mapping configurations. During the discussion it became obvious that the term integration test is overloaded: a lot of people think of integrating external systems like databases, whereas others think of integrating several of your own components. I also learned about Hexagonal Architecture, which I hadn't known as a term before.

Mapping Personal Practices

Markus Gärtner hosted a laid-back outdoor session on determining in which areas of your professional life you would like to improve. First we collected everything we do daily on a mind map and discussed it. Then we determined which parts we would like to improve during the next months. I have done similar things for myself before but it was interesting to see what other people are working on and what they are interested in.

Specification by Example Experience Report

Nicole introduced some of the concepts of Specification by Example using the hot dog point of sale that was also used for the architecture kata during last year's SoCraTes. She described some of the experiences her team had when introducing it in their company (which doesn't sell hot dogs as far as I know). Specification by Example and BDD had been huge topics during last year's conference and I would really like to try it once and see if it really improves communication. The (German) slides she used as an introduction are also available online.

Designing Functional Programs

A few people gathered to discuss the implications functional programming has for designing an application, e.g. when to use higher-order functions. Unfortunately nobody was around who had already built an application in a functional way. To get some experience we tried the Mars Rover kata in JavaScript. The kata probably was not the ideal choice as it is rather stateful and therefore a more advanced problem.

Productivity Porn

A discussion on self-management and everything around it. People shared some of their practices and Jan showed his foldable Personal Kanban board. You can really spend a lot of time thinking about productivity, which sometimes is not the most productive thing to do. But Personal Kanban seems to help a lot of people, so I am planning to read the recommended book about it and try it for myself.

Day 2

Agenda for day 2, photo again taken by Robert.

VIM Show and Tell

Sebastian proposed a session on VIM where everybody could show their favourite plugin or feature. I am still a novice VIM user so I mainly learned some of the basics. Quite a few people there use VIM for their development work, mostly with dynamic languages I guess. This video on InfoQ was recommended and, after watching it, I am also recommending it here.

Monads

Another talk by Nicole in which she introduced monads, starting with an example in Java and moving on to Haskell. I now have a better idea of what monads might be, but the concept is still rather abstract for me.
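
The core idea, as far as I took it away: a monad wraps a value and provides a bind operation (often called flatMap) that chains computations on the wrapped value. Here is a minimal sketch of a Maybe type in Java, my own illustration rather than the code from the session:

final class Maybe<T> {
    private final T value; // null represents "nothing"

    private Maybe(T value) { this.value = value; }

    static <T> Maybe<T> just(T value) { return new Maybe<T>(value); }
    static <T> Maybe<T> nothing() { return new Maybe<T>(null); }

    // bind: chain a computation that may itself produce "nothing"
    <R> Maybe<R> flatMap(java.util.function.Function<T, Maybe<R>> f) {
        return value == null ? Maybe.<R>nothing() : f.apply(value);
    }
}

A chain like Maybe.just("42").flatMap(s -> Maybe.just(Integer.parseInt(s))) short-circuits to "nothing" as soon as one step has no result, which is the pattern monads generalize.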

Async Patterns

Sergey presented three alternative approaches for building concurrent applications:

  • Actors, a core feature of Erlang and Akka
  • Reactive Extensions
  • Communicating Sequential Processes as implemented in Clojure

SOLID Principles

Another session by Sebastian, where we discussed the SOLID principles, considered to be the basics of good object-oriented design. It was interesting to see that even if you think you know the concepts, it is still difficult to define them precisely. While looking at some examples it also became obvious that sometimes you might follow one principle while violating another. Unfortunately I couldn't stay for the second, practical part of the session.

Quit your Job

During the World Café Daniel Temme mentioned that he had given a talk last year on quitting your job and had just quit his job again. He told me parts of the story in the evening but I was glad that he decided to give the talk again. Though it is rather provocative, the story behind it is important and spans a lot of areas of your life: sometimes you are caught in your habits and don't really notice that the right thing to do would be something else. Daniel is currently on a journey where he visits companies and works for food and accommodation.

Last words

SoCraTes was really awesome again. Thanks to all the organizers and participants who shared so much. I'll definitely try to be there again next year.

Wednesday, July 31, 2013

GETting Results from ElasticSearch

ElasticSearch exposes most of its functionality via a RESTful API. When it comes to querying the data, you can either pass request parameters, e.g. the query string, or use the query DSL, which structures the queries as JSON objects. For a talk I gave I used this example, which executes a GET request with curl and passes the JSON query structure in the request body.

curl -XGET 'http://localhost:9200/jug/talk/_search' -d '{
    "query" : {
        "query_string" : {"query" : "suche"} 
    },
    "facets" : {
        "tags" : {
            "terms" : {"field" : "speaker"} 
        }
    }
}'

An alert listener(*) later told me that you can't use a request body with a GET request. While preparing the talk I had also wondered about this and only added the example after testing it successfully. It is unusual at the very least, and some software like proxy caches might not handle such a request as intended. A lot of ElasticSearch examples I have seen use POST requests instead, but I think that semantically, requesting search results should be a GET.

The ElasticSearch docs explicitly allow the use of GET requests with a request body:

"Both HTTP GET and HTTP POST can be used to execute search with body. Since not all clients support GET with body, POST is allowed as well."

I didn't find a hint in the HTTP specification as to whether this should be allowed or not. This answer on Stackoverflow goes in the same direction as my initial concern: you shouldn't do it because it might not be expected by users and also not supported by some stacks.

Finally, in this message Roy Fielding, one of the authors of the HTTP specification, discourages the use of a request body with GET.

"... any HTTP request message is allowed to contain a message body, and thus must parse messages with that in mind. Server semantics for GET, however, are restricted such that a body, if any, has no semantic meaning to the request."

As the query DSL influences the response, the body clearly carries semantic meaning, which by Roy Fielding's words it shouldn't. So there is no consensus on this topic. In hindsight I am quite surprised that this works out so well with ElasticSearch and that I haven't heard of any problems with it.

(*) I wish he hadn't told me, while we were talking about another talk, that he always looks for mistakes in slides when he's bored ;).

Wednesday, June 5, 2013

Book Review: Hibernate Search by Example

PacktPub kindly offered me a free review edition of Hibernate Search by Example. Though I've used Lucene and Hibernate independently on a lot of projects, I've never used Hibernate Search, which builds on both technologies. That makes me a good candidate for reviewing an introductory book on it.

The Project

Hibernate Search is a really interesting technology. It transparently indexes Hibernate entities in Lucene and provides a slick DSL for querying the index. You decide which entities and which fields to index by adding annotations at the class and field level. Custom analyzer chains can be defined for the entities and referenced from fields. Each entity is written to its own index, but it can also include data from related or embedded entities. By default, Lucene is only used for querying and ranking; the result list is still populated from the database. If this is not enough for your application you can also use projection to display results from fields stored in Lucene.
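
To give an idea of what the mapping looks like, here is a minimal sketch of an annotated entity. The class and field names are invented for illustration and are not taken from the book:

@Entity
@Indexed // index this entity in its own Lucene index
public class Product {

    @Id
    @GeneratedValue
    private Long id;

    @Field(store = Store.YES) // analyzed for search, stored for projection
    private String name;

    @Field
    private String description;
}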

The Book

Steve Perkins, the author of Hibernate Search by Example, did a great job designing an example application that can evolve with the book. It starts very simple, also explaining the build setup using Maven and an embedded Jetty instance with an H2 database. Each chapter builds on the results of the previous one and enhances the project with different features. Each chapter is dedicated to a certain topic that is immediately used in the application so you can see its benefit. This way you learn about different aspects, from mapping entities and performing queries to advanced mapping, analyzing, filtering, using facets and even setting up master/slave systems and sharding your data. Not only is the book structured well, the author also has a very clear writing style. Combined with the practical examples this makes it very easy to read. If you're planning to implement a solution on top of Hibernate Search you're well advised to read this book.

My Impression of the Technology

As I am building quite a lot of search applications I'd like to add some impressions of Hibernate Search. Though it's a very interesting technology I think you should be careful when deciding on its use. Not only will you tie yourself to Hibernate as your JPA provider, there are also some implications on the search side. Hibernate Search offers advanced features like faceting, and it can be distributed and sharded. But there might be features that you want to build later on that would be far easier with a search server like Solr or ElasticSearch. Hibernate Search uses some components of Solr, but a real integration (using Solr as a backend) is rather difficult, I guess: Solr needs its schema configured in a file, so you would need to duplicate it. ElasticSearch could be a far better candidate as its schema mapping can be created via its REST API. I am really curious whether somebody has been thinking about starting an implementation of Hibernate Search on top of ElasticSearch. With the Lucene implementation that is described in the book you can easily enhance your database-driven application with advanced search functionality. But be aware that future requirements might be more difficult to implement compared to integrating a search server from the beginning.

Tuesday, May 28, 2013

Getting Started with ElasticSearch: Part 1 - Indexing

ElasticSearch is gaining huge momentum, with large sites like GitHub and Stack Overflow switching to it for their search capabilities. Its distributed nature makes it an excellent choice for large datasets with high availability requirements. In this two-part article I'd like to share what I learned building a small Java application just for search.

The example I am showing here is part of an application I am using in talks to show the capabilities of Lucene, Solr and ElasticSearch. It's a simple webapp that searches user group talks. You can find the sources on GitHub.

Some experience with Solr can be helpful when starting with ElasticSearch, but there are also times when it's best not to stick to your old knowledge.

Installing ElasticSearch

There is no real installation process involved when starting with ElasticSearch. It's only a jar file that can be started immediately, either directly using the java command or via the shell scripts that are included with the binary distribution. You can pass the locations of the configuration files and the index data using system properties. This is a Gradle snippet I am using to start an ElasticSearch instance:

task runES(type: JavaExec) {
    main = 'org.elasticsearch.bootstrap.ElasticSearch'
    classpath = sourceSets.main.runtimeClasspath
    // the home directory (config) and the data directory are passed as system properties
    systemProperties = ["es.path.home":'' + projectDir + '/elastichome',
                        "es.path.data":'' + projectDir + '/elastichome/data']
}

You might expect ElasticSearch to use a bundled Jetty instance, as has become rather common nowadays. But no, it implements its whole transport layer with the asynchronous networking library Netty, so you never deploy it to a servlet container.

After you have started ElasticSearch it will be available at http://localhost:9200. Any further instances you start will automatically connect to the existing cluster and even pick another port automatically, so there is no need for configuration and you won't see any "Address already in use" problems.

You can check that your installation works using some curl commands.

Index some data:

curl -XPOST 'http://localhost:9200/jug/talk/' -d '{
    "speaker" : "Alexander Reelsen",
    "date" : "2013-06-12T19:15:00",
    "title" : "Elasticsearch"
}'

And search it:

curl -XGET 'http://localhost:9200/jug/talk/_search?q=elasticsearch'

The URL contains two path segments that determine the index name (jug) and the type (talk). You can have multiple indices per ElasticSearch instance and multiple types per index. Each type has its own mapping (schema), but you can also search across multiple types and multiple indices. Note that we didn't create the index and the type; ElasticSearch figures out the index name and mapping automatically from the URL and the structure of the indexed data.

Java Client

There are several alternative clients available when working with ElasticSearch from Java, like Jest, which provides POJO marshalling for indexing and for the search results. In this example we are using the Client that is included in ElasticSearch. By default this client doesn't use the REST API but connects to the cluster as a normal node that just doesn't store any data. It knows about the state of the cluster and can route requests to the correct node, but supposedly consumes more memory. For our application this doesn't make a huge difference, but for production systems it's something to think about.

This is an example setup for a Client object that can then be used for indexing and searching:

Client client = NodeBuilder.nodeBuilder().client(true).node().client();
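
Alternatively, if you don't want your client to join the cluster as a node, there is also a TransportClient that connects to one or more given addresses via the transport protocol. A sketch, not used in the example application:

Client client = new TransportClient()
    .addTransportAddress(new InetSocketTransportAddress("localhost", 9300)); // 9300 is the transport port, not the HTTP port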

You can use the client to create an index:

client.admin().indices().prepareCreate(INDEX).execute().actionGet();

Note that actionGet() isn't named this way because of an HTTP GET request; it is a call on the Future object returned by execute(), so it is the blocking part of the call.
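
If you don't want to block you can instead pass an ActionListener to execute(). A sketch of the asynchronous variant:

client.admin().indices().prepareCreate(INDEX).execute(new ActionListener<CreateIndexResponse>() {
    @Override
    public void onResponse(CreateIndexResponse response) {
        // the index has been created
    }

    @Override
    public void onFailure(Throwable t) {
        // react to the failure, e.g. log it
    }
});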

Mapping

As you have seen with the indexing operation above, ElasticSearch doesn't require an explicit schema like Solr does. It automatically determines the likely types from the JSON you send to it. Of course this might not always be correct, and you might want to define custom analyzers for your content, so you can also adjust the mappings to your needs. As I was so used to the way Solr does this, I was looking for a way to add the mapping configuration via a file in the server config. This is indeed possible using a file called default-mapping.json or via index templates. On the other hand you can also use the REST-based put mapping API, which has the benefit that you don't need to distribute the file to all nodes manually and you don't need to restart the server. The mapping then is part of the cluster state and gets distributed to all nodes automatically.

ElasticSearch provides most of its API via builder classes. Surprisingly I didn't find a builder for the mapping. One way to construct it is to use the generic JSON builder:

XContentBuilder builder = XContentFactory.jsonBuilder().
  startObject().
    startObject(TYPE).
      startObject("properties").
        startObject("path").
          field("type", "string").field("store", "yes").field("index", "not_analyzed").
        endObject().
        startObject("title").
          field("type", "string").field("store", "yes").field("analyzer", "german").
        endObject().
        // more mapping
      endObject().
    endObject().
  endObject();
client.admin().indices().preparePutMapping(INDEX).setType(TYPE).setSource(builder).execute().actionGet();

Another way I have seen is to put the mapping in a file and just read it into a String, e.g. by using the Guava Resources class.
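
That could look something like this, assuming the mapping is in a file talk-mapping.json on the classpath (the file name is made up for this sketch):

String mapping = Resources.toString(Resources.getResource("talk-mapping.json"), Charsets.UTF_8);
client.admin().indices().preparePutMapping(INDEX).setType(TYPE).setSource(mapping).execute().actionGet();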

After you have adjusted the mapping you can have a look at the result at the _mapping endpoint of the index at http://localhost:9200/jug/_mapping?pretty=true.

Indexing

Now we are ready to index some data. In the example application I am using simple data classes that represent the talks to be indexed. Again, you have different options for transforming your objects into the JSON ElasticSearch understands. You can build it by hand, e.g. with the XContentBuilder we have already seen above, or, more conveniently, by using something like the JSON processor Jackson, which can serialize and deserialize Java objects to and from JSON. This is what it looks like when using the XContentBuilder:

XContentBuilder sourceBuilder = XContentFactory.jsonBuilder().startObject()
  .field("path", talk.path)
  .field("title", talk.title)
  .field("date", talk.date)
  .field("content", talk.content)
  .array("category", talk.categories.toArray(new String[0]))
  .array("speaker", talk.speakers.toArray(new String[0]));
IndexRequest request = new IndexRequest(INDEX, TYPE).id(talk.path).source(sourceBuilder);
client.index(request).actionGet();
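
Using Jackson instead might look like the following sketch, assuming the fields of the Talk class are visible to Jackson:

ObjectMapper mapper = new ObjectMapper(); // reusable and thread-safe
String json = mapper.writeValueAsString(talk); // serializes all accessible fields
IndexRequest request = new IndexRequest(INDEX, TYPE).id(talk.path).source(json);
client.index(request).actionGet();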

You can also use a bulk request to avoid sending a separate request for each document.
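
A sketch of what this might look like, where buildSource() stands for the XContentBuilder logic shown above:

BulkRequestBuilder bulkRequest = client.prepareBulk();
for (Talk talk : talks) {
    // buildSource(talk) is a hypothetical helper producing the XContentBuilder from above
    bulkRequest.add(new IndexRequest(INDEX, TYPE).id(talk.path).source(buildSource(talk)));
}
BulkResponse bulkResponse = bulkRequest.execute().actionGet();
if (bulkResponse.hasFailures()) {
    // inspect bulkResponse.buildFailureMessage()
}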

With ElasticSearch you don't need to commit after indexing. By default, it refreshes the index every second, which is fast enough for most use cases. If you want to be able to search the data as soon as possible you can also trigger a refresh explicitly via the client. This can be really useful when writing tests where you don't want to wait a second between indexing and searching.
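
Triggering the refresh for our index via the client is a one-liner:

client.admin().indices().prepareRefresh(INDEX).execute().actionGet();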

This concludes the first part of this article on getting started with ElasticSearch using Java. The second part contains more information on searching the data we indexed.

Tuesday, February 19, 2013

Softwerkskammer Rhein-Main Open Space

On Saturday I attended an Open Space in Wiesbaden, organized by members of Softwerkskammer Rhein-Main, a very active chapter of the German software craftsmanship community. The event took place in the offices of Seibert Media above a shopping mall, including a nice view of the city.

The Format

Open Space conferences are special in that there is no predefined agenda. All attendees can bring ideas, propose them in the opening session and choose a time slot and room. Sessions are not necessarily normal presentations but rather discussions, so it's even OK to just propose a question you have or a topic you'd like to learn more about from the attendees. There are also some guidelines and rules: sessions don't need to start and end on time, you can always leave a session if you feel you can't contribute, and you shouldn't be disappointed if nobody shows up for your proposed session.

Personal Kanban

Dennis Traub presented a session on Personal Kanban. As I had already done Kanban-style development in one project I was eager to learn how to apply the principles to personal organization. Basically it all works like normal Kanban. Tasks are visualized on a board where a lane defines the state of a task, with work items flowing from left (todo) to right (done). You can define the lanes as they fit your habits, e.g. one for todos, one for in progress and one for blocked. The in-progress lane needs a Work in Progress limit, the maximum number of tasks you work on in parallel. An important aspect is that you don't have to put all your backlog items into the todo lane; you can also keep them in a separate place. This keeps you from getting overwhelmed when looking at the board.

Kanban sounds like a good way to organize your daily life. For me personally the biggest hindrance is that I work from my living room, and I'd rather not put a Kanban board in my living room. If I had a separate office I guess I'd try it immediately.

Open Source

An attendee wanted to hear some experiences with Open Source communities. Two full-time committers, Ollie for Spring and Marcel for Eclipse, shared some of theirs. I am still surprised that a lot of Open Source projects have quite a few bugs in their trackers that could easily be fixed by newcomers. A lot of people like Open Source software, but not that many seem to be interested in contributing to a project continuously. Most of the interaction with users in the issue trackers consists of one-time reports: people report one bug and move on. Even for big projects like Spring and Eclipse it's hard to find committers. One way to motivate people is to organize hack days where users learn to work with the sources of the projects, but this also needs quite some preparation.

Freelancing

The topic of freelancing was discussed throughout the day. Markus Tacker presented his idea of the kybernetic agency, a plan to form a freelance network of people who can work on projects together. We discussed benefits and possible problems, mainly of a legal nature. A quite inspiring session that also made me think about the differences between freelancing in the Java enterprise world and in PHP development. Most of the freelancers I know would prefer not to work five days a week for one client exclusively, but that is often a prerequisite for projects in the enterprise world.

Learning

Learning is a topic that is very important to me, so I proposed a session on it. I had already switched from five to four days during the last months of my employment at synyx because I felt the need to invest more time in learning, which is often not possible when working on client projects. Even now as a freelancer I keep one day for learning only. What works best for me is writing blog posts that contain some sample code. I can build something, and when writing the post I make sure that I have a deep understanding of the topic I am writing about. Other people also said that the most important aspect is to have something to work on; reading or watching screencasts alone is not a sustainable activity. I also liked the technique of another freelancer: whenever he notices that he could do something differently on the current project, he stops tracking the time for the customer and tries to find ways to improve the project, probably learning a new approach. This is something you do implicitly as a freelancer, as you often spend some of your spare time thinking about client work, but I like this explicit approach.

Summary

All in all this was a really fruitful, but also exhausting, day. Though I chose meta topics exclusively I gained a lot from attending. Thanks a lot to the organizers (mainly Benjamin), moderators, sponsors and all the attendees that made this event possible. I am looking forward to meeting a lot of these people again at SoCraTes this year.

Friday, February 1, 2013

Book Review: Gradle Effective Implementation Guide

PacktPub kindly offered me a free review edition of Gradle Effective Implementation Guide, written by Hubert Klein Ikkink, also known as mrhaki. As I had planned to read it anyway I agreed to write a review.

Maven was huge for Java development. It brought dependency management, sane conventions and platform-independent builds to the mainstream. If there is a Maven pom file available for an open source project you can be quite sure you'll manage to build it on your local machine in no time.

But there are cases where it doesn't work that well. Its phase model is rather strict and the one-artifact-per-build restriction can get in your way for more unusual build setups. You can work around some of these problems using profiles and assemblies, but it feels like Maven is primarily useful for a certain set of projects.

Gradle is different. It's more flexible, but there's also a learning curve involved. Groovy as its build DSL is easy to read but probably not that easy to write at first, because there are often multiple ways to do something. A standard Java developer like me might be unsure about the proper way of doing things.

There are a lot of helpful resources online, namely the forum and the excellent user guide, but as I prefer to read longer sections offline I am really glad that there is now a book available that contains extensive information and can get you started with Gradle.

Content

The book starts with a general introduction to Gradle. You'll get a high-level overview of its features, learn how to install it and write your first build file. You'll also learn some important options of the gradle executable that I hadn't been aware of.

Chapter 2 explains tasks and how to write build files. This is a very important chapter if you are not that deep into the Groovy language. You'll learn about the implicitly available Task and Project instances, the different ways of accessing methods and properties, and how to define tasks and the dependencies between them.

Working with files is an important part of any build system. Chapter 3 contains detailed information on accessing and modifying files, file collections and file trees. This is also where the benefit of using Groovy becomes really obvious: the ease of working with collections can lead to very concise build definitions, while you still have all the power of Groovy and the JVM at hand. The different log levels are useful to know and can come in handy when you'd like to diagnose a build.

While understanding tasks is an important foundation for working with Gradle, it's likely that you want to use it with programming languages. Nearly all of the remaining chapters cover different aspects of builds for JVM languages. Chapter 4 starts with a look at the Java plugin and its additional concepts. You'll see how you can compile and package Java applications and how to work with sourceSets.

Nearly no application is an island. The Java world provides masses of useful libraries that can help you build your application. Proper dependency management, as introduced in Chapter 5, is important for easy build setups and for making sure that you don't introduce incompatible combinations of libraries. Gradle supports Maven, Ivy and local file-based repositories. Configurations are used to group dependencies, e.g. to define dependencies that are only necessary for tests. If you need to influence the version you retrieve for a certain dependency you can configure resolution strategies, version ranges and exclusions for transitive dependencies.

Automated testing is a crucial part of any modern software development process. Gradle can work with JUnit and TestNG out of the box. Test execution times can be improved a lot by the incremental build support and the parallelization of tests. I guess this can lead to dramatically shorter build times, something I plan to try on an example project with a lot of tests in the near future. This chapter also introduces the different ways to run an application, create distributions and publish artifacts.

The next chapter shows you how to structure your application in separate projects. Gradle has clever ways to find out which projects need to be rebuilt before and after building a certain project.

Chapter 8 contains information on how to work with Scala and Groovy code. The necessary compiler versions can be defined in the build, so there is no need for additional installations. I've heard good things about the Scala integration, so Gradle seems to be a viable alternative to sbt.

The check task can be used to gather metrics on your project using many of the available open source tools for code quality measurement. Chapter 9 shows you how to include tools like Checkstyle, PMD and FindBugs to analyze your project sources, either standalone or by sending data to Sonar.

If you need additional functionality that is not available out of the box you can start implementing your own tasks and plugins. Chapter 10 introduces the important classes for writing custom plugins and shows how to use them from Groovy and Java.

Gradle can be used on several Continuous Integration systems. As I've been working with Hudson/Jenkins exclusively during the last years, it was interesting to also read about the commercial alternatives TeamCity and Bamboo in Chapter 11.

The final chapter contains a lot of in-depth information on the Eclipse and IDEA plugins. Honestly, this contains more information on the Eclipse file format than I wanted to know, but I guess it can be really useful for some users. Unfortunately the excellent NetBeans plugin is not covered in the book.

Summary

The book is an excellent introduction to working effectively with Gradle. It has helped me gain a far better understanding of the concepts. If you are thinking about Gradle or have already started working with it, I highly recommend getting a copy. There are a lot of detailed example files that you can use immediately. Many of them are very close to real-world use cases and can help you think about additional ways Gradle can be useful for organizing your builds.