Wednesday, 28 August 2013

FrOSCon 8 2013 - Free and Open Source Software Conference

Last weekend I attended FrOSCon, the Free and Open Source Software Conference taking place in St. Augustin near Bonn. It's a community organized conference with an especially low entrance fee and a relaxed vibe. The talks are a good mixture of development and system administration topics.

Some of the interesting talks I attended:

Fixing Legacy Code by Kore Nordmann and Benjamin Eberlein

Though this session was part of the PHP track it contained a lot of valuable information related to working with legacy code in any language. Besides strategies for getting an application under test the speakers showed some useful refactorings that can make sense to start with. Slides

Building Awesome Ruby Command Line Apps by Christian Vervoorts

Christian first showed some of the properties that make up a good command line app. You should choose sane default values but make them configurable. Help functionality is crucial for a good user experience, via a -h parameter and a man page. In the second part Christian introduced some Ruby gems that can be used to build command line apps. GLI seems to be the most interesting, with a nice DSL and its scaffolding functionality.

Talking People Into Creating Patches by Isabel Drost-Fromm

Isabel, who is very active in the Apache community, presented some of her findings from trying to get students, researchers and professionals to participate in Open Source. The participants were a mixture of people running open source projects and developers interested in contributing to open source. I was especially interested in this talk because I wouldn't mind having more people help with the Odftoolkit I am also working on. When working with professionals, who are the main target, it is important to answer mails or issues quickly, as they might move on to other projects and might not be able to help later on. Also, it's nice to have some easy tasks in the bug tracker that can be picked up by newbies.

MySQL Performance Schema by Carsten Thalheimer

Performance Schema was introduced in MySQL 5.5 and has been activated by default since 5.6. It monitors a lot of the internal functionality like file access and queries so you can later see which parts you can optimize. Some performance measurements done by the MySQL developers showed that keeping it activated has a performance impact of around 5%. Though this doesn't sound that good at first, I think you can gain a lot more performance from the insight you get into the inner workings. Working with Performance Schema is supposed to be rather complex ("Take two weeks to work with it"); ps_helper is a more beginner-friendly tool that can get you started with some useful metrics.

Summary

FrOSCon is one of the most relaxing conferences I know. It is my go-to place for seeing stuff that is not directly related to Java development. The low fee makes it a no-brainer to attend. If you are interested in any of this year's talks, they will also be made available online.

Wednesday, 21 August 2013

Getting Started with ElasticSearch: Part 2 - Querying

This is the second part of the article on things I learned while building a simple Java based search application on top of ElasticSearch. In the first part of this article we looked at how to index data in ElasticSearch and what the mapping is. Though ElasticSearch is often called schema free, specifying the mapping is still a crucial part of creating a search application. This time we will look at the query side and see how we can get our indexed talks out of it again.

Simple Search

Recall that our documents consist of a title, the date and the speaker of a talk. We have adjusted the mapping so that for the title we are using the German analyzer that stems our terms and we can search on variations of words. This curl request creates a similar index:

curl -XPUT "http://localhost:9200/blog" -d'
{
    "mappings" : {
        "talk" : {
            "properties" : {
                "title" : { "type" : "string", "store" : "yes", "analyzer" : "german" }
            }
        }
    }
}'

Let's see how we can search on our content. We are indexing another document with a German title.

curl -XPOST "http://localhost:9200/blog/talk/" -d'
{
    "speaker" : "Florian Hopf",
    "date" : "2012-07-04T19:15:00",
    "title" : "Suchen und Finden mit Lucene und Solr"
}'

All searching is done on the _search endpoint that is available on the type or index level (you can also search on multiple types and indexes by separating them with a comma). As the title field uses the German analyzer we can search on variations of the words, e.g. suche, which stems to the same root as suchen: such.

curl -XGET "http://localhost:9200/blog/talk/_search?q=title:suche&pretty=true"
{
  "took" : 14, 
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.15342641,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "talk",
      "_id" : "A2Qv3fN3TkeYEhxA4zicgw",
      "_score" : 0.15342641, "_source" : {
        "speaker" : "Florian Hopf",
        "date" : "2012-07-04T19:15:00",
        "title" : "Suchen und Finden mit Lucene und Solr"
      }
    } ]
  }
}
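The URL scheme described above (index, optional type, comma-separated lists, then _search) can be sketched in a few lines of plain Java. This is only an illustration: SearchUrls and searchUrl are hypothetical names and not part of any ElasticSearch client, and the host is the local default used in the curl examples.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper, not part of any ElasticSearch client. */
final class SearchUrls {

    /**
     * Builds a URL for the _search endpoint. Indexes and types are
     * comma-joined; pass an empty types array to search on the index level.
     */
    static String searchUrl(String host, String[] indexes, String[] types, String query) {
        StringBuilder url = new StringBuilder(host);
        url.append('/').append(String.join(",", indexes));
        if (types.length > 0) {
            url.append('/').append(String.join(",", types));
        }
        // The query string is URL-encoded, so ':' becomes %3A.
        url.append("/_search?q=").append(encode(query));
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Calling searchUrl with the host http://localhost:9200, the index blog, the type talk and the query title:suche yields the URL of the curl request above, with the colon encoded.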

The _all field

Now that this works, we might want to search on multiple fields. ElasticSearch provides the convenience functionality of copying all field content to the _all field that is used when omitting the field name in the query. Let's try the query again:

curl -XGET "http://localhost:9200/blog/talk/_search?q=suche&pretty=true"
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

No results. Why is that? We have set the analyzer correctly for the title, as we have seen above, but that doesn't mean the content is analyzed in the same way for the _all field. As we didn't specify an analyzer for this field, it still uses the StandardAnalyzer, which splits the text into lowercased words but doesn't do any stemming. If you want consistent behavior for the title and the _all field, you need to set the analyzer in the mapping:

curl -XPUT "http://localhost:9200/blog" -d'
{
    "mappings" : {
        "talk" : {
            "_all" : {"analyzer" : "german"},
            "properties" : {
                "title" : { "type" : "string", "store" : "yes", "analyzer" : "german" }
            }
        }
    }
}'

Note that, as with most mapping changes, you can't change how the _all field is analyzed once it has been created. You need to delete the index, put the new mapping and reindex your data. Afterwards our search will return the same results for the two queries.
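To see why the query on the _all field came back empty, here is a toy model of the two analysis chains in plain Java. It is a sketch under loud assumptions: the suffix stripping in stemmed is a crude stand-in and not the real German stemmer, plain only approximates an analyzer without stemming, and AnalyzerSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model of analysis; the suffix stripping is NOT the real German stemmer. */
final class AnalyzerSketch {

    /** Stand-in for an analyzer without stemming: split into words, lowercase. */
    static List<String> plain(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                tokens.add(word.toLowerCase(Locale.GERMAN));
            }
        }
        return tokens;
    }

    /** Stand-in for a stemming analyzer: additionally strips a trailing "en" or "e". */
    static List<String> stemmed(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : plain(text)) {
            if (token.length() > 4 && token.endsWith("en")) {
                token = token.substring(0, token.length() - 2);
            } else if (token.length() > 4 && token.endsWith("e")) {
                token = token.substring(0, token.length() - 1);
            }
            tokens.add(token);
        }
        return tokens;
    }

    /** A query matches when one of its analyzed tokens occurs in the indexed tokens. */
    static boolean matches(List<String> indexedTokens, List<String> queryTokens) {
        for (String token : queryTokens) {
            if (indexedTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

With the title "Suchen und Finden mit Lucene und Solr" and the query suche, the stemmed chain reduces both suchen and suche to such, so the query matches. The plain chain keeps suchen and suche as different tokens, which mirrors the empty result on the _all field.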

_source

You might have noticed from the example above that ElasticSearch returns the special _source field for each result. This is very convenient as you don't need to specify which fields should be stored. But be aware that this might become a problem for large fields that you don't need for each search request (content section of articles, images that you might store in the index). You can either disable the use of the source field and indicate which fields should be stored in the mapping for your indexed type or you can specify in the query which fields you'd like to retrieve:

curl -XGET "http://localhost:9200/blog/talk/_search?q=suche&pretty=true&fields=speaker,title"
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.15342641,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "talk",
      "_id" : "MA2oYAqnTdqJhbjnCNq2zA",
      "_score" : 0.15342641
    }, {
      "_index" : "blog",
      "_type" : "talk",
      "_id" : "aGdDy24cSImz6DVNSQ5iwA",
      "_score" : 0.076713204,
      "fields" : {
        "speaker" : "Florian Hopf",
        "title" : "Suchen und Finden mit Lucene und Solr"
      }
    } ]
  }
}

The same can be done if you are not using the simple query parameters but the more advanced query DSL:

curl -XPOST "http://localhost:9200/blog/talk/_search" -d'
{
    "fields" : ["title", "speaker"],
    "query" : {
        "term" : { "speaker" : "florian" }
    }
}'

Querying from Java

Besides the JSON based Query DSL you can also query ElasticSearch using Java. The default ElasticSearch Java client provides builders for creating different parts of the query that can then be combined. For example, if you'd like to query on two fields using the multi_match query, this is what it looks like using curl:

curl -XPOST "http://localhost:9200/blog/_search" -d'
{
    "query" : {
        "multi_match" : {
            "query" : "Solr",
            "fields" : [ "title", "speaker" ]
        }
    }
}'

The Java version maps quite well to this. Once you have found the builders you need, you can use the excellent documentation of the Query DSL for the Java client as well.

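// Static imports assumed: org.elasticsearch.index.query.QueryBuilders.multiMatchQuery
// and org.junit.Assert.assertEquals; esClient is an already connected Client.
QueryBuilder multiMatch = multiMatchQuery("Solr", "title", "speaker");
SearchResponse response = esClient.prepareSearch("blog")
        .setQuery(multiMatch)
        .execute().actionGet();
// Exactly one indexed talk mentions Solr.
assertEquals(1, response.getHits().getTotalHits());
SearchHit hit = response.getHits().getAt(0);
assertEquals("Suchen und Finden mit Lucene und Solr", hit.getSource().get("title"));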

The same QueryBuilder we are constructing above can also be used in other parts of the query: for example, it can be passed as a parameter to create a QueryFilterBuilder or used to construct a QueryFacetBuilder. This composition is a very powerful way to build flexible applications. It is easier to reason about the components of the query and you could even test parts of the query on their own.

Faceting

One of the most prominent features of ElasticSearch is its excellent faceting support that not only is used for building search applications but also for doing analytics of large data sets. You can use different kinds of faceting, e.g. for certain terms, using the TermsFacet, or for queries, using the query facet. The query facet would accept the same QueryBuilder that we used above.

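// Static imports assumed: org.elasticsearch.search.facet.FacetBuilders.termsFacet,
// org.elasticsearch.index.query.QueryBuilders.queryString and org.junit.Assert.assertEquals.
// The facet is named "speaker" and counts the terms of the speaker field.
TermsFacetBuilder facet = termsFacet("speaker").field("speaker");
QueryBuilder query = queryString("solr");
SearchResponse response = esClient.prepareSearch("blog")
        .addFacet(facet)
        .setQuery(query)
        .execute().actionGet();
assertEquals(1, response.getHits().getTotalHits());
SearchHit hit = response.getHits().getAt(0);
assertEquals("Suchen und Finden mit Lucene und Solr", hit.getSource().get("title"));
// The facet result is looked up by the name given above.
TermsFacet resultFacet = response.getFacets().facet(TermsFacet.class, "speaker");
assertEquals(1, resultFacet.getEntries().size());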
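Conceptually, a terms facet counts how often each term of a field occurs in the documents that match the query. The following plain Java sketch models just that idea; it is not how ElasticSearch implements faceting, the documents are simple maps instead of search hits, and TermsFacetSketch is a hypothetical name. It also counts whole field values, whereas for an analyzed field the real facet works on the indexed terms.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a terms facet; the documents are plain maps, not ElasticSearch hits. */
final class TermsFacetSketch {

    /** Counts how often each value of the given field occurs in the matching documents. */
    static Map<String, Integer> termsFacet(List<Map<String, String>> matchingDocs, String field) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> doc : matchingDocs) {
            String value = doc.get(field);
            // Documents without the field do not contribute to the facet.
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For two matching talks by the same speaker, the resulting map contains a single entry for that speaker with a count of two, which is the kind of information you would show next to a search result to let users narrow it down.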

Conclusion

ElasticSearch has a really nice Java API, be it for indexing or for querying. You can get started with indexing and searching in no time, though you need to know some concepts or the results might not be what you expect.

Wednesday, 14 August 2013

The Pragmatic Programmer's Rubber Duck of the 19th Century

In their influential book "The Pragmatic Programmer" Andy Hunt and Dave Thomas describe a technique for finding solutions to hard problems you are struggling with. They recommend simply telling the problem to somebody, not to get an answer, but because you think differently about it while explaining it. And if there is nobody around, get yourself a rubber duck you can talk to, hence the name of the tip.

It's obvious that this is not a new discovery made by the authors. Everybody has experienced similar situations where they found a solution to a problem while explaining it to someone. But I was surprised to read this in the essay "Über die allmähliche Verfertigung der Gedanken beim Reden" by Heinrich von Kleist dating to 1805/1806 (translated from German by me):

If you want to know something and you can't find it in meditation I advise you [...] to tell it to the next acquaintance you are meeting. He doesn't need to be a keen thinker and I don't mean you should ask him: no! Rather you should tell him about it in the first place.

1806. The same tip (without the duck). This is another case where we are relearning things that have been discovered before, something especially computer science is prone to.

So, what is the big achievement of the authors? It's not that they only come up with new ideas. We don't need that many new ideas. There is a lot of stuff around that is just waiting to be applied to our work as software developers. Those old ideas need to be put into context. There is even a benefit in stating obvious things that might trigger rethinking your habits.

Sunday, 11 August 2013

SoCraTes 2013 - Two Days of Open Space

Last week I attended SoCraTes, an international Open Space conference on Software Craftsmanship and Testing. As a few days have passed now, I'd like to recap which sessions I attended and what was discussed. Though this is only a small part of the sessions that took place, you should get a good idea of what kind of topics are covered during the two days.

General

The conference is located at Seminarzentrum Rückersbach, which is close to nowhere (Aschaffenburg). As there is nothing around where people would normally go during conferences (pubs), everybody stays on the premises for the full 48 hours. You sleep there, you eat there, you spend your evening there. Generally you are around the same people from Thursday evening to Saturday evening, which leads to a lot of interesting talks and evening activities.

Open Space format

Some of the time is needed for the framework of the Open Space. On Thursday evening there is a world cafe, where you spend time at different tables with different people to discuss what you expect from the conference and what you would like to learn.

Every day starts with the planning of the day, the marketplace. Every participant can propose talks, discussions and hands-on sessions and put those on the schedule. There are several rooms available and, if the weather permits, there is also plenty of space outside for discussions. The day then is dedicated to the sessions, some of which I will describe below. In the evening there is some kind of retrospective of the day. You can also propose evening activities, which can range from discussions and coding to board games.

The sessions I will describe now are only a snapshot of what was available. As there are so many good things, it's sometimes hard to decide which session to attend.

Day 1

Agenda for day 1, photo taken by Robert Hostlowsky, who also blogged about SoCraTes over at the Codecentric blog.

Continuous Delivery

Sergey, who did a lot of sessions for the event, discussed a problem he is facing when trying to implement Continuous Delivery. One of the basic concepts of CD is that you should be using the same artifact for all of your stages. That means an artifact should only be built once and then deployed to the different systems for automated testing, user testing and finally production. When an artifact is promoted to one of the later stages you would like to update the version so it is obvious that it is a real release candidate. Depending on the technology you are using it might not be that easy to update the version without building the artifact again.

Integrated Tests are a scam

This session, inspired by this article by J.B. Rainsberger, mostly revolved around the problems of testing database heavy applications, where some business logic tends to be contained in the database e.g. by means of OR mapping configurations. During the discussion it became obvious that the term integration test is too overloaded: A lot of people are thinking of integrating external systems like databases whereas others think of it as integrating some of your components. I learned about Hexagonal Architecture which I didn't know as a term before.

Mapping Personal Practices

Markus Gärtner hosted a laid back outdoor session on determining in which areas of your professional life you would like to improve. First we collected all the stuff we are doing daily on a mind map and discussed them. Next we determined which parts we would like to improve in during the next months. I have done similar things for myself before but it was interesting to see what other people are working on and what they are interested in.

Specification by Example Experience Report

Nicole introduced some of the concepts of specification by example using the hot dog point of sale that was also used for the architecture kata during last year's SoCraTes. She described some of the experiences they had when introducing it in their company (which doesn't sell hot dogs as far as I know). Specification by Example and BDD had been huge topics during last year's conference and I would really like to try it once and see if it really improves communication. The (German) slides she used as an introduction are also available online.

Designing Functional Programs

A few people gathered to discuss the implications functional programming has when designing an application, e.g. when to use higher-order functions. Unfortunately nobody was around who had already implemented an application in a functional way. To get some experience we tried to do the Mars Rover kata in JavaScript. The kata probably was not the ideal choice as it is rather stateful and therefore a more advanced problem.

Productivity Porn

A discussion on self management and everything around it. People shared some of their practices and Jan showed his foldable Personal Kanban board. It's a fact that you can spend a lot of time thinking about productivity, which sometimes is not the most productive thing. But Personal Kanban seems to help a lot of people, so I am planning to read the recommended book about it and try it for myself.

Day 2

Agenda for day 2, again taken by Robert.

VIM Show and Tell

Sebastian proposed a session on VIM where everybody should show their favourite plugin or feature. I am still a novice VIM user so I mainly learned some of the basics. Quite some people there are using VIM for their development work, mostly with dynamic languages I guess. This video on InfoQ has been recommended and after watching it, I am also recommending it here.

Monads

Another talk by Nicole where she introduced Monads, starting with an example in Java and moving on to Haskell. I have a better idea now on what Monads could be but the concept still is too abstract for me.

Async Patterns

Sergey presented three alternative approaches for building concurrent applications:

  • Actors, a core feature of Erlang and Akka
  • Reactive Extensions
  • Communicating Sequential Processes as implemented in Clojure

SOLID Principles

Another session by Sebastian where we discussed the SOLID principles, considered to be the basics of good object oriented design. It was interesting to see that though you think you know the concepts it is still difficult to define them. While looking at some examples it also became obvious that sometimes you might follow one principle while violating another. Unfortunately I couldn't stay for the second, practical part of the session.

Quit your Job

During the world cafe Daniel Temme mentioned that he had given a talk last year on quitting your job and had just quit his job again. He told me parts of the story in the evening but I was glad that he decided to give the talk again. Though it is rather provocative, the story behind it is important and spans a lot of areas of your life: sometimes you are caught in your habits and don't really notice that the right thing to do would be something else. Daniel is currently on a journey where he visits companies and works for food and accommodation.

Last words

SoCraTes was really awesome again. Thanks to all the organizers and participants that shared a lot. I'll definitely try to be there again next year.
