Montag, 15. März 2010

Fun with TreeMap

TreeMap is the only implementation of a SortedMap that is included in the JDK. It stores its values in a Red-black tree, therefore its entries can be accessed, inserted and deleted really fast.

Unfortunately, using the class is not really intuitive, so you better read the API docs carefully or prepare to spend hours of your time chasing mysterious bugs.

Imagine that you want to use a TreeMap with String keys and String values that is ordered according to the length of the key in a way so that the longest key comes first. Should be quite simple, just implement a comparator that does the work for you:

import java.util.Comparator;

public class StringLengthComparator implements Comparator<String> {

public int compare(String o1, String o2) {
// if both lengths are the same return 0
return o2.length() - o1.length();
}
}

If both Strings have the same length we return 0 and a negative or positive value otherwise. As I always get confused when to return a positive value and when a negative let's create a simple test:

Map<String, String> map;

@Before
public void setUp() {
map = new TreeMap<String, String>(new StringLengthComparator());
}

@Test
public void testValuesWithDifferentLength() {

map.put("zero", "0");
map.put("one", "1");
map.put("three", "3");

assertThat("Length failed", map.size(), is(3));

Iterator<String> it = map.keySet().iterator();
assertThat(it.next(), is("three"));
assertThat(it.next(), is("zero"));
assertThat(it.next(), is("one"));
}

If we run this test everything seems to be ok;

Testsuite: StringLengthComparatorTest
Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0,018 sec

But what happens if we just insert the missing two? Lets see:

@Test
public void testDifferentValuesWithSameLength() {

map.put("zero", "0");
map.put("one", "1");
map.put("two", "2");
map.put("three", "3");

assertThat("Length failed", map.size(), is(4));

Iterator<String> it = map.keySet().iterator();
// just check the first two values as we do not know the order of the same length
assertThat(it.next(), is("three"));
assertThat(it.next(), is("zero"));
}

This test should also pass, we just can't tell the ordering of the same length keys:

Testcase: testDifferentValuesWithSameLength(StringLengthComparatorTest): FAILED
Length failed
Expected: is <4>
got: <3>

Oops ... what happened here? One of the values just disappeared? We are inserting 4 entries but there are only 3 in the Map?

The first time I stumbled across this it took me several hours to figure out what was going on. After I found out what the problem was it should have taught me to always read the javadoc carefully. The contract for TreeMap states the following:
Note that the ordering maintained by a sorted map (whether or not an explicit comparator is provided) must be consistent with equals if this sorted map is to correctly implement the Map interface.

What exactly does this mean? Comparator tells us more about consistent with equals:
The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.

Normally, the Map- and Set-Interface use equals to determine if a certain key (for a Map) or a certain value (for a Set) is already inserted. But for TreeMap and TreeSet this is not the case. These implementations use compareTo for determining whether an Object is equal to another Object. This means you are not allowed to return 0 from a compare-method if the two Objects are not equal!

How to fix it? Just test if the two Strings are equal before comparing the length in the Comparator:

public int compare(String o1, String o2) {

// return 0 only if both are equal
if (o1.equals(o2)) {
return 0;
} else if (o1.length() > o2.length()) {
return -1;
} else {
return 1;
}
}

Alright, the test passes. Just to be sure we add another test:

@Test
public void testYetMoreValues() {

map.put("zero", "0");
map.put("one", "1");
map.put("four", "4");

assertThat("Length failed", map.size(), is(3));

assertNotNull("Zero's not there", map.get("zero"));
}

Seems to be a stupid test but let's see what happens:

Testsuite: StringLengthComparatorTest
Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 0,027 sec

Testcase: testYetMoreValues(StringLengthComparatorTest): FAILED
Zero's not there
junit.framework.AssertionFailedError: Zero's not there
at StringLengthComparatorTest.testYetMoreValues(StringLengthComparatorTest.java:51)

How can this happen? We tested for equality but the "zero" value still is not there? Let's see what the tree looks like on every step.

Still simple if the first value ("zero") is inserted: The entry is the root node:

When inserting "one" afterwards, the Comparator tells that the result of the comparison is 1, which means that the node has to be inserted on the right side:

When inserting "four", the tree looks a little bit different. As our Comparator in this case also return 1, the node is also inserted right of "zero" but needs to be left of "one". "four" is our new root node:

Seems to look ok. When reading from left to right the nodes are sorted according to their length. But let's see what happens when we try to look up "zero" based on the tree above. We start with the root node "four". As the length of "zero" and "four" is the same, our Comparator returns 1. But on the right side it only finds the "one" node. Our comparator returns -1 when comparing "four" to "one". Nothing there, so null is returned.

The Javadoc for Comparator tells us more on what went wrong here:
The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y.

This means that the method needs to be implemented symetrically. When calling compare("zero", "one") we always need to return the negated value of calling compare("one", "zero"). With our implementation this is not the case as we always return 1 as a fallback.

I hope this is a final implementation of the Comparator delegating to Strings compareTo()-method which is already implemented symetrically:

public int compare(String o1, String o2) {

// return 0 only if both are equal
if (o1.equals(o2)) {
return 0;
} else if (o1.length() == o2.length()) {
// delegate to Strings compareTo to get symmetric behavior
return o1.compareTo(o2);
} else if (o1.length() > o2.length()) {
return -1;
} else {
return 1;
}
}

I struggled with both of these problems on two different projects and it always took me some hours. Thanks to my colleague Marc who led me to the solution of the symmetric problem.
About Florian Hopf

I am working as a freelance software developer and consultant in Karlsruhe, Germany and have written a German book about Elasticsearch. If you liked this post you can follow me on Twitter or subscribe to my feed to get notified of new posts. If you think I could help you and your company and you'd like to work with me please contact me directly.

1 Kommentar:

  1. Hi Flo!

    Interesting problem. I'm often too lazy to read the javadoc carefully too and it often bites me.

    Keep on posting.

    Take Care,
    Sebastian

    AntwortenLöschen