Wednesday, 14 April 2010

Using OpenCmsTestCase

These are mostly notes to myself, because I just had to relearn all of it.

To use OpenCmsTestCase in your own project, apply the following steps:

  • Download the source distribution and unzip it

  • Set the file encoding: export ANT_OPTS=-Dfile.encoding=iso-8859-1

  • Run ant bindist

  • Run ant compile-tests

  • Some unit tests never compile for me: remove their Java files and their entries from the test suites

  • Create a jar file from the org folder in ../BuildCms/build/test, e.g. jar -cf opencms-test-7.5.2.jar org

  • Add the jar to your project classpath or deploy it to your Maven repository

  • Add hsqldb.jar to your project

  • Copy the data and webapp folders to your project

  • Copy test/log4j.properties and test/test.properties to your test classpath and adjust the directory paths in test.properties (a good reason to use Maven, because its resource filtering can fill in the paths for you)

  • Experiment with the files in data/imports and adjust them to your needs
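If you would rather script the packaging step than call the jar tool by hand, java.util.jar can produce a roughly equivalent archive. This is a minimal sketch, assuming the compiled test classes live in an org folder below a base directory such as ../BuildCms/build/test; the class and method names (TestJarPackager, createJar) are made up for illustration. Unlike jar -cf it writes only file entries (no directory entries) plus a minimal manifest, which should be enough for loading the classes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: a rough equivalent of running "jar -cf <jarFile> org" inside baseDir.
class TestJarPackager {

    static void createJar(Path baseDir, String topFolder, Path jarFile) throws IOException {
        // Minimal manifest; the jar tool also writes a Manifest-Version header.
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        // Collect all regular files below baseDir/topFolder (e.g. build/test/org).
        List<Path> files;
        try (Stream<Path> walk = Files.walk(baseDir.resolve(topFolder))) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        try (JarOutputStream jar = new JarOutputStream(Files.newOutputStream(jarFile), manifest)) {
            for (Path file : files) {
                // Entry names are relative to baseDir and must use '/' as the separator.
                String name = baseDir.relativize(file).toString().replace('\\', '/');
                jar.putNextEntry(new JarEntry(name));
                Files.copy(file, jar);
                jar.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TestJarPackager ../BuildCms/build/test opencms-test-7.5.2.jar
        createJar(Paths.get(args[0]), "org", Paths.get(args[1]));
    }
}
```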


A simple test case example:

    import org.opencms.file.CmsObject;
    import org.opencms.file.CmsResource;
    import org.opencms.test.OpenCmsTestCase;
    import org.opencms.test.OpenCmsTestProperties;

    public class DummyOpenCmsTest extends OpenCmsTestCase {

        static {
            // load test.properties from the test classpath (see the setup steps above)
            OpenCmsTestProperties.initialize(OpenCmsTestProperties.getResourcePathFromClassloader("test.properties"));
        }

        public DummyOpenCmsTest(String name) {
            super(name);
        }

        @Override
        public void setUp() throws Exception {
            super.setUp();
            // set up OpenCms for each test, importing the "simpletest" data into /sites/default/
            setupOpenCms("simpletest", "/sites/default/");
        }

        public void testExistingResource() throws Exception {
            CmsObject cms = getCmsObject();
            // read by site-relative path; the root path includes the site folder
            CmsResource res = cms.readResource("/index.html");
            assertEquals("/sites/default/index.html", res.getRootPath());
        }

        @Override
        public void tearDown() throws Exception {
            // shut down OpenCms first, then let the superclass clean up (usual JUnit order)
            removeOpenCms();
            super.tearDown();
        }
    }

Saturday, 10 April 2010

Unicode is not UTF-8

Encoding problems come up on a lot of the projects I have worked on. Sometimes I get the feeling that I understand most of it, but then there is always some aspect I did not get right. This week I noticed that even my basic knowledge is not really firm.

Currently I am working on a system that imports a lot of data from other systems, which deliver it as XML. The company that provides the data sent us some sample files to import. The XML documents were supposed to be UTF-8, but our parser always choked on some byte sequences. When we declared iso-8859-1 in the XML prolog the parsing worked, but none of the non-ASCII characters were displayed correctly.

Using hexedit I looked at the document and located the bytes for non-ASCII characters like 'ä', which showed up as 'C3 A4'. But according to the Unicode code chart its value should be '00 E4'. So we complained that the data seemed to be sent in some encoding other than UTF-8.

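Of course the company could not find any problem, because we were simply wrong. Unicode is not UTF-8! Unicode assigns each character a number (its code point), and UTF-8 is one scheme for encoding those code points as bytes. Except for ASCII characters (U+0000 to U+007F), which are stored as a single byte with the same value, the encoded bytes do not match the code point value.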

Let's analyze the example character 'ä' in the UTF-8 document. Its bytes are 'C3 A4', which in binary is:
1100 0011 1010 0100

UTF-8 encodes every non-ASCII character as a lead byte followed by one or more continuation bytes. The lead byte of a two-byte sequence starts with '110' (three-byte sequences start with '1110', four-byte sequences with '11110'), so the first byte is our lead byte. Continuation bytes start with '10', so the second byte is a continuation byte. These prefixes are the control bits used by UTF-8. Let's see what our sequence looks like if we remove them, shift the remaining bits together and pad the left side with 0:
0000 0000 1110 0100

And this is exactly the expected Unicode value '00 E4' (U+00E4) for 'ä'.
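The manual decoding can be reproduced in a few lines of Java. This is a sketch for two-byte sequences only (lead byte 110xxxxx); three- and four-byte sequences work the same way with more continuation bytes. The class name is hypothetical, and the result is cross-checked against the JDK's own UTF-8 decoder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes one two-byte UTF-8 sequence by stripping the control bits.
class Utf8TwoByteDecoder {

    static int decodeTwoByte(byte lead, byte continuation) {
        int b1 = lead & 0xFF;          // treat the bytes as unsigned values
        int b2 = continuation & 0xFF;
        // Lead byte of a two-byte sequence must look like 110xxxxx.
        if ((b1 & 0xE0) != 0xC0) throw new IllegalArgumentException("not a two-byte lead byte");
        // Continuation byte must look like 10xxxxxx.
        if ((b2 & 0xC0) != 0x80) throw new IllegalArgumentException("not a continuation byte");
        // Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
        return ((b1 & 0x1F) << 6) | (b2 & 0x3F);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC3, (byte) 0xA4};
        int codePoint = decodeTwoByte(bytes[0], bytes[1]);
        System.out.printf("U+%04X%n", codePoint); // prints U+00E4
        // The JDK decoder agrees with the manual result.
        System.out.println(new String(bytes, StandardCharsets.UTF_8).codePointAt(0) == codePoint); // prints true
    }
}
```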

Very basic, but I still managed to get it wrong.

Later a colleague noticed that in one part of our application a String was created from a byte array without specifying an explicit encoding. Ouch! The fix was quick, but we should have looked at our own code before blaming the data provider.
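A minimal sketch of the safe pattern, with a hypothetical class name: always name the charset when turning bytes into text. Without one, new String(bytes) and new InputStreamReader(in) fall back to the JVM default charset, which was platform dependent before Java 18 (UTF-8 is the default since then). Decoding the 'ä' bytes with the wrong charset turns one character into two.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a whole stream as text with an explicitly named charset.
class ExplicitCharsetReader {

    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xA4}; // U+00E4 encoded as UTF-8
        // Correct charset: the two bytes become a single character.
        System.out.println(readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8).length()); // prints 1
        // Wrong charset: ISO-8859-1 maps each byte to its own character (U+00C3 U+00A4).
        System.out.println(readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1).length()); // prints 2
    }
}
```

For XML in particular, handing the parser the raw InputStream instead of a String or Reader lets it honour the encoding declared in the prolog.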

Friday, 9 April 2010

Playing with Groovy

As I do most of our OpenCms projects, I am also the one who deploys new versions to our internal Nexus repository so we can easily use the libraries from Maven. The OpenCms developers use Ant for their builds, so there is no official Maven repository available.

Most of the time I only added the dependencies that are really needed at compile time, because creating a Maven POM by hand is quite cumbersome, not to mention deploying all the dependencies (either uploading them to Nexus or deploying them with Maven). It is one of those time-consuming tasks that call for some automation.

I chose Groovy for a little helper script because it is really good at dealing with XML, and it's always good to learn new techniques. I already had some experience modifying existing scripts but had never used it to write something from scratch.

The script basically reads a folder of jars and creates a POM for the whole project as well as a script that deploys all additional artifacts using Maven. Creating the files currently involves two steps:

  1. A properties file is created from information guessed from the jar file names. As jars are often not named consistently, this will not succeed for all of them, so you have to review the file and correct some groupIds, artifactIds and versions (another step that could be automated, e.g. by querying a Nexus instance, but let's save some work for the future ;) ). A second properties file holds project and deployment information such as the project's groupId and artifactId and the server to deploy to.

  2. Another script reads the properties files and generates the project POM as well as the deployment script.
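The original Groovy scripts aren't reproduced here, so the following is only a hypothetical Java sketch of the guessing in step 1. It assumes names of the form artifactId-version.jar, splitting at the first '-' that is followed by a digit; the groupId simply defaults to the artifactId, which matches some older artifacts such as log4j but always needs review. The output format fileName=groupId:artifactId:version is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: guesses Maven coordinates from a jar file name.
class JarNameGuesser {

    // "<artifactId>-<version>.jar" where the version starts with a digit.
    private static final Pattern VERSIONED = Pattern.compile("^(.+?)-(\\d.*)\\.jar$");

    static final String UNKNOWN_VERSION = "UNKNOWN";

    /** Returns {groupId, artifactId, version}; the groupId is only a placeholder guess. */
    static String[] guess(String fileName) {
        Matcher m = VERSIONED.matcher(fileName);
        String artifactId;
        String version;
        if (m.matches()) {
            artifactId = m.group(1);
            version = m.group(2);
        } else {
            // No version in the name (e.g. hsqldb.jar): strip ".jar" and mark the version for review.
            artifactId = fileName.endsWith(".jar") ? fileName.substring(0, fileName.length() - 4) : fileName;
            version = UNKNOWN_VERSION;
        }
        return new String[] {artifactId, artifactId, version};
    }

    /** One line of the (hypothetical) review file: fileName=groupId:artifactId:version */
    static String toPropertyLine(String fileName) {
        return fileName + "=" + String.join(":", guess(fileName));
    }

    public static void main(String[] args) {
        System.out.println(toPropertyLine("commons-lang-2.4.jar")); // commons-lang-2.4.jar=commons-lang:commons-lang:2.4
        System.out.println(toPropertyLine("hsqldb.jar"));           // hsqldb.jar=hsqldb:hsqldb:UNKNOWN
    }
}
```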


Of course you also have to review the deployment script, because you don't want to deploy artifacts that are already available. A good way to find out which artifacts are missing is to run something like mvn compile on the generated POM.
I uploaded the (uncommented) scripts and helper classes; maybe they are useful for somebody.
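A deployment script like this typically boils down to one mvn deploy:deploy-file call per artifact. The sketch below is a hypothetical Java generator for such lines, not the author's script; groupId, artifactId, version, packaging, file, repositoryId and url are parameters of the Maven deploy plugin's deploy-file goal, while the repository id and URL you pass in are placeholders. Paths are not quoted, so they must not contain spaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator: one "mvn deploy:deploy-file" command per third-party jar.
class DeployScriptGenerator {

    static String deployCommand(String groupId, String artifactId, String version,
                                String jarPath, String repositoryId, String repositoryUrl) {
        return String.join(" ",
                "mvn deploy:deploy-file",
                "-DgroupId=" + groupId,
                "-DartifactId=" + artifactId,
                "-Dversion=" + version,
                "-Dpackaging=jar",
                "-Dfile=" + jarPath,
                // repositoryId should match a <server> entry in settings.xml (credentials).
                "-DrepositoryId=" + repositoryId,
                "-Durl=" + repositoryUrl);
    }

    /** Each entry is {groupId, artifactId, version, jarPath}; returns one command per line. */
    static String deployScript(List<String[]> artifacts, String repositoryId, String repositoryUrl) {
        List<String> lines = new ArrayList<>();
        for (String[] a : artifacts) {
            lines.add(deployCommand(a[0], a[1], a[2], a[3], repositoryId, repositoryUrl));
        }
        return String.join(System.lineSeparator(), lines);
    }
}
```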

I guess there are far better ways to create Maven projects from existing libraries; I am looking forward to hearing about them.

The XML manipulation features of Groovy are really nice. To write XML, you describe your node structure on a builder object and it creates the markup for you, so you always stay very close to the format you want to output. For example, this is the code that creates the XML for the POM file:

    import groovy.xml.MarkupBuilder

    // 'project' and 'dependencies' hold the data read from the properties files
    def writer = new StringWriter();
    def xmlBuilder = new MarkupBuilder(writer);

    xmlBuilder.project('xmlns' : 'http://maven.apache.org/POM/4.0.0',
            'xmlns:xsi' : 'http://www.w3.org/2001/XMLSchema-instance',
            'xsi:schemaLocation' : 'http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd') {
        modelVersion('4.0.0')
        groupId(project.group)
        artifactId(project.artifact)
        packaging('jar')
        version(project.version)
        name(project.artifact)
        dependencies() {
            dependencies.each() { dep ->
                dependency {
                    groupId(dep.group)
                    artifactId(dep.artifact)
                    version(dep.version)
                }
            }
        }
    }

You look at this code and can immediately imagine the structure of the resulting XML. I like it!
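For comparison, and to show what the builder saves you, here is a rough sketch of the same POM skeleton written with the JDK's StAX API (javax.xml.stream.XMLStreamWriter). The class and method names are hypothetical, dependencies are passed as {groupId, artifactId, version} arrays, and the output is not indented.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical StAX version of the MarkupBuilder code, for comparison only.
class StaxPomWriter {

    static final String POM_NS = "http://maven.apache.org/POM/4.0.0";
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    static String writePom(String groupId, String artifactId, String version,
                           List<String[]> dependencies) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        xml.writeStartElement("project");
        xml.writeDefaultNamespace(POM_NS);
        xml.writeNamespace("xsi", XSI_NS);
        xml.writeAttribute("xsi", XSI_NS, "schemaLocation",
                POM_NS + " http://maven.apache.org/maven-v4_0_0.xsd");
        element(xml, "modelVersion", "4.0.0");
        element(xml, "groupId", groupId);
        element(xml, "artifactId", artifactId);
        element(xml, "packaging", "jar");
        element(xml, "version", version);
        element(xml, "name", artifactId);

        xml.writeStartElement("dependencies");
        for (String[] dep : dependencies) {
            xml.writeStartElement("dependency");
            element(xml, "groupId", dep[0]);
            element(xml, "artifactId", dep[1]);
            element(xml, "version", dep[2]);
            xml.writeEndElement(); // dependency
        }
        xml.writeEndElement(); // dependencies
        xml.writeEndElement(); // project
        xml.flush();
        xml.close();
        return out.toString();
    }

    // Writes <name>text</name>; every element needs an explicit start and end call.
    private static void element(XMLStreamWriter xml, String name, String text) throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(text);
        xml.writeEndElement();
    }
}
```

The structure is still recognizable, but the nesting lives in matching start/end calls instead of in the shape of the code, which is exactly what the Groovy builder avoids.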

File handling is really nice, too. There is no need for any resource cleanup, because withWriter closes the writer once the closure has finished. This code writes the created POM file:

    new File(dir + "pom.xml").withWriter() { out ->
        out.println(projectPom.toString());
    }
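The closest Java counterpart to withWriter is try-with-resources, which only arrived with Java 7 (released after this post). A minimal sketch with a hypothetical PomFileWriter class; it uses Path.resolve instead of string concatenation, so dir does not need a trailing separator, and it names the charset explicitly.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes the generated POM, closing the writer automatically.
class PomFileWriter {

    static Path writePom(Path dir, String pomXml) throws IOException {
        Path pomFile = dir.resolve("pom.xml");
        // The writer is closed when the block exits, even if write() throws.
        try (BufferedWriter out = Files.newBufferedWriter(pomFile, StandardCharsets.UTF_8)) {
            out.write(pomXml);
            out.newLine(); // same line separator that println would use
        }
        return pomFile;
    }
}
```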

Some drawbacks I noticed when doing scripting like this:

  • I tend to get sloppy while coding: leaving out semicolons, doing too many things in one class or script, ...

  • Code completion in NetBeans is horrible if you are used to Java standards, but I guess that is just very hard to implement for a dynamic language

  • You have to compile manually, at least in NetBeans. If you change a Groovy class that is used by a script, you have to remember to run a build, because NetBeans will not compile it for you when you run the script.

  • A lot of coding errors are only discovered at runtime, e.g. using the wrong name for a property or calling a constructor that doesn't exist


I still can't imagine using a dynamic language for production code. Of course the deployment time tends to be shorter, but you still have to execute the code to see whether it is really correct. I doubt that writing tests can compensate for the lack of static type checking, but maybe this just needs a shift in mindset.