Donnerstag, 19. November 2009

Implementing an API abstraction using Google Collections

I am using OpenCms, the open source content management system, quite often. It is a good choice for building structured medium to large sized websites. One drawback though sometimes is the layout of the APIs. E.g. for accessing the resources managed in the system programatically you have to use some final and nonfinal classes that depend on a running instance of OpenCms. The original developers very likely chose this approach for security reasons but this can become quite cumbersome, e.g. when dealing with tests, as you cannot mock some of these classes easily. There is an integration test facility that comes with OpenCms but as this starts an OpenCms instance every time a test is executed, running tests takes some time. I tend to write tests that work without a running system whenever possible.

To improve testability of my components I implemented a thin layer above the normal OpenCms access means using some interfaces and simple POJOs so that my business logic can be tested without starting OpenCms. One interface and its default implementation act as a kind of DAO for accessing the virtual filesystem of OpenCms. Its method signatures do not contain any OpenCms dependencies that can't be mocked or reconstructed easily. E.g. to represent file resources the OpenCms class normally used is CmsResource. This class is quite difficult to instanciate outside of a running OpenCms instance as it contains internal references to different database tables. To reduce the need for mocking these external classes I implemented a simple POJO, Resource, that contains relevant information like the path to the resource and it's type.

Some methods of my VFS DAO return a Collection of Resources, e.g. when reading all resources in a subfolder. As the OpenCms API returns an untyped List that contains CmsResources and in my interface method signature I use List<? extends Resource> some transformation needs to take place. In the first project I used the abstraction I implemented it in a really simple way:

List<Resource> resources = new ArrayList<Resource>();
@SupressWarnings("Unchecked")
List<CmsResources> cmsResources = cms.readResources(...);
for (CmsResource cmsResource: cmsResources) {
resources.add(transform(cmsResource);
}

The transform method just creates an instance of the Resource and fills it with the needed values.

private Resource transform(CmsResource cmsResource) {
Resource resource = new Resource();
resource.setDateLastModified(cmsResource.getDateLastModified());
...
return resource;
}

This approach works and in my opinion is ok to use in many circumstances. I sacrificed some performance for a gain in testability and design. But for large collections or operations that are triggered frequently this of course can become a performance issue as for the sake of abstraction it is necessary to iterate the collection.

The better solution is to use a lazy list that transforms the CmsResources on the fly to Resources. With a lazy list you don't have to iterate it when transforming. The transformation happens when you are accessing the list.

Google Collections provides a functional style approach for transforming lists lazily. You create a class that implements the interface Function that can be typed for the source and target. In its apply method the transformation step is implemented in basically the same way as in the method displayed above.

public class ResourceTransformationFunction implements Function {

public Resource apply(CmsResource cmsResource) {
Resource resource = new Resource();
resource.setDateCreated(cmsResource.getDateCreated());
resource.setDateLastModified(cmsResource.getDateLastModified());
...
return resource;
}
}

The original List is transfomed using a static method call that accepts an instance of our Function:

@SuppressWarnings("unchecked")
List<CmsResource> cmsResources = cms.readResources(...);
List<Resource> resources = Lists.transform(cmsResources, new ResourceTransformationFunction());

The transformation happens when the List is accessed so when you are iterating the collection only once, which should be the case in most applications, there is no overhead at all (besides the creation of the new objects).

I really like the ease of use and reusabilty of the Google Collections solution. Also, the jar comes with absolutely no dependencies which makes it easily embedabble in any project.

A similiar functional approach will be part of the new concurrency features in JDK 7. ParallelArray, which makes use of the Fork/Join framework will provide the ability to use functions and predicates when constructing arrays. Brian Goetz' talk on Devoxx 2008 contained a detailed introduction to these features.

On Devoxx 2009, Dick Wall of the Javaposse held a really good talk about appliying a more functional style of programming to the Java programming language. This talk will be available some time in the future at parleys.com