Thursday, February 27, 2014

Introduction to the ODF Toolkit

Microsoft Office has long been the dominant office suite, and unfortunately it still is. For a long time not only the programs were closed but also the file formats.

Open Document

Nevertheless, there are open alternatives available, most notably LibreOffice and Apache OpenOffice. In 2005 the OASIS consortium standardized Open Document, an open alternative to the proprietary world of Microsoft. Open Document is heavily influenced by the OpenOffice.org file format but is supported by multiple office suites and viewers.

Open Document files are zip files that contain some XML documents. You can go ahead and unzip any documents you might have:

unzip -l aufwaende-12.ods 
Archive:  aufwaende-12.ods
  Length      Date    Time    Name
---------  ---------- -----   ----
       46  2012-12-31 15:16   mimetype
      815  2012-12-31 15:16   meta.xml
     8680  2012-12-31 15:16   settings.xml
   171642  2012-12-31 15:16   content.xml
     3796  2012-12-31 15:16   Thumbnails/thumbnail.png
        0  2012-12-31 15:16   Configurations2/images/Bitmaps/
        0  2012-12-31 15:16   Configurations2/popupmenu/
        0  2012-12-31 15:16   Configurations2/toolpanel/
        0  2012-12-31 15:16   Configurations2/statusbar/
        0  2012-12-31 15:16   Configurations2/progressbar/
        0  2012-12-31 15:16   Configurations2/toolbar/
        0  2012-12-31 15:16   Configurations2/menubar/
        0  2012-12-31 15:16   Configurations2/accelerator/current.xml
        0  2012-12-31 15:16   Configurations2/floater/
    22349  2012-12-31 15:16   styles.xml
      993  2012-12-31 15:16   META-INF/manifest.xml
---------                     -------
   208321                     16 files

The mimetype file determines what kind of document it is (in this case application/vnd.oasis.opendocument.spreadsheet), and META-INF/manifest.xml lists the files in the archive. The most important file is content.xml, which contains the body of the document.
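Because the container is an ordinary zip archive, you can inspect it from Java with nothing but the JDK. The following is a minimal sketch, not part of the ODF Toolkit: the class name OdfZipPeek and its methods are made up for illustration, and it builds a tiny stand-in archive in memory instead of reading a real .ods file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; requires Java 9+ (readAllBytes).
class OdfZipPeek {

    /** Builds a tiny stand-in archive in memory; a real .ods file has more entries. */
    static byte[] buildSampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            addEntry(zip, "mimetype", "application/vnd.oasis.opendocument.spreadsheet");
            addEntry(zip, "content.xml", "<placeholder/>");
            addEntry(zip, "META-INF/manifest.xml", "<placeholder/>");
        }
        return bytes.toByteArray();
    }

    private static void addEntry(ZipOutputStream zip, String name, String body) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(body.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    /** Returns the names of all entries in the archive, in archive order. */
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    /** Returns the content of the "mimetype" entry, or null if there is none. */
    static String readMimeType(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if ("mimetype".equals(e.getName())) {
                    // Reads only the bytes of the current entry
                    return new String(zip.readAllBytes(), StandardCharsets.US_ASCII).trim();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = buildSampleArchive();
        System.out.println(listEntries(new ByteArrayInputStream(archive)));
        System.out.println(readMimeType(new ByteArrayInputStream(archive)));
    }
}
```

To try it on a real document, pass a FileInputStream for your own file to readMimeType instead of the in-memory sample.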

Server Side Processing

Although quite a few viewers and editors for Open Document are available, the situation on the server side used to be different. For Microsoft Office files there is the Java library Apache POI, which provides a lot of functionality for reading and manipulating them. But if you wanted to process Open Document files, nearly your only option was to install OpenOffice.org on the server and talk to it by means of its UNO API. Not exactly an easy thing to do.

ODF Toolkit

Fortunately there is light at the end of the tunnel: the ODF Toolkit project, currently incubating at Apache, provides lightweight access to files in the Open Document format from Java. As the name implies it's a toolkit, consisting of multiple projects.

The heart of it is the schema generator, which ingests the Open Document specification in the form of its RelaxNG schema. It provides a template-based facility to generate files from the ODF specification. Currently it only generates Java classes, but it can also be used to create other kinds of files (think of documentation or accessors for different programming languages).

The next layer of the toolkit is ODFDOM. It provides templates that generate classes for DOM access of elements and attributes of ODF documents. Additionally it provides facilities like packaging and document encryption.

For example, you can list the file paths of an ODF document using the OdfPackage class:

// Uses org.odftoolkit.odfdom.pkg.OdfPackage and java.util.Set
OdfPackage pkg = OdfPackage.loadPackage("aufwaende-12.ods");
Set<String> filePaths = pkg.getFilePaths();
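To give a feeling for the element-level view that ODFDOM works at, here is a rough sketch that uses only the JDK's XML parser rather than ODFDOM itself. The class ContentXmlCells is hypothetical, and the content.xml fragment is a hand-written, heavily simplified assumption; a real file carries many more elements and attributes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not an ODF Toolkit class.
class ContentXmlCells {

    // Namespace URI of the ODF table vocabulary.
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    // Hand-written, heavily simplified stand-in for a spreadsheet's content.xml.
    static final String SAMPLE =
        "<office:document-content"
        + " xmlns:office='urn:oasis:names:tc:opendocument:xmlns:office:1.0'"
        + " xmlns:table='" + TABLE_NS + "'"
        + " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>"
        + "<office:body><office:spreadsheet><table:table table:name='Sheet1'>"
        + "<table:table-row>"
        + "<table:table-cell><text:p>Betrag</text:p></table:table-cell>"
        + "<table:table-cell><text:p>23</text:p></table:table-cell>"
        + "</table:table-row>"
        + "</table:table></office:spreadsheet></office:body>"
        + "</office:document-content>";

    /** Returns the text content of every table:table-cell element in document order. */
    static List<String> cellTexts(String contentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
        Document dom = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(contentXml.getBytes(StandardCharsets.UTF_8)));
        NodeList cells = dom.getElementsByTagNameNS(TABLE_NS, "table-cell");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < cells.getLength(); i++) {
            texts.add(cells.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(cellTexts(SAMPLE)); // prints [Betrag, 23]
    }
}
```

Working this way means you have to know namespaces and element names yourself, which is exactly the knowledge the generated ODFDOM classes encode for you.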

If you are familiar with the Open Document spec, ODFDOM will be the only library you need. But if you are like most of us and don't know all the elements and attributes by heart, there is another project for you: the Simple API provides easy access to a lot of the features you might expect from a library like this. You can deal with higher-level abstractions like paragraphs for text or rows and cells in the spreadsheet world, or search for and replace text.

This code snippet creates a spreadsheet, adds some cells to it and saves it:

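// Uses org.odftoolkit.simple.SpreadsheetDocument and org.odftoolkit.simple.table.Table
SpreadsheetDocument doc = SpreadsheetDocument.newSpreadsheetDocument();
Table sheet = doc.getSheetByIndex(0);
// getCellByPosition takes (column, row), so this fills A1 and B1
sheet.getCellByPosition(0, 0).setStringValue("Betrag");
sheet.getCellByPosition(1, 0).setDoubleValue(23.0);
doc.save(File.createTempFile("odf", ".ods"));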

Code

If you are interested in seeing more code using the ODF Toolkit you can have a look at the cookbook that contains a lot of useful code snippets for the Simple API. Additionally you should keep an eye on this blog for the second part of the series where we will look at an application that extracts data from spreadsheets.

About Florian Hopf

I am working as a freelance software developer and consultant in Karlsruhe, Germany and have written a German book about Elasticsearch. If you liked this post you can follow me on Twitter or subscribe to my feed to get notified of new posts. If you think I could help you and your company and you'd like to work with me please contact me directly.

1 comment:

  1. Thank you for the simple code you posted for generating an ods spreadsheet.
    I had to modify it for version 0.8.7 of the ODF Toolkit. These are the changes:

    OdfSpreadsheetDocument doc = OdfSpreadsheetDocument.newSpreadsheetDocument();
    OdfTable sheet = doc.getTableList().get(0);
    sheet.getCellByPosition(0, 0).setStringValue("Betrag");
    sheet.getCellByPosition(1, 0).setDoubleValue(23.0);
    doc.save(File.createTempFile("odf", ".ods"));

