Wednesday, August 31, 2011

ACID test for Absolute Positioned Frames.

Based on the ACID test for Absolute Positioned Tables, I created another test for Absolute Positioned Frames (APFs). Make sure you have the Ahem font installed. The test document renders as follows:

Again, here is what e.g. LibreOffice makes out of it:


It bears repeating that APFs are important for business documents, since they are used in letterheads etc.

Monday, August 29, 2011

An ACID test for Absolute Positioned Tables

Inspired by the first ACID test (ACID1, also called the Box Acid Test), I created a simple document to test the handling of Absolute Positioned Tables (APTs). APTs are used a lot in the letterheads of e.g. business documents.
The test is very simple. It uses the Ahem font from the CSS test suite. Make sure you have the font installed on your machine in order to run the test.
When successful, this document is rendered as follows:

Here is what e.g. LibreOffice (3.4.2) makes out of it:

I really like this kind of test because it is very visual and it is quite easy to see whether the test worked or not.

Thursday, August 11, 2011

Advances in the layout engine

I started implementing "Shape" support in my layout engine. What's so special about Shapes is the fact that the text needs to flow around them when wrapping is enabled. Shapes are used a lot, e.g. to construct letterheads. Good Shape support is absolutely crucial for business documents.
Here is the sample document I used for testing, wrap03.docx:

It shows Shape objects with "tight" wrapping and the different wrapping modes as defined in 20.4.3.7 ST_WrapText (Text Wrapping Location) of the OOXML specification:

  • both (Both Sides) Specifies that text shall wrap around both sides of the object.

  • left (Left Side Only) Specifies that text shall only wrap around the left side of the object.

  • right (Right Side Only) Specifies that text shall only wrap around the right side of the object.
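To make the three modes concrete, here is a minimal sketch of how a layout engine might decide which parts of a single text line are free when a Shape overlaps it. This is not the code from my engine: the function name `free_segments`, the coordinate parameters and the `WrapText` enum are hypothetical, and the sketch assumes the Shape is a plain rectangle (so it models "square" rather than "tight" wrapping) with no wrap margins.

```python
from enum import Enum


class WrapText(Enum):
    """The three ST_WrapText values listed above (names are illustrative)."""
    BOTH = "both"
    LEFT = "left"
    RIGHT = "right"


def free_segments(line_left, line_right, shape_left, shape_right, mode):
    """Return the (start, end) spans of one line that may hold text.

    Assumes the line already overlaps the shape vertically and that the
    shape is an axis-aligned rectangle (a simplification of tight wrapping).
    """
    # The shape does not touch this line horizontally: the whole line is free.
    if shape_right <= line_left or shape_left >= line_right:
        return [(line_left, line_right)]

    segments = []
    # Text to the left of the shape is allowed for "both" and "left".
    if mode in (WrapText.BOTH, WrapText.LEFT) and shape_left > line_left:
        segments.append((line_left, shape_left))
    # Text to the right of the shape is allowed for "both" and "right".
    if mode in (WrapText.BOTH, WrapText.RIGHT) and shape_right < line_right:
        segments.append((shape_right, line_right))
    return segments


if __name__ == "__main__":
    for mode in WrapText:
        print(mode.value, free_segments(0, 100, 40, 60, mode))
```

A real engine has to do this per line for arbitrary outlines, which is where most of the difficulty comes from.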


I'm really happy to have this key feature in the layout engine. Just to show how difficult this is to implement, take a look at what Google Docs makes out of it: wrap03.docx.

Monday, February 07, 2011

Just came back from FOSDEM. Felt really good to meet the “usual suspects” again. Thanks for the great weekend!

I also had a chance to talk with Jos about ODF Web and ODF Collaboration. Jos gave a great talk about his ODF Web Javascript Framework, which emerged from his ODFKit efforts.
Jos had a very important slide in his talk which echoed my own belief: NO CONVERSION! This principle guided the design of his ODF Web Framework. NO CONVERSION simply means that Jos does not try to heuristically (i.e. lossily) map ODF to HTML and then heuristically (i.e. lossily) map HTML back to ODF. Instead, Jos decided on a clean 2-tier architecture which separates the content layer from the view layer: ODF is the content and HTML is the view. I think that’s the right approach. Even more: I think if you start adding “smart conversions”, “heuristics” and other “intelligent mappings”, things will get ugly sooner or later. [And from my experience with OpenOffice.org filter hacking, things will get messy sooner than you like. Always keep Murphy’s law in mind: What can go wrong will go wrong!]
We also had a chance to talk about Operational Transformation (OT) in the context of ODF. I tried to argue that what is really missing in ODF is a list of “atomic changes” a user can make to an ODF document. If we had this list of “atomic changes”, we could build a transformation on top of it. For OT it is very important that you have “atomic” operations, since you need an operation transformation for every ordered pair of operations. E.g. if you have |OPS| operations, you need |OPS| × |OPS| transformations. So keeping |OPS| small is quite important!
Assembling the list of atomic operations is admittedly a lot of work. However, it is work that every designer of an API needs to do anyway. I really believe that some input from the ODF API projects like Oracle’s ODFDOM, IBM’s Simple API for ODF, ANR’s LPOD and Jos’ ODFKit could really help.
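To illustrate the quadratic growth, here is a minimal sketch for the smallest interesting case: plain text with just two atomic operations, inserting and deleting a single character. Two operations already need 2 × 2 = 4 transformation functions. Everything here (the `Insert` and `Delete` classes, the `transform` function, the `a_wins` tie-breaker) is a hypothetical illustration for flat strings, not a proposal for the actual ODF operation set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    char: str


@dataclass(frozen=True)
class Delete:
    pos: int


def apply(doc, op):
    """Apply one atomic operation to a string (None means 'do nothing')."""
    if op is None:
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.char + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


# One function per ordered pair of operation types.
def ins_ins(a, b, a_wins):
    # On a tie, the winning insert keeps its position.
    if a.pos < b.pos or (a.pos == b.pos and a_wins):
        return a
    return Insert(a.pos + 1, a.char)


def ins_del(a, b, a_wins):
    return a if a.pos <= b.pos else Insert(a.pos - 1, a.char)


def del_ins(a, b, a_wins):
    return a if a.pos < b.pos else Delete(a.pos + 1)


def del_del(a, b, a_wins):
    if a.pos == b.pos:
        return None  # both sites deleted the same character
    return a if a.pos < b.pos else Delete(a.pos - 1)


TRANSFORMS = {
    (Insert, Insert): ins_ins,
    (Insert, Delete): ins_del,
    (Delete, Insert): del_ins,
    (Delete, Delete): del_del,
}


def transform(a, b, a_wins):
    """Rewrite a so that it can be applied after the concurrent operation b."""
    return TRANSFORMS[(type(a), type(b))](a, b, a_wins)


def converges(doc, a, b):
    """Check that both application orders produce the same document."""
    site1 = apply(apply(doc, a), transform(b, a, False))
    site2 = apply(apply(doc, b), transform(a, b, True))
    return site1 == site2


if __name__ == "__main__":
    print(len(TRANSFORMS), "transformations for 2 operations")
    print(converges("abc", Insert(1, "X"), Delete(2)))
```

Every additional atomic operation adds a full row and column to the `TRANSFORMS` table, which is why the size of the operation set matters so much.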
Let me finish my post with a classification of changes to an ODF document:




I believe that for change tracking we only need “atomic operations” and a way to combine them into “compound operations”. I don’t think we need to be able to track changes to the XML tree or the XML text. In fact, I think that would do more harm than good.

Tuesday, January 11, 2011

A lot has happened since my last blogpost in June 2009.

It's 2011 and I have been working for more than a year on a new project called “Native OfficeOpenXML” (NOOXML). The story is quite simple: I was very disappointed with the quality of the “docx” support in OpenOffice.org. Even more: I'm very disappointed with the code quality and the design (!) of the OpenOffice.org Writer core and layout. There are people who believe this can be solved by “code refactoring”, fixing “low-hanging fruit”, “quick wins” and other magic silver-bullet phrases. But one thing was for certain: There is no way to (re-)implement a core and a layout engine. Can't be done. Impossible. No way.

OpenOffice.org took the refactoring route. I took the rewrite route.

After one year here is where we are.

What has happened:
I started designing and implementing the NOOXML core in Jan 2010. The magic is the data structure, which allows a compact representation of documents and a fast implementation of insert/delete operations etc. I also wanted to be able to do real-time collaboration, which influenced the design of the core a lot. In March 2010 I was able to load the ECMA Spec Part I (a very big document) into the core. Not only on a desktop machine, but also on my “iPod” (not “iPad”!!).
Once I had the basic core design and implementation done I started working on the layout engine. The primary goal was to build a fast and reliable layout engine. In my implementation I focused on OfficeOpenXML fidelity. In August I had the basic layout features like text, headers, footers, tables, footnotes etc. done. I was able to render the ECMA Spec Part I (again: very big document; >5000 pages) to PDF. I then added section and multiple column support.
Yesterday I was able to render the ECMA Spec Part I document on the iPod (real device) AND in the Android emulator (since I don't have an Android device) and without a user interface:

(I know: It took a really long time. But there is sooooo much room for improvement. And hey: OOo can't even load it on a desktop machine.)


And here is the UI-less port for Android 2.3:


Happy new year!