Friday, June 26, 2009

API Design Matters

I was reading a very interesting article called "API Design Matters" with the subtitle "Bad application programming interfaces plague software engineering. How can we get things right?". Very cool stuff.

In OpenOffice.org we have an API "plague" too: The ODF import/export is based on the "UNO-API" and so is the OOXML import for Writer. And developers hate these APIs.

So the question is: why do developers hate the "UNO-API"? The obvious (but wrong) answer is: "I hate the UNO-API because of UNO". Don't get me wrong here: this is neither pro nor contra UNO. But the statement that "UNO is the cause of the ODF import/export and OOXML import problems" is wrong. It's not UNO per se, it's the design of the API.
[In case you're wondering what "UNO" is: UNO=COM ;-) So UNO is OpenOffice.org's take on COM.]

And just to be sure I do not offend the wrong people: The UNO-API was not designed to be used in the import/export filters. It was designed to be the API for "OpenOffice.org BASIC" developers, i.e. it was designed to provide a similar API to what VBA developers have in Microsoft Office. It was never designed to be used for import/export filters.

The problem was the decision to base the import/export code on such a high-level API! And we are still suffering from this decision today!

Anyway. How can we fix this?
a) We claim the current API is the best mankind can do and print T-Shirts with 1000 years of OOo experience.
b) We claim UNO and abstraction are the problem and use the internal legacy APIs, so that we never get a chance to refactor the internal legacy stuff since we're creating even more dependencies.
c) We come up with a better API.

Option a) was demonstrated at the OpenOffice.org conference in Beijing. [Does anybody have a picture of the T-Shirt?]

Option b) is the straightforward approach. E.g. in Writer the “.DOC”, “.RTF” and “.HTML” filters are based on the internal “Core” APIs. So let's use these APIs instead of the UNO-APIs.
What's wrong with this approach? The problem is that these internal APIs do not abstract from the underlying implementation at all. Repeat: The internal APIs do not abstract from the underlying implementation at all.
Does this answer the question why using the internal APIs is the wrong approach? Obviously *not* having an abstraction between your core implementation details and your import/export filters is ... [offensive language detected ;-)].

Option c) only has one problem: What should the API look like?

I have some ideas here, but before posting them: maybe there are some strong beliefs out there?

Tuesday, April 15, 2008

Finally we had a developer conference! The good thing is that it was real fun. The bad thing was that I learned and drank toooooo much....

There are some discussions I'd love to share with you:

  • Bug handling. Had some interesting chats about bug handling, responsiveness etc. from a developer's point of view, especially a filter developer's point of view. My belief is that we need a better clustering of bugs into problematic areas. This will definitely help to manage expectations as well as quality.
  • Mail merge. Learned that mail merge is broken not only IMHO but also in the opinion of others. Good (or bad ?:-)). However, great things will happen here.
  • UI. Very good ideas about how to change the UI. Thanks, Ricardo, that was a great session.
  • Interop brokenness. Discussed my ideas about how to change ODF and OOo for better interop. Always good to get your ideas “blessed” by the master himself. Thanks, Caolan...
  • Some chats about what to do with http://www.go-oo.org and how to attract more developers. Wait until my VM appears... ;-)


Besides the above, some interesting news regarding OOXML/ODF/ISO arose. The report from the ISO meeting in Oslo sounds very promising IMHO:

<quote>
SC 34 envisages the creation of three distinct working groups that meet the needs of:
1. ISO/IEC 29500
2. ISO/IEC 26300
3. Work on interoperability/harmonization between document format standards
and wishes to incorporate existing expertise on these standards.
</quote>


The only trouble here is that the ODF people do *not* seem to be happy about that, and I have no idea why.

Overall it was a great week.

~Florian

Tuesday, February 05, 2008

"XML Namespaces are designed to support exactly this kind of thing." (Tim Bray)


We are making really good progress on our interoperability work. In our current focus area of fields we extended the OpenOffice.org Writer core for better support of MS Word-like fields. The first feature to benefit from this work is “Input fields”, which now support the long-wanted "tabbing" feature.

However, we want all fields to benefit from the new enhanced field core, not only "Input fields". Other areas are e.g. "Mail merge fields". Since all of these fields share the same generic mechanism, we decided to add support for these generic MS Word-like fields to OpenOffice.org Writer. But by doing so we faced the problem that ODF does not support these kinds of fields.

Interestingly Tim Bray (Director of Web Technologies at Sun Microsystems) suggested a solution already in November 2005: http://www.tbray.org/ongoing/When/200x/2005/11/27/Office-XML. Unsurprisingly he suggested XML namespaces to solve this problem.

That's what we did. MS Word-like fields are now stored in the namespace

xmlns:field="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:field:1.0"

which clearly indicates the purpose: OOXML<->ODF interoperability.

The following RelaxNG fragment enhances the current ODF specification with the new fields:
<define name="paragraph-content" combine="choice">
  <choice>
    <element name="field:fieldmark">
      <attribute name="text:name">
        <ref name="string"/>
      </attribute>
      <attribute name="field:type">
        <ref name="namespacedToken"/>
      </attribute>
      <attribute name="field:locked">
        <ref name="boolean"/>
      </attribute>
      <group>
        <ref name="fieldmark-parameter"/>
        <zeroOrMore>
          <ref name="paragraph-content"/>
        </zeroOrMore>
      </group>
    </element>
    <element name="field:fieldmark-start">
      <attribute name="text:name">
        <ref name="string"/>
      </attribute>
      <attribute name="field:type">
        <ref name="namespacedToken"/>
      </attribute>
      <attribute name="field:locked">
        <ref name="boolean"/>
      </attribute>
      <ref name="fieldmark-parameter"/>
    </element>
    <element name="field:fieldmark-end">
      <empty/>
    </element>
  </choice>
</define>


In general, fieldmarks are very similar to bookmarks, except that they need to be properly nested. This is achieved by giving field:fieldmark-end no "name" attribute; instead it closes the most recently opened field:fieldmark-start element.
The field:fieldmark element is a short form of field:fieldmark-start and field:fieldmark-end. It SHOULD be written in preference to separate start/end marks.

Every fieldmark can have
  • a name (text:name), similar to the name of text:bookmark elements. Names SHOULD be unique (preferably also with respect to bookmark names).
  • a type (field:type), which allows applications to define the type of the fieldmark.
  • a sequence of associated (name, value) pairs, each represented by a <field:param field:name="string" field:value="string"/> element.
  • a locked attribute (field:locked), which specifies whether the user can edit the content or not.
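
The nesting rule (an end mark closes the most recently opened start mark) is essentially a stack. Here is a minimal Python sketch of that idea. The event tuples and the function name pair_fieldmarks are made up for illustration; this is not how the Writer core does it.

```python
def pair_fieldmarks(events):
    """Pair every fieldmark-end with the most recently opened fieldmark-start.

    `events` is a hypothetical list of ("start", name) or ("end", None)
    tuples in document order. Returns (name, start_index, end_index) tuples.
    """
    open_marks = []  # stack of (name, index) for starts that are still open
    pairs = []
    for index, (kind, name) in enumerate(events):
        if kind == "start":
            open_marks.append((name, index))
        elif kind == "end":
            if not open_marks:
                raise ValueError(f"fieldmark-end at {index} has no open start")
            start_name, start_index = open_marks.pop()
            pairs.append((start_name, start_index, index))
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    if open_marks:
        raise ValueError(f"unclosed fieldmark-start: {open_marks[-1][0]}")
    return pairs


events = [("start", "Outer"), ("start", "Inner"), ("end", None), ("end", None)]
pairs = pair_fieldmarks(events)
print(pairs)
```

Because the end mark carries no name, the inner field is always closed first, so overlapping (bookmark-style) ranges simply cannot be expressed.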


A sample. Let's take a look at the following sample docs:




The OOXML representation is:
  <w:p>
<w:r><w:t xml:space="preserve">Title: </w:t></w:r>
<w:bookmarkStart w:id="0" w:name="Text1"/>
<w:r>
<w:fldChar w:fldCharType="begin">
<w:ffData>
<w:name w:val="Text1"/>
<w:statusText w:type="text" w:val="Just a sample field."/>
<w:textInput/>
</w:ffData>
</w:fldChar>
<w:instrText xml:space="preserve"> FORMTEXT </w:instrText>
<w:fldChar w:fldCharType="separate"/>
<w:t xml:space="preserve">A sample input.</w:t>
<w:fldChar w:fldCharType="end"/>
</w:r>
<w:bookmarkEnd w:id="0"/>
</w:p>
<w:p>
<w:r><w:t xml:space="preserve">Description: </w:t></w:r>
<w:bookmarkStart w:id="1" w:name="Text2"/>
<w:r w:rsidR="00FA39C2">
<w:fldChar w:fldCharType="begin">
<w:ffData>
<w:name w:val="Text2"/>
<w:statusText w:type="text" w:val="Yet another sample field..."/>
<w:textInput/>
</w:ffData>
</w:fldChar>
<w:instrText xml:space="preserve"> FORMTEXT </w:instrText>
<w:fldChar w:fldCharType="separate"/>
<w:t>A sample input.</w:t>
</w:r>
</w:p>
<w:p>
<w:r><w:t>Second sample input paragraph.</w:t></w:r>
<w:r><w:fldChar w:fldCharType="end"/></w:r>
<w:bookmarkEnd w:id="1"/>
</w:p>
<w:bookmarkStart w:id="2" w:name="Check1"/>
<w:p>
<w:r>
<w:fldChar w:fldCharType="begin">
<w:ffData>
<w:name w:val="Check1"/>
<w:statusText w:type="text" w:val="A sample checkbox..."/>
<w:checkBox>
<w:checked/>
</w:checkBox>
</w:ffData>
</w:fldChar>
<w:instrText xml:space="preserve"> FORMCHECKBOX </w:instrText>
<w:fldChar w:fldCharType="end"/>
</w:r>
<w:bookmarkEnd w:id="2"/>
<w:r><w:t xml:space="preserve"> Make sense?</w:t></w:r>
</w:p>

The ODF+Enhancement representation is:
 
<text:p>Title: <field:fieldmark-start text:name="Text1" field:type="ecma.office-open-xml.field.FORMTEXT"><field:param field:name="Description" field:value="Just a sample field."/></field:fieldmark-start>A sample input.<field:fieldmark-end/></text:p>
<text:p>Description: <field:fieldmark-start text:name="Text2" field:type="ecma.office-open-xml.field.FORMTEXT"><field:param field:name="Description" field:value="Yet another sample field..."/></field:fieldmark-start>A sample input.</text:p>
<text:p>Second sample input paragraph.<field:fieldmark-end/></text:p>
<text:p><field:fieldmark text:name="Check1" field:type="ecma.office-open-xml.field.FORMCHECKBOX"><field:param field:name="Description" field:value="A sample checkbox..."/><field:param field:name="Result" field:value="1"/></field:fieldmark><text:s/>Make sense?</text:p>


Cool, isn't it? Or, in Tim's words: "Who could possibly be against it?"
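
For the curious: reading the fieldmarks back out of such an ODF fragment needs nothing more than a namespace-aware XML parser. A minimal Python sketch follows, assuming the fragment is wrapped in a made-up <root> element that declares both namespaces (the text namespace URI is the standard ODF one; list_fieldmarks and q are hypothetical helper names).

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
FIELD_NS = "urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:field:1.0"

# Two paragraphs from the sample, wrapped in a hypothetical <root> element.
SAMPLE = f"""<root xmlns:text="{TEXT_NS}" xmlns:field="{FIELD_NS}">
<text:p>Title: <field:fieldmark-start text:name="Text1" field:type="ecma.office-open-xml.field.FORMTEXT"><field:param field:name="Description" field:value="Just a sample field."/></field:fieldmark-start>A sample input.<field:fieldmark-end/></text:p>
<text:p><field:fieldmark text:name="Check1" field:type="ecma.office-open-xml.field.FORMCHECKBOX"><field:param field:name="Description" field:value="A sample checkbox..."/><field:param field:name="Result" field:value="1"/></field:fieldmark><text:s/>Make sense?</text:p>
</root>"""


def q(namespace, local_name):
    """Build the '{namespace}local' form that ElementTree uses for names."""
    return f"{{{namespace}}}{local_name}"


def list_fieldmarks(xml_text):
    """Return (name, type, params) for each fieldmark or fieldmark-start."""
    root = ET.fromstring(xml_text)
    wanted = {q(FIELD_NS, "fieldmark"), q(FIELD_NS, "fieldmark-start")}
    marks = []
    for elem in root.iter():  # document order
        if elem.tag in wanted:
            params = {
                p.get(q(FIELD_NS, "name")): p.get(q(FIELD_NS, "value"))
                for p in elem.findall(q(FIELD_NS, "param"))
            }
            marks.append((elem.get(q(TEXT_NS, "name")),
                          elem.get(q(FIELD_NS, "type")),
                          params))
    return marks


marks = list_fieldmarks(SAMPLE)
for name, field_type, params in marks:
    print(name, field_type, params)
```

An application that does not know the field namespace can skip these elements and still see the plain paragraph text, which is the whole point of putting them in a separate namespace.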

Wednesday, January 23, 2008

Never try to catch a train last minute...

Yesterday I tried to catch a train last minute. While running towards it I fell down. I got up again and managed to get it.

While sitting in the train I realized that my arm hurt, and at my destination I went to a hospital. The X-rays revealed that my elbow was broken ;-) Nothing serious, it'll hopefully heal within two weeks...

So my advice clearly is: Never try to catch a train last minute --- let it pass!

And the moral is: The next train would have departed in only 30 minutes...

Damn!


P.S. In the next two weeks you'll only get short messages from me since I can only type with one hand :-(

Tuesday, December 18, 2007

Back to the binaries! Yeah!

After all this XML work the binary file formats are a different world. For the fields work I needed to analyze the “form field” structure of the binary .DOC format:

The header: Actually a misused PICT structure:

b10  b16  field     type    size  bitfield  comments
0    0    lcb       U32     -     -         Count of bytes of the whole block.
4    4    cbHeader  U16     -     -         Always 0x44.
6    6    -         U8[62]  -     -         Contains zero. In fact this is the PICT struct, but since it's not needed we can fill it with zeros.
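
Reading this header is a two-field unpack. A minimal Python sketch, assuming little-endian byte order (the table does not state it) and a made-up helper name parse_formfield_header:

```python
import struct

HEADER_SIZE = 0x44  # 4 bytes lcb + 2 bytes cbHeader + 62 zero bytes = 68


def parse_formfield_header(data):
    """Return (lcb, cbHeader) from the start of a form field block.

    Assumes little-endian values; raises ValueError on a malformed header.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("block is shorter than the 0x44 byte header")
    lcb, cb_header = struct.unpack_from("<IH", data, 0)
    if cb_header != HEADER_SIZE:
        raise ValueError(f"unexpected cbHeader: {cb_header:#x}")
    return lcb, cb_header


# Synthetic header for a block with 10 payload bytes (illustration only).
sample = struct.pack("<IH", HEADER_SIZE + 10, HEADER_SIZE) + bytes(62)
lcb, cb_header = parse_formfield_header(sample)
print(lcb, hex(cb_header))
```

The payload then starts at offset cbHeader, i.e. right after the zeroed PICT remainder.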

The formfield payload (Unicode Variant)

b10  b16  field            type                  size  bitfield  comments
0    0    cUnicodeMarker   U8[4]                 -     -         Contains {0xFF,0xFF,0xFF,0xFF}
4    4    fftype           U8                    :2    03        Type: 0 = Text, 1 = Check Box, 2 = List
-    -    ffres            U8                    :5    7C        Result field for a form field. Values from 0 to N-1, where N is the number of \ffl entries. In case of check boxes: 0==unchecked; 1==checked.
-    -    ffownhelp        U8                    :1    80        1 if there is associated Help text, 0 otherwise.
5    5    ffownstat        U8                    :1    01        1 if there is associated status line text, 0 otherwise.
-    -    ffprot           U8                    :1    02        1 if this field is protected, 0 otherwise.
-    -    ffsize           U8                    :1    04        Type of size selected for check box field: 0 = Auto, 1 = Exact
-    -    fftypetxt        U8                    :3    38        Type of text field: 0 = Regular text, 1 = Number, 2 = Date, 3 = Current date, 4 = Current time, 5 = Calculation
-    -    ffrecalc         U8                    :1    40        1 if the field should be calculated on exit, 0 otherwise.
-    -    ffhaslistbox     U8                    :1    80        1 if this field has list box attached to it, 0 otherwise.
6    6    ffmaxlen         U16                   :15   7FFF      Number of characters for text field. Zero means unlimited.
-    -    -                U16                   :1    8000      Unknown. Set to zero.
8    8    ffhps            U16                   -     -         Check box size (half-point sizes).
10   A    xstz_ffname      Xstz_UString0         -     -         Form field name
-    -    xstz_ffddeftext  Xstz_UString0         -     -         Default text for field. Only if type==0.
-    -    ffdefres         U16                   -     -         Default resource for list field. Default value for check box (0=default unchecked; 1=default checked). Only if type!=0.
-    -    xstz_ffformat    Xstz_UString0         -     -         Format for text field
-    -    xstz_ffhelptext  Xstz_UString0         -     -         Help text
-    -    xstz_ffstattext  Xstz_UString0         -     -         Status line text
-    -    xstz_ffentrymcr  Xstz_UString0         -     -         Macro to execute upon entry into this form field
-    -    xstz_ffexitmcr   Xstz_UString0         -     -         Macro to execute upon exit from this form field
-    -    cUnicodeMarker2  U8[2]                 -     -         Contains {0xFF, 0xFF}; Padding and/or indicator for Unicode?
-    -    fflLen           U32                   -     -         Num of ffls
-    -    ffl              Xstz_UString[fflLen]  -     -         Resource string for lists.
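
The fixed part of the payload (offsets 0 to 9) boils down to masking and shifting with the bitfield values from the table. A minimal Python sketch, assuming little-endian byte order and that the Unicode marker occupies offsets 0 to 3, as the offset column implies; parse_formfield_flags is a made-up name.

```python
import struct


def parse_formfield_flags(payload):
    """Decode offsets 0..9 of the Unicode payload into a dict of fields."""
    if len(payload) < 10:
        raise ValueError("payload is shorter than its fixed part")
    if payload[0:4] != b"\xff\xff\xff\xff":
        raise ValueError("missing Unicode marker (non-Unicode variant?)")
    # Byte 4, byte 5, then two 16-bit words at offsets 6 and 8.
    b4, b5, maxlen_word, hps = struct.unpack_from("<BBHH", payload, 4)
    return {
        "fftype": b4 & 0x03,
        "ffres": (b4 & 0x7C) >> 2,
        "ffownhelp": (b4 & 0x80) >> 7,
        "ffownstat": b5 & 0x01,
        "ffprot": (b5 & 0x02) >> 1,
        "ffsize": (b5 & 0x04) >> 2,
        "fftypetxt": (b5 & 0x38) >> 3,
        "ffrecalc": (b5 & 0x40) >> 6,
        "ffhaslistbox": (b5 & 0x80) >> 7,
        "ffmaxlen": maxlen_word & 0x7FFF,
        "ffhps": hps,
    }


# Synthetic payload: a checked check box (fftype=1, ffres=1), ffhps=20.
sample = b"\xff\xff\xff\xff" + struct.pack("<BBHH", 0x05, 0x00, 0, 20)
flags = parse_formfield_flags(sample)
print(flags)
```

The variable-length strings that follow at offset 10 have to be read one after the other, since each one's offset depends on the lengths before it.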

An Xstz_UString has the following form:

b10  b16  field         type      size  bitfield  comments
0    0    Len           U16       -     -         Len of the String.
2    2    Unicode char  U16[len]  -     -         Unicode chars

An Xstz_UString0 has the following form:

b10      b16  field         type      size  bitfield  comments
0        0    len           U16       -     -         Len of the String.
2        2    Unicode char  U16[len]  -     -         Unicode chars
2+2*len  -    Zero          U16       -     -         Trailing “0”

In the non-Unicode encoding the Unicode markers disappear and the string chars have U8 size.
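
Reading one Xstz_UString0 in the Unicode variant can be sketched as follows. This is a minimal Python sketch, assuming little-endian byte order and UTF-16LE characters (neither is stated in the tables); read_xstz_ustring0 is a made-up helper name.

```python
import struct


def read_xstz_ustring0(data, offset=0):
    """Read one length-prefixed, zero-terminated string.

    Returns (text, next_offset), where next_offset points just past the
    trailing zero so that consecutive strings can be read in sequence.
    """
    (length,) = struct.unpack_from("<H", data, offset)
    start = offset + 2
    end = start + 2 * length
    if end + 2 > len(data):
        raise ValueError("string runs past the end of the data")
    text = data[start:end].decode("utf-16-le")
    (terminator,) = struct.unpack_from("<H", data, end)
    if terminator != 0:
        raise ValueError("missing trailing zero")
    return text, end + 2


# Synthetic example: the form field name "Text1".
encoded = struct.pack("<H", 5) + "Text1".encode("utf-16-le") + b"\x00\x00"
name, next_offset = read_xstz_ustring0(encoded)
print(name, next_offset)
```

Calling it repeatedly with the returned next_offset walks through xstz_ffname, xstz_ffddeftext and so on.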

You might also want to take a look at the ffData element in OOXML ;-)

Tuesday, October 30, 2007

Business applications of unstructured text

Interesting article in the Communications of the ACM.

A widely touted IT factoid states that
80% of the information produced by
and contained in most organizations
is stored in the form of unstructured
data. Most of it is text (such as memoranda,
internal documents, email,
organizational Web pages, and comments
from customers and from
internal service personnel), and most
of the applications that reflect the
value of unstructured data are able to
process it. Although unstructured
data takes other forms, including
images and audio, here I focus on the
applications, technologies, and architectures
for unstructured text acquisition
and analysis (UTAA).

Monday, October 29, 2007

New OpenOffice.org target.

Many of you probably know the “WONT FIX” target in the OpenOffice.org issue tracker.

What about introducing a new target: “HELPS MICROSOFT”?

But why do we need this? These days many people, especially from the file format camps, are extremely sensitive about anything related to compatibility because they believe it helps Microsoft.

So let's give the ODF warriors an opportunity to communicate clearly with the users. Give them the “HELPS MICROSOFT” target to publicly expose the issuer of the bug and the people working on it.

Thursday, October 25, 2007

Field update --- preview for Windows.

I now have a preview for Windows available at http://download.go-oo.org/preview/oodemo.zip.

Simply download it and unzip it. To start, execute soffice.exe in ooo2.3/program/.

Same features as the Linux Version. So no saving at this point.

And don't forget to give feedback :-)

Thanks,

~Florian