Tuesday, December 18, 2007

Back to the binaries! Yeah!

After all this XML work, the binary file formats are a different world. For the fields work I needed to analyze the “form field” structure of the binary .DOC format:

The header: Actually a misused PICT structure:

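| b10 | b16 | field | Type | size | bitfield | comments |
|-----|-----|-------|------|------|----------|----------|
| 0 | 0 | lcb | U32 | | | Count of bytes of the whole block. |
| 4 | 4 | cbHeader | U16 | | | Always 0x44 |
| 6 | 6 | | U8[62] | | | Contains zero. In fact this is the PICT struct, but since it is not needed we can fill it with zeros. |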
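The header layout above can be read with a few lines of Python. This is only a sketch under assumptions: the function name parse_ff_header is made up for illustration, and the little-endian byte order is assumed (it is the usual byte order in .DOC files, but the table does not state it).

```python
import struct

HEADER_SIZE = 0x44  # value of cbHeader according to the table: 4 + 2 + 62 bytes


def parse_ff_header(data: bytes) -> dict:
    """Read lcb and cbHeader from the start of a form field data block.

    Hypothetical helper; assumes little-endian encoding.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer is shorter than the 0x44-byte header")
    # "<" = little-endian without padding, I = U32 (lcb), H = U16 (cbHeader)
    lcb, cb_header = struct.unpack_from("<IH", data, 0)
    if cb_header != HEADER_SIZE:
        raise ValueError(f"unexpected cbHeader 0x{cb_header:X}")
    # Remaining 62 bytes are the unused PICT fields, expected to be zero.
    pict_rest = data[6:HEADER_SIZE]
    return {
        "lcb": lcb,
        "cbHeader": cb_header,
        "pict_is_zero": not any(pict_rest),
    }
```

The form field payload described next would then start at offset cbHeader within the same buffer.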

The formfield payload (Unicode Variant)

| b10 | b16 | Field | Type | size | bitfield | comments |
|-----|-----|-------|------|------|----------|----------|
| 0 | 0 | cUnicodeMarker | U8[4] | | | Contains {0xFF,0xFF,0xFF,0xFF} |
| 4 | 4 | fftype | U8 | :2 | 03 | Type: 0 = Text; 1 = Check Box; 2 = List |
| | | ffres | U8 | :5 | 7C | Result field for a form field. Values from 0 to N-1, where N is the number of \ffl entries. In case of check boxes: 0==unchecked; 1==checked. |
| | | ffownhelp | U8 | :1 | 80 | 1 if there is associated Help text, 0 otherwise. |
| 5 | 5 | ffownstat | U8 | :1 | 01 | 1 if there is associated status line text, 0 otherwise. |
| | | ffprot | U8 | :1 | 02 | 1 if this field is protected, 0 otherwise. |
| | | ffsize | U8 | :1 | 04 | Type of size selected for check box field: 0 = Auto; 1 = Exact |
| | | fftypetxt | U8 | :3 | 38 | Type of text field: 0 = Regular text; 1 = Number; 2 = Date; 3 = Current date; 4 = Current time; 5 = Calculation |
| | | ffrecalc | U8 | :1 | 40 | 1 if the field should be calculated on exit, 0 otherwise. |
| | | ffhaslistbox | U8 | :1 | 80 | 1 if this field has a list box attached to it, 0 otherwise. |
| 6 | 6 | ffmaxlen | U16 | :15 | 7FFF | Number of characters for text field. Zero means unlimited. |
| | | | U16 | :1 | 8000 | Unknown. Set to zero. |
| 8 | 8 | ffhps | U16 | | | Check box size (half-point sizes). |
| 10 | A | xstz_ffname | Xstz_UString0 | | | Form field name |
| | | xstz_ffddeftext | Xstz_UString0 | | | Default text for field. Only if type==0. |
| | | ffdefres | U16 | | | Default resource for list field. Default value for check box (0=default unchecked; 1=default checked). Only if type!=0. |
| | | xstz_ffformat | Xstz_UString0 | | | Format for text field |
| | | xstz_ffhelptext | Xstz_UString0 | | | Help text |
| | | xstz_ffstattext | Xstz_UString0 | | | Status line text |
| | | xstz_ffentrymcr | Xstz_UString0 | | | Macro to execute upon entry into this form field |
| | | xstz_ffexitmcr | Xstz_UString0 | | | Macro to execute upon exit from this form field |
| | | cUnicodeMarker2 | U8[2] | | | Contains {0xFF, 0xFF}; Padding and/or indicator for Unicode? |
| | | fflLen | U32 | | | Number of ffls |
| | | ffl | Xstz_UString[fflLen] | | | Resource strings for lists. |
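To make the bitfield columns concrete, here is a small Python sketch that decodes the fixed-size part of the payload (offsets 4 to 9) using the masks from the table. The function name decode_ff_flags is hypothetical, little-endian byte order is assumed, and only the Unicode variant (with the leading 0xFF marker) is handled.

```python
import struct


def decode_ff_flags(payload: bytes) -> dict:
    """Decode the fixed-size fields at payload offsets 4..9.

    Hypothetical helper; masks are taken from the table,
    little-endian byte order is an assumption.
    """
    if len(payload) < 10:
        raise ValueError("payload too short for the fixed-size fields")
    if payload[:4] != b"\xff\xff\xff\xff":
        raise ValueError("no Unicode marker; non-Unicode variant not handled")
    # B = U8 at offset 4, B = U8 at offset 5, H = U16 at 6, H = U16 at 8
    b4, b5, w6, ffhps = struct.unpack_from("<BBHH", payload, 4)
    return {
        "fftype": b4 & 0x03,
        "ffres": (b4 & 0x7C) >> 2,
        "ffownhelp": (b4 & 0x80) >> 7,
        "ffownstat": b5 & 0x01,
        "ffprot": (b5 & 0x02) >> 1,
        "ffsize": (b5 & 0x04) >> 2,
        "fftypetxt": (b5 & 0x38) >> 3,
        "ffrecalc": (b5 & 0x40) >> 6,
        "ffhaslistbox": (b5 & 0x80) >> 7,
        "ffmaxlen": w6 & 0x7FFF,
        "ffhps": ffhps,
    }
```

For example, a byte 0x85 at offset 4 decodes to fftype 1 (check box), ffres 1 (checked) and ffownhelp 1.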

An Xstz_UString has the following form:

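| b10 | b16 | Field | Type | size | bitfield | comments |
|-----|-----|-------|------|------|----------|----------|
| 0 | 0 | Len | U16 | | | Length of the string. |
| 2 | 2 | Unicode char | U16[len] | | | Unicode chars |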

An Xstz_UString0 has the following form:

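| b10 | b16 | Field | Type | size | bitfield | comments |
|-----|-----|-------|------|------|----------|----------|
| 0 | 0 | len | U16 | | | Length of the string. |
| 2 | 2 | Unicode char | U16[len] | | | Unicode chars |
| 2+2*len | | Zero | U16 | | | Trailing “0” |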
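Both string forms can be read by the same routine. The following Python sketch assumes little-endian byte order and UTF-16 code units for the "Unicode chars"; the function name read_xstz and its zero_terminated parameter are made up for illustration.

```python
import struct


def read_xstz(data: bytes, offset: int, zero_terminated: bool = True):
    """Read an Xstz_UString0 (or an Xstz_UString if zero_terminated is False).

    Returns (text, next_offset). Hypothetical helper; assumes
    little-endian UTF-16 and that len counts characters, not bytes.
    """
    (length,) = struct.unpack_from("<H", data, offset)
    start = offset + 2
    end = start + 2 * length
    if end > len(data):
        raise ValueError("string runs past the end of the buffer")
    text = data[start:end].decode("utf-16-le")
    if zero_terminated:
        # The U16 at offset 2+2*len must be the trailing zero.
        (zero,) = struct.unpack_from("<H", data, end)
        if zero != 0:
            raise ValueError("missing trailing zero")
        end += 2
    return text, end
```

Because the strings in the payload follow each other without fixed offsets, the returned next_offset is what a caller would feed into the next read.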

In the non-Unicode variant the Unicode markers disappear and the string chars have U8 size.

You might also want to take a look at the ffData element in OOXML ;-)

Tuesday, October 30, 2007

Business applications of unstructured text

Interesting article in the Communications of the ACM.

A widely touted IT factoid states that 80% of the information produced by and contained in most organizations is stored in the form of unstructured data. Most of it is text (such as memoranda, internal documents, email, organizational Web pages, and comments from customers and from internal service personnel), and most of the applications that reflect the value of unstructured data are able to process it. Although unstructured data takes other forms, including images and audio, here I focus on the applications, technologies, and architectures for unstructured text acquisition and analysis (UTAA).

Monday, October 29, 2007

New OpenOffice.org target.

Many of you probably know the “WONT FIX” target in the OpenOffice.org issue tracker.

What about introducing a new target: “HELPS MICROSOFT”?

But why do we need this? These days many people --- especially from the file format camps --- are extremely sensitive about anything related to compatibility because they believe it helps Microsoft.

So let's give the ODF warriors an opportunity to clearly communicate with the users. Give them the “HELPS MICROSOFT” target to publicly expose the issuer of the bug and the people working on it.

Thursday, October 25, 2007

Field update --- preview for Windows.

I now have a preview for Windows available at http://download.go-oo.org/preview/oodemo.zip.

Simply download it and unzip it. To start, execute soffice.exe in ooo2.3/program/.

Same features as the Linux version. So no saving at this point.

And don't forget to give feedback :-)

Thanks,

~Florian

Wednesday, October 24, 2007

IBM's Symphony.

Downloaded IBM's Symphony today to follow up on some of the problems discussed at the ODF Interop Camp. (Btw., it's sad that the ODF Camp people want to treat the problems as confidential.)

So back to Symphony. Why the hell did they cripple all the cool OpenOffice.org easter eggs?

So why is =game("StarWars") crippled?

And look what they have done to the lovely picture of the Calc team:


I think that contradicts the SISSL :-)

Monday, October 15, 2007

Update on field work --- Early preview available for Linux.

In a previous post I talked about my field-proof-of-concept. I continued to work on the issue and I'm happy to give an update on that front.

You can download a preview version of my work here:

http://download.go-oo.org/preview/oodemo.tgz

(Linux only. Just untar the archive with tar xzf oodemo.tgz, then cd oodemo/program and start ./soffice.) This is a preview version. Do not use it for productive work! The preview demo shows

  • the core enhancements (tabbing!), and

  • .DOC import.



The work for .DOC export, ODF import/export is not done and not included in the demo.

For testing you can download the sample file formc1new.doc which is taken from issue 79720. It should look like this:


Again --- this is work in progress. So do not expect everything to work. However, if you have issues, please let me know. And remember: “saving does not work yet :-)”.

I really hope I get some feedback,

~Florian

P.S.
I will make the patch available ASAP. It is the result of some weekend hacking --- it really needs some polishing first.

Sunday, September 23, 2007

Back from the OpenOffice.org Conference 2007 in Barcelona

Good to meet people in person.

Talked a lot about

  • Harmonization between ODF and OOXML,
  • Trade-off between Standardization and Innovation and
  • Interoperability wrt. ODF<->ODF and ODT<->.DOC

Some pics from the conference:

(Thanks, Peter, for a great evening.)

(Thanks to the people at the ODF Camp)

(Thanks to Kohei, Hubert and Noel for the great time at the XXVII MOSTRADA DE VINS I CAVES DE CATALUNYA)

I hope I can find some time to go into more details.

Wednesday, September 12, 2007

Office 2.0 conference

I'm just back from the Office 2.0 conference in San Francisco. I was participating in a panel discussion about Document Formats:



Very nice crowd, the Office 2.0 guys. They really taught me to think more about collaboration.

Many thanks for the nice presentations. (And for the insight that most of you use OpenOffice.org to convert between HTML and the other file formats ;-))

Ahh -- and I almost forgot. I will look into OpenSAM. Promised. It really sounds like a good idea for OpenOffice.org to support it.

Monday, August 13, 2007

Status of my Suggested enhancements for OpenDocument V1.2:


Hi Thomas,

thanks for the question. Here is the status:



























Tables:
* introduce allowCollapse attribute for paragraphs following nested tables to encode WW and HTML-like tables. Status: Not put up for discussion.
* declare sub tables as deprecated. Status: Under discussion in the Accessibility SC.
Numbering:
* introduce text:level-text attribute to encode arbitrary number formats. Status: Rejected.
* introduce text:num-follow-char to encode WW-like numbering. Status: Partly accepted.
* introduce text:list-override to encode WW-like numbering. Status: Strongly rejected.
* declare style:list-level-properties/@text:space-before as deprecated. Effect can be achieved with paragraph indent. Status: Rejected.
Master-page styles:
* add header-first and footer-first to encode WW-like page-styles. Status: Not put up for discussion.
* modify master-page styles such that WW-like sections can be encoded; current CSS3.0 like text:sections are not applicable. Status: Not put up for discussion.
* declare the style:next-style-name attribute of master-page declarations as deprecated. Status: Not put up for discussion.
Styles:
* allow deriving paragraph-family styles from text-family styles. Status: Not put up for discussion.
"Break chars":
* introduce a command and a command similar to the command. Status: Not put up for discussion.
Fields:
* enhance field support by introducing a <text:field-start/> and a <text:field-end/> element to which metadata can be attached. Status: Rejected.
Change tracking:
* introduce change tracking for tables. Status: Not put up for discussion.
* introduce change tracking on property level. Status: Not put up for discussion.
Discourage the use of the following OD features for MOOX interop:
* nested frames. Status: Not put up for discussion / Internally communicated as rejected.
* current CSS3.0 like text:sections. Status: Not put up for discussion / Internally communicated as rejected.
* use fo:break-before instead of fo:break-after. Status: Not put up for discussion / Internally communicated as rejected.
* use fo:margin-* for tables. Status: Not put up for discussion / Internally communicated as rejected.

In general I must confess the OpenDocument TC didn't pick up my discussion topics... (They are listed as suggested but have never been put on the agenda for discussion.) Additionally, I had a lot of private communication where my ideas were communicated as unwanted/rejected.

To get an idea of what's discussed for ODF1.2 take a look at:

  1. Proposals under discussion

  2. Proposals for consideration for a vote in the next coordination call

  3. Approved Proposals

  4. Proposal integrated into the specification document

Wednesday, July 18, 2007

Field enhancement proof-of-concept finished.

I've been working on field enhancement for OpenOffice.org Writer for quite a while and today I finished my proof-of-concept hacking:

OpenOffice.org Writer has a lot of shortcomings with respect to fields which I tried to address:


In my proof-of-concept I was able to enhance the Writer core such that these issues are addressed. (That's the good news!)

Unfortunately, my proof-of-concept still needs a lot of love. First thing is to clean up the prototype and generate patches for ooo-build.

However, I'm happy since this is my first major work on the OpenOffice.org Writer layout and field support has been an issue in OpenOffice.org Writer for quite a while...

Friday, July 13, 2007

"The first casualty of War is Truth"
Reading some blogs about the ODF/OOXML file format war, the famous quote "The first casualty of War is Truth" (from Rudyard Kipling --- I guess) comes to my mind.

Thursday, June 28, 2007

XEMBED, Mono and OpenOffice.org

In my last post I talked about the hack I did to get some Java applets running in an OpenOffice.org docking window.

I played a little more with the code and managed to get an XEMBED socket running in an OpenOffice.org docking window. The picture below shows OpenOffice.org running with an XEMBED-ready docking window:



The title bar of the docking window shows the socket id to which XEMBED applications can connect.

I used the following Mono code to connect to the XEMBED socket:

using System;
using Gtk;

// Compile with:
// mcs -pkg:gtk-sharp SamplePlug.cs

public class SamplePlug
{
    public static void Main(string[] args)
    {
        // The socket id is shown in the title bar of the docking window.
        if (args.Length != 1) {
            Console.WriteLine("Need socket id as an argument.");
            return;
        }
        uint socket_id = UInt32.Parse(args[0]);

        Console.WriteLine("using socket " + socket_id);

        Application.Init();

        // Plug this process into the XEMBED socket and show a text entry.
        Plug plug = new Plug(socket_id);
        // plug.Add(new Label("HELLO"));
        plug.Add(new Entry("HELLO"));
        plug.ShowAll();

        Console.WriteLine("running..");
        Application.Run();
    }
}


The picture below shows it all running:


In theory this'll work not only with Mono but with any application that can talk the XEMBED protocol, e.g. GTK- and Qt-based applications.