What’s in a file format?

There has been some talk lately about file formats among our users and users of other word processors. I’ve seen a lot of explanations about what the different formats are and their relative benefits. Since we know a little about this stuff, I thought a short explanation of the major formats might help the conversation. So here it is, my rundown of the different word processor formats and their relative strengths:

Microsoft Word (.doc)

Most Microsoft Word documents are stored in a binary (not human-readable) format. The standard for this format is kept secret by Microsoft to keep users using Word. This format is important, of course, because it is the native format for the world’s most commonly used word processor. Because it is binary, it is difficult to reverse engineer, though it has been done. Nisus is able to read documents in this binary format thanks to our partnership with the fine folks building AbiWord.

Sometimes, .doc files will actually contain RTF instead of this binary format. This is a valid way to save .doc files, and it’s even possible to do this in Microsoft Word. When we save a .doc file in Nisus Writer, we use this technique because RTF has some advantages over the binary format.
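For the curious, here is a rough sketch of how a program could tell the two kinds of .doc file apart by looking at the first few bytes. It assumes the binary flavor is wrapped in the OLE compound file container (true of the Word versions I know of, though very old ones may differ) and relies on the fact that RTF documents begin with the text {\rtf. The function names are made up for this example and are not taken from Nisus Writer.

```python
# Leading bytes ("magic numbers") for the two kinds of .doc file.
# Assumption: the binary flavor lives inside an OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
RTF_MAGIC = b"{\\rtf"  # an RTF document opens with the group {\rtf


def sniff_doc_format(data: bytes) -> str:
    """Classify the leading bytes of a .doc file (hypothetical helper)."""
    if data.startswith(OLE_MAGIC):
        return "binary"
    if data.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def sniff_doc_file(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return sniff_doc_format(f.read(8))
```

A real importer would probably be more forgiving (for example, about stray bytes before the opening brace), but the idea is the same: the file extension alone does not tell you which format is inside.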

Rich Text Format (.rtf)

First created by Microsoft to allow users to share documents, RTF is the most widely supported word processing format on the planet. The format is text based and the standard is openly published by Microsoft. Though this standard is maintained by one company, it can never effectively be “closed” because it is so widely supported.

The RTF format is kept up to date with the binary Word format so almost everything that can be saved in a binary Word document can also be saved in RTF. Unlike the binary format, which is quite rigid, RTF can be extended by Microsoft, Nisus or anyone else to support any additional features you might want without breaking your document. This is a great thing since it allows Nisus to store unique features while remaining compatible with Microsoft Word.
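The reason extensions don’t break anything is a rule in the RTF specification: a group that starts with \* is an “ignorable destination,” and a reader that does not recognize the keyword after it is supposed to skip the whole group. Below is a simplified sketch of that skipping behavior. The keyword \nisusexample is invented for illustration and is not a real Nisus keyword, and the code ignores complications such as escaped braces and embedded binary data.

```python
import re

# Matches the start of an ignorable destination: {\*\keyword
DEST = re.compile(r"\{\\\*\\([a-z]+)")


def skip_unknown_destinations(rtf: str, known: set) -> str:
    """Drop every {\\*\\keyword ...} group whose keyword is not in `known`.

    Simplified sketch: escaped braces and \\bin data are not handled.
    """
    out = []
    i = 0
    while i < len(rtf):
        m = DEST.match(rtf, i)
        if m and m.group(1) not in known:
            # Walk forward to the brace that closes this group,
            # counting nested groups along the way.
            depth = 0
            while i < len(rtf):
                if rtf[i] == "{":
                    depth += 1
                elif rtf[i] == "}":
                    depth -= 1
                i += 1
                if depth == 0:
                    break
        else:
            out.append(rtf[i])
            i += 1
    return "".join(out)


# A tiny document carrying a made-up extension group.
sample = r"{\rtf1\ansi Hello{\*\nisusexample secret {nested} data} world}"

# A reader that knows no extensions sees only the ordinary text.
print(skip_unknown_destinations(sample, set()))
# A reader that knows the keyword keeps the group intact.
print(skip_unknown_destinations(sample, {"nisusexample"}))
```

The application that wrote the extension reads it back; everyone else silently passes over it and still gets the text.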

The RTF format is very old and each new version of the format remains compatible with all the older versions. This means the format is somewhat difficult for developers to support completely. We think it is worth this substantial investment, however, to ensure documents written in Nisus Writer can be used with all other word processors available.

HTML

This is the format of the web. It is open, standardized, and the newest version (XHTML) is extensible. Personally, I think this will become the word processing standard at some point in the future.

OASIS OpenDocument

This is a new open standard for word processing documents just published by OASIS. It is in XML and currently it is supported primarily in OpenOffice. I think this format will be mostly useful for people developing publishing systems. This is a completely separate format from the Microsoft Word binary format or RTF.

AppleWorks, WordPerfect, Nisus Writer Classic

There was a day when every word processor had its own format. The reason developers did this was to “lock” you in to using their software. The idea was that once you had all your documents stored in their proprietary format, you would have to keep buying the same software to read your documents.

It turns out that this approach works, but only for Microsoft. In a day when the world has de facto standardized on the Word and RTF formats, however, all these other proprietary formats just make the alternative word processors less useful. After all, now you can’t share your writing with anyone else!

We think the right thing to do in this case is to help users to read these formats and convert them to something more widely supported like RTF. This way you are not locked in to any one word processor, including Nisus Writer. Your writing is protected and you have the freedom to use whatever tool fits your needs best.

Anything XML

Something really important to understand about XML: XML is to file formats what grammar is to language. To have a full format, you also need a vocabulary. Anytime someone creates a new format that is based on XML, their format still cannot be understood by other software unless that software also supports their proprietary “vocabulary.” XML makes it easier to support these new formats, but it is still a lot of work, and for a format to become a new standard you will have to convince others that supporting it is worth their time.
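To make the grammar-versus-vocabulary point concrete, here is a small sketch using Python’s standard XML parser. Both documents are perfectly good XML and hold the same sentence, but they use two different vocabularies that I made up for this example. A reader written for the first vocabulary parses the second one without complaint and still finds nothing it understands.

```python
import xml.etree.ElementTree as ET

# Two well-formed XML documents with the same content but different,
# invented vocabularies.
doc_a = "<document><para>Hello, world.</para></document>"
doc_b = "<text-file><block kind='body'>Hello, world.</block></text-file>"


def read_paragraphs(xml_text: str) -> list:
    """A reader that only knows vocabulary A: paragraphs are <para> elements."""
    root = ET.fromstring(xml_text)  # the shared "grammar": both inputs parse
    return [p.text for p in root.iter("para")]


print(read_paragraphs(doc_a))  # the paragraph text is found
print(read_paragraphs(doc_b))  # parses fine, but no paragraphs are recognized
```

The parser is the easy part. Knowing that a block element with kind='body' means “paragraph” is the work each application still has to do for every vocabulary it wants to support.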

We get requests now and then to support “XML” in our application. We usually respond by asking, “What kind of XML?” Most of the time we don’t get a good answer back. The truth is, we are looking at XML very closely and we will probably use this technology at some point. But when it comes to actually exchanging documents with other people in the real world, RTF does a much better job today.

Well, there you have it. My short rundown of the file formats in popular use. I hope this helps!

5 Comments

  1. Paul

    When you say “We think the right thing to do in this case is to help users to read these formats and convert them to something more widely supported like RTF,” does this mean that you are planning to add translators to allow importing files from these other programs (AppleWorks, etc.)? That would overcome the biggest single inconvenience for me with Nisus Writer Express. I still have lots of old documents in AppleWorks format, and it would save me a couple of steps for each one if I could open these old .cwk documents directly in NWX — saving hours of time if you think about doing this for more than 500 .cwk files…

    (And for what it’s worth, I appreciate the use of RTF as Nisus’ default format; this has made it much easier to share files with my PC-using friends and colleagues than when I was using AppleWorks!)

    Paul

  2. Hi Paul:

    We have looked at a number of options to handle conversion of AppleWorks documents, but so far we have not found a solution that is workable for the number of people who need it. Until then, I highly recommend you get MacLinkPlus to convert your AppleWorks documents en masse to RTF. Then you can use your AppleWorks docs in any word processor you want.

  3. leif

    Why can’t you do a better job opening your own old Nisus classic files? Is that so hard? It is your own format! Especially files with images are troublesome. I’ve found the best thing is to use icWord to open old Nisus files, and then to save them as rtf.

  4. Paulus

    I think it should be mentioned that HTML or even XHTML are not very useful as word-processing file formats unless you add some kind of styling ability such as CSS.

  5. Amar Kout

    Hi, I have 2 questions:
    1- What do I have to do to make nisus 6.5 read documents from a windows based PC running microsoft word?
    2- I am running nisus 6.5 on osx 10.4 through the os9 environment, should I upgrade to nisus express? From the screen shots I think the 6.5 is better.
    E-Mail me plz, Thanx a bunch.
