[rfc-i] transition plan for choosing alternative format for RFCs

Patrick Linskey (plinskey) plinskey at cisco.com
Mon Mar 26 06:25:09 PDT 2012


Leonard said:

> >The odds of convincing vanilla Word to output reasonable
> >semantically-relevant machine-readable output seem comically small.
> >
> I would agree with that sentence because of the addition of the word
> "vanilla" before Word.
> 
> HOWEVER, the use of plugins to MSWord to provide improved output
> capabilities is well established and has been part of the
> workflow/ecosystem of many for at least a decade or more.  Is the
> addition of a plugin to Word any more/less reasonable than a
> completely separate tool (eg. xml2rfc)??

I'd say "yes": less reasonable.

I've written a number of Word add-ins, and writing one that constrains a
document to the subset of Word that we want (or that applies IETF
semantics to Word files and extracts, say, an XML document from the Word
file) would be non-trivial. And that'd only buy us coverage on Word for
Windows -- the add-in model is different for Mac.

More generally, I'd argue that we should agree on data formats, not
tooling. 

> I have no specific preference for or against the use of the .doc or
> .docx formats, but I would like to refute your position below on pure
> technical grounds.
> 
> 1 - Both .doc and .docx are human-editable, obviously

Sorry... I should have been more specific. "Reasonably human-editable
with commonly available tools." As someone who has done a surprising
amount of direct and programmatic editing of .docx files, I'm confident
that .docx fails this test. I've never manually edited a .doc file, but
I can't imagine it'd be easier than doing the same with .docx, and would
guess it's a whole lot harder.

> 2 - .docx is an XML-based format with rich semantics and transformable
> to other formats using standard tooling

As long as we're diving into details, note that .docx is actually a
zip-based format that contains some XML documents. Sadly, the semantics
are mostly relevant as an export format for the internal Word data
structures, and thus more akin to presentation semantics than anything
else. Again, I should have been more specific in my original email: when
I said "semantically-relevant", I really meant "IETF RFC-relevant
semantics."
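To illustrate the zip-plus-XML point, here is a minimal Python sketch
using only the standard library. It builds a tiny stand-in archive in
memory and lists its parts. The part names follow the usual .docx
layout, but the contents are made up for illustration and are not a
valid Word document.

```python
import io
import zipfile


def build_fake_docx():
    """Build an in-memory zip laid out like a .docx (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Placeholder contents; a real .docx carries far more than this.
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml",
                    "<document><body><p>Hello</p></body></document>")
    return buf.getvalue()


def list_parts(docx_bytes):
    """Return the names of the parts stored inside the archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        return zf.namelist()


data = build_fake_docx()
print(data[:2])          # b'PK', the zip signature, not '<?xml'
print(list_parts(data))  # ['[Content_Types].xml', 'word/document.xml']
```

The file as stored is a zip container, so anything that wants the XML
has to unpack the archive first.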

It would be difficult, for example, to write a tool that could extract
the workgroup from a given RFC, if we were to use .docx as our file
format. One could imagine requiring certain custom properties to be set,
possibly via an add-in for Windows users. But that seems to take us
further from an easy interoperable data format than we are today.
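As a rough sketch of what the custom-property route might look like,
the Python below reads a property out of the docProps/custom.xml part
of a .docx-style archive. The property name "Workgroup" and the value
"example-wg" are hypothetical; no such convention exists today, and the
archive is a stand-in built in memory rather than a real Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the custom-properties part of an OOXML package.
CP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
VT_NS = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"

# Hypothetical "Workgroup" property, invented for this example.
CUSTOM_XML = (
    f'<Properties xmlns="{CP_NS}" xmlns:vt="{VT_NS}">'
    '<property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="2" '
    'name="Workgroup"><vt:lpwstr>example-wg</vt:lpwstr></property>'
    "</Properties>"
)


def build_fake_docx():
    """Stand-in archive holding only the custom-properties part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/custom.xml", CUSTOM_XML)
    return buf.getvalue()


def read_custom_property(docx_bytes, name):
    """Return the string value of a custom property, or None if absent."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        if "docProps/custom.xml" not in zf.namelist():
            return None
        root = ET.fromstring(zf.read("docProps/custom.xml"))
    for prop in root.findall(f"{{{CP_NS}}}property"):
        if prop.get("name") == name:
            value = prop.find(f"{{{VT_NS}}}lpwstr")
            return value.text if value is not None else None
    return None


doc = build_fake_docx()
print(read_custom_property(doc, "Workgroup"))  # example-wg
```

Note that this only works if every author remembers to set the
property, which is exactly the interoperability worry above.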

> 3 - Both .doc and .docx are single file formats
> 4 - Both .doc and .docx support Unicode

Agreed.

> 5 - Both .doc and .docx work just fine with all standard RCS's

Once again, I must apologize for my brevity in my original email. Given
that they are lists of bits, both .doc and .docx can be put into
revision control systems. However, given that they are both binary
formats, some of the common things that we like to do with revision
control systems cannot be reasonably accomplished in typical RCSs ("git
diff" and "git annotate" come to mind in particular).

-Patrick

