[rfc-i] Proper use of word processors

Iljitsch van Beijnum iljitsch at muada.com
Tue May 29 02:18:23 PDT 2012

(I'm a bit behind due to moving and not having an internet connection. And the power went out this morning, too...)

I read the discussion about Word and how it exports to HTML. Like pretty much all other programs that can export to HTML, but aren't true HTML editors, it just uses whatever it can to get the output it needs, without regard for the semantic structure of the HTML. And why not? All that counts is what the user sees on their screen or paper. Whether that's triggered by a <li>... or a <p>o ... doesn't matter in the grand scheme of things.

However, for our purposes it is useful to have clean markup, because contrary to what some people think should be the case, we often edit our document code by hand. And until there are really good tools that span the variety of world views among the IETF constituency, I don't see the need to edit the document code by hand disappearing. So making that easy is important. Which means that we'll have to forgo or at least make optional things that would make the document format nicer for tools if that implies more work for toolless authors.

Back to word processors. Although you wouldn't know it from their HTML, word processors do know a lot about document structure, because they let you tag paragraphs and/or strings of text within a paragraph with different styles. This maps very well to the HTML/CSS model like so:

<p class="bodytext">Any implementation of the blah v1.1 protocol <span class="rfc2119must">must</span> also implement the blah v1.0 protocol.</p>

Now obviously some conversion is required between the formats supported by these word processors and our new HTML format, but that should be eminently doable for the text part. (If we can make a tool that does this for the Word '97 .doc format we should be in good shape, since current word processors can all read and write that.)
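
As a rough illustration of that conversion step, here is a minimal Python sketch. It assumes the word-processor file has already been parsed into paragraphs, each with a paragraph style name and a list of text runs that may carry a character style. The style names, the CSS class names, and the function `render_paragraph` are all hypothetical placeholders, not part of any existing tool or of the proposed format.

```python
from html import escape

# Hypothetical style-name -> CSS-class tables. A real converter would
# probably read these from configuration instead of hard-coding them.
PARAGRAPH_CLASSES = {"Body Text": "bodytext"}
CHARACTER_CLASSES = {"Keyword": "keyword"}


def render_paragraph(style, runs):
    """Render one styled paragraph as a <p> element.

    `style` is the paragraph style name; `runs` is a list of
    (text, character_style_or_None) tuples in document order.
    """
    parts = []
    for text, char_style in runs:
        safe = escape(text)  # escape &, <, > and quotes in the text
        if char_style is None:
            parts.append(safe)
        else:
            css = CHARACTER_CLASSES[char_style]
            parts.append(f'<span class="{css}">{safe}</span>')
    body = "".join(parts)
    return f'<p class="{PARAGRAPH_CLASSES[style]}">{body}</p>'


if __name__ == "__main__":
    print(render_paragraph(
        "Body Text",
        [("Any implementation ", None), ("must", "Keyword"),
         (" also implement the older version.", None)],
    ))
```

The point of the sketch is that the output depends only on the style tags, not on how the text happens to look, so the resulting markup stays clean enough to edit by hand.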

That leaves the front matter with all the metadata and the references, but fortunately those only make up a small portion of the document so if those are handled in a less elegant way that would be ok.
