[rfc-i] New proposal for "canonical and others"

Iljitsch van Beijnum iljitsch at muada.com
Sat Jun 16 10:40:11 PDT 2012

On 16 Jun 2012, at 0:21 , Paul Hoffman wrote:

> For this draft, I chose XML instead of HTML for a variety of reasons. Basically, there are too many things that an HTML-aware editing program might do that would not match the constrained format that the RFC Editor will surely require, whereas a similar (as yet nonexistent) XML-aware editing program probably would not.

I disagree with this and pretty much every single statement and conclusion in the draft.

If you want, I can go over the details, but for now I will focus on the big-ticket items.

First of all, obviously, if we come up with a new format rather than adopting an existing one, no existing tools will generate that format. And as I explained before, the way programs like Word output to HTML has nothing to do with creating the kind of HTML that has been under discussion here over the past months. Word processors are capable of handling the structure we need in a new format. As far as I know, they are not capable of exporting those semantics in the form of HTML. But conversion tools can be created for that.

I don't know how difficult it is to edit XML2RFC-like XML in an XML editor. Maybe such editors can do it, maybe not.

The trouble with making XML the canonical format is that humans can't read XML. If there is a subtle conversion problem that makes the text version of an RFC say something that makes implementations derived from it non-interoperable with implementations derived from the PDF version, or a lawsuit hangs on which one represents the official RFC, then going back to the XML isn't going to solve this. It will always be necessary to transform XML into something that can be displayed such that humans can read it. And building in a dependence on tools makes it impossible to meet our longevity goals, because the environments that are necessary to run code change way too fast.

Also, making the XML the canonical version means it becomes impossible to modify the XML format more than once every decade or so. That means the XML format must be really, really, really good the first day we start using it, because any and all issues will stick around for many years. Note that this is much less of an issue with pure output formats. Maybe version 1 can only indent using non-breaking spaces and version 2 supports tabs, but on paper or on the screen both are just empty space, so it doesn't matter. But setting a 2012 XML format in concrete means that in the future, we'll be just as annoyed with the limitations of that format as we are today with the limitations of formatted ASCII text.
