[rfc-i] RFC editing tools

Paul Hoffman paul.hoffman at vpnc.org
Fri Dec 7 11:52:01 PST 2012


On Dec 7, 2012, at 11:38 AM, Ted Lemon <mellon at fugue.com> wrote:

> If the representational form, which you seem to think should be HTML, has to be processed by the RFC editor into a different HTML file in order to be useable, then there is no benefit to using HTML. You could use any parseable representation format and get the same output.

If you believe the wide availability of text editors that understand HTML and flag/correct mistakes as you enter your text, and the wide availability of HTML viewers (better known as "browsers") to preview your text as you enter it, and the wide availability of parsing tools in all common languages, are of no benefit, then you are right. Others find those three features to be attractive.

> That being the case, why not choose a representation that's easy to parse, instead of the proposed HTML representation, which will be difficult to parse?

Good question.

> To be clear, what I mean by "easy to parse" is that if the representation is some form of XML, the parser just has to read in the XML, and it has a tree structure containing the document, which can be processed and spat out.   The parsing process is trivial.

Others who have tried it in tools would disagree with that last bit.
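
For concreteness, the "read in the XML and get a tree" step described in the quoted text might look like the sketch below, using Python's standard xml.etree.ElementTree. The sample document and its element names are illustrative assumptions, not the actual RFC XML profile, and the sketch covers only the read-in step, not any later processing that tool authors may find harder.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names are illustrative only.
SAMPLE = (
    "<rfc><front><title>Example</title></front>"
    "<middle><section><t>First paragraph.</t><t>Second.</t></section></middle>"
    "</rfc>"
)

def walk(element, depth=0):
    """Yield (depth, tag, text) for an element and all of its descendants."""
    yield depth, element.tag, (element.text or "").strip()
    for child in element:
        yield from walk(child, depth + 1)

def parse_outline(xml_text):
    """Parse XML text and return a flat, depth-annotated list of its elements."""
    return list(walk(ET.fromstring(xml_text)))

if __name__ == "__main__":
    for depth, tag, text in parse_outline(SAMPLE):
        print("  " * depth + tag, text)
```

Note that ET.fromstring raises ParseError on input that is not well-formed, so this only works when the author's file is valid XML to begin with.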

> The format Joe has proposed is "hard to parse" because the parser first has to read in HTML, not XML; HTML itself is hard to parse because some tags (e.g., <br>) do not have to be terminated.   

So, all the well-documented parsers out there are failures? Seems kinda unlikely.
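
As a rough illustration of how an existing parser copes with unterminated tags such as <br>, here is a sketch built on Python's standard html.parser.HTMLParser. The TreeBuilder class, the dict-based node layout, and the parse_html helper are assumptions made for this example; it is not a full HTML5 tree builder and does not handle implied end tags such as an unclosed <p>.

```python
from html.parser import HTMLParser

# Void elements never take an end tag, so they must not stay on the open-element stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class TreeBuilder(HTMLParser):
    """Hypothetical helper: builds a nested dict tree from HTML text."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#root", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in VOID:
            return  # e.g. the end half of <br/>; nothing to close
        # Pop back to the nearest matching open element, ignoring stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1]["children"].append(data.strip())

def parse_html(text):
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root
```

With this sketch, parse_html("<p>one<br>two</p>") yields a single p node whose children are the text "one", an empty br node, and the text "two".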

> But then there's the further complexity that once the HTML has been successfully transformed into a tree structure, the parser has to groom the tree structure, examining each element to see if it requires special parsing of the enclosed text and then, if so, doing that special parsing, for which there is no grammar—it's just free-form text.   Once this is done, we now have a tree structure containing the document which can be processed and spat out in a different form.

...and this is identical to XML, yes?

> So I think it's simply wrong to claim that the HTML format Joe proposes is a representational format.   It is a presentational format, and not a bad one.   But making it the canonical format for RFCs means that we lose the benefit of a good representational format: that it can be easily transformed for multiple different uses.

I'm glad you think that our XML profile is a "good representational format": I would call it an adequate one, and Joe's a bit more adequate, but not good either.

> What Joe's draft has started to document is what an RFC should look like when it's presented for viewing in a browser, not what the canonical format of an RFC should be.

It's kind of rude for you to state what Joe meant, given that he has said the opposite.

--Paul Hoffman

