[rfc-i] RFC editing tools

Julian Reschke julian.reschke at gmx.de
Sat Dec 8 01:44:07 PST 2012


On 2012-12-07 20:38, Ted Lemon wrote:
> ...
> To be clear, what I mean by "easy to parse" is that if the representation is some form of XML, the parser just has to read in the XML, and it has a tree structure containing the document, which can be processed and spat out.   The parsing process is trivial.
>
> The format Joe has proposed is "hard to parse" because the parser first has to read in HTML, not XML; HTML itself is hard to parse because some tags (e.g., <br>) do not have to be terminated.   But then there's the further complexity that once the HTML has been successfully transformed into a tree structure, the parser has to groom the tree structure, examining each element to see if it requires special parsing of the enclosed text and then, if so, doing that special parsing, for which there is no grammar—it's just free-form text.   Once this is done, we now have a tree structure containing the document which can be processed and spat out in a different form.
>
> So I think it's simply wrong to claim that the HTML format Joe proposes is a representational format.   It is a presentational format, and not a bad one.   But making it the canonical format for RFCs means that we lose the benefit of a good representational format: that it can be easily transformed for multiple different uses.
>
> What Joe's draft has started to document is what an RFC should look like when it's presented for viewing in a browser, not what the canonical format of an RFC should be.
> ...

You can parse "tag-soup" HTML with an HTML5 parser, such as Henri 
Sivonen's (which emits a series of SAX events, and thus can be plugged 
into XML tool chains).
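
As a rough illustration of that pipeline: the sketch below is not Sivonen's parser. It uses Python's standard html.parser as a lenient stand-in (it is not a spec-compliant HTML5 parser) and forwards its callbacks as SAX events to the stock XMLGenerator. The names SaxBridge, VOID_ELEMENTS and tag_soup_to_xml are made up for this example, and the void-element list is only a subset.

```python
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import XMLGenerator

# Elements that never take an end tag in HTML (illustrative subset).
VOID_ELEMENTS = {"br", "hr", "img", "meta", "link", "input"}


class SaxBridge(HTMLParser):
    """Hypothetical adapter: forwards HTMLParser callbacks as SAX events."""

    def __init__(self, handler):
        super().__init__(convert_charrefs=True)
        self.handler = handler

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; SAX expects strings.
        self.handler.startElement(tag, {k: v or "" for k, v in attrs})
        if tag in VOID_ELEMENTS:
            # Close unterminated elements such as <br> immediately.
            self.handler.endElement(tag)

    def handle_endtag(self, tag):
        # Void elements were already closed in handle_starttag.
        if tag not in VOID_ELEMENTS:
            self.handler.endElement(tag)

    def handle_data(self, data):
        self.handler.characters(data)


def tag_soup_to_xml(html_text):
    """Serialize the SAX event stream as well-formed XML text."""
    out = io.StringIO()
    bridge = SaxBridge(XMLGenerator(out, encoding="utf-8"))
    bridge.feed(html_text)
    bridge.close()
    return out.getvalue()


soup = "<div><p>First line<br>second line</p><hr></div>"
xml_text = tag_soup_to_xml(soup)

# The result can now be handed to any ordinary XML tool.
root = ET.fromstring(xml_text)
```

Any SAX ContentHandler could take the place of XMLGenerator here, which is the sense in which such a parser plugs into an XML tool chain.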

Or you can just decide to use XHTML in the first place, in which case 
the files *are* XML.
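
A minimal sketch of the XHTML case, assuming a small made-up document: because the input is already well-formed XML, a stock XML parser (here Python's xml.etree.ElementTree) yields the tree directly, with no HTML-specific parsing step.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up XHTML sample; note that <br/> is explicitly terminated.
doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h1>Example title</h1>
    <p>First line<br/>second line</p>
  </body>
</html>"""

root = ET.fromstring(doc)
ns = {"x": XHTML_NS}

# Look up an element by its namespace-qualified path.
title = root.find("x:body/x:h1", ns).text

# Walk the whole tree and strip the namespace from each tag name.
tags = [el.tag.split("}", 1)[1] for el in root.iter()]
```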

Best regards, Julian

(I am only pointing this out; I would prefer an evolution of the xml2rfc format as well.)

