[rfc-i] issue: canonical formats

Joe Hildebrand jhildebr at cisco.com
Mon Jun 4 08:17:56 PDT 2012


On 6/4/12 2:50 AM, "Yoav Nir" <ynir at checkpoint.com> wrote:

> I strongly prefer the XML2RFC. Any XML that you write and pass through xml2rfc
> comes out looking like an RFC, including correct section numbering, the TOC,
> etc.

All of that can be added to the HTML by a tool that performs the same
function as xml2rfc.

> Most HTML in the world doesn't look like an RFC. So if we allow HTML as the
> submission format, we need tools that reject or fix anything that doesn't look
> like an RFC, or else have the human RFC editors reformat all the text (and
> pretty soon, images as well). Writing such tools is akin to writing anti-virus
> software - I-D writers can come up with new ways of creating things that don't
> look like RFCs faster than the tool can adapt.

First: I think we can start with the assumption that authors are trying to
create conformant docs.  If we detect that certain authors are acting with
bad intent, there are likely actions we can take to discourage that behavior.

Second: A whitelist approach with a very small subset of HTML isn't too hard
to check.  Are there any tags which are not on the allowed list?  Are all
elements nested inside a relevant parent?  Are there any inline style
attributes?  Are all images above some minimum size (64x64?)?  Do all images
have at least a couple of pixels that are different?  Those rules get us
pretty close.
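
As a rough illustration of the first three checks (allowed tags, required
parents, inline style attributes), here is a minimal sketch using Python's
standard html.parser.  The tag list and the nesting rules are hypothetical
placeholders, not a proposed profile, and the image checks are left out
because they would need an image library.

```python
from html.parser import HTMLParser

# Hypothetical whitelist; a real allowed set would have to be agreed on.
ALLOWED_TAGS = {"html", "head", "title", "body", "h1", "h2", "p",
                "ul", "ol", "li", "pre", "a", "img"}
# Hypothetical nesting rule: child tag -> parents it may appear directly in.
REQUIRED_PARENTS = {"li": {"ul", "ol"}, "title": {"head"}}
# Elements that never have a closing tag, so they are not tracked as open.
VOID_TAGS = {"img"}


class WhitelistChecker(HTMLParser):
    """Collects a list of problems instead of stopping at the first one."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, innermost last
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # Are there any tags which are not on the allowed list?
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"tag not allowed: <{tag}>")
        # Is the element nested inside a relevant parent?
        parents = REQUIRED_PARENTS.get(tag)
        if parents is not None and (not self.stack or self.stack[-1] not in parents):
            self.problems.append(f"<{tag}> outside required parent")
        # Are there any inline style attributes?
        if any(name == "style" for name, _ in attrs):
            self.problems.append(f"inline style on <{tag}>")
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def check(html_text):
    """Return the list of rule violations found in html_text."""
    checker = WhitelistChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems
```

Calling check() on a submission returns an empty list when none of these
three rules is violated; anything else could be reported back to the author.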

That said, I don't care if people use xml2rfc to generate good HTML.  I
don't mind if xml2rfc is a parallel submission format to HTML, and I could
be talked into xml2rfc being the only submission format - although I
believe we would eventually want to replace it with HTML if we made that
choice.

> The tools available for editing HTML don't make this any better. They might
> work with DIVs or tables or the infamous small white gif to create structure
> in documents, and they also tend to create very large files with a lot of
> embedded formatting, and that makes it harder later on to convert these to
> usable other formats such as PDF.

Don't use FrontPage or Word to edit the HTML if you want it to be
submittable.  There are a growing number of nice HTML tools that allow you
to maintain the structure and semantics that you desire in your HTML; site
authors tend to care much more about that in recent years than they did at
the birth of the web.

-- 
Joe Hildebrand


