[rfc-i] A proposal: HTML

Iljitsch van Beijnum iljitsch at muada.com
Thu May 24 13:16:02 PDT 2012

It seems to me that the discussion is veering off in unproductive directions. So I'll explain the HTML format I have in mind and why I think HTML is a good direction, better than any of the alternatives.

Note that I'm not an HTML expert; there are probably better ways to do many of the things that I'm suggesting here.

A big problem today is that the draft/RFC format can't easily be generated by widely used tools, and the tool(s) that _can_ generate it are not exactly user-friendly. There is no magic bullet that will make that problem disappear completely, but HTML gets us at least a good part of the way there. So let's discuss that part first.


Writing a draft/RFC has three main parts:

1. The front matter, with author names, publication dates, titles, etc.
2. The main body text and headings
3. The references

At this point, I'm going to refrain from complaining about how XML2RFC handles part 1. Instead, I'll assume that there is no reasonable way to make this completely painless, and suggest that although the details can be improved, we should probably stick close to how XML2RFC handles it. That could look as follows:

<div class="hiddenfrontmatter">
  <form name="rfcattributes">
    <input type="hidden" name="toc" value="yes">
    <input type="hidden" name="ipr" value="trust200902">
    <input type="hidden" name="docname" value="draft-ietf-behave-ftp64-12">
    <input type="hidden" name="category" value="std">
  </form>

  <form name="author1">
    <input type="hidden" name="surname" value="Van Beijnum">
    <input type="hidden" name="fullname" value="Iljitsch van Beijnum">
    <input type="hidden" name="shortfullname" value="I. van Beijnum">
    <input type="hidden" name="altfullname" value="Ильи́ч van Beijnum">
    <input type="hidden" name="organization" value="Institute IMDEA Networks">
  </form>
</div>
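Because this front matter is ordinary HTML, a tool can read it back with any HTML parser. Below is a minimal sketch using only Python's standard html.parser module. Grouping the fields by form name and the name parse_front_matter are assumptions for illustration, not part of any existing tool.

```python
from html.parser import HTMLParser


class FrontMatterParser(HTMLParser):
    """Collect <input type="hidden"> name/value pairs, grouped by <form name=...>."""

    def __init__(self):
        super().__init__()
        self.forms = {}       # form name -> {field name: value}
        self._current = None  # name of the form we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = attrs.get("name") or ""
            self.forms.setdefault(self._current, {})
        elif tag == "input" and attrs.get("type") == "hidden" and self._current is not None:
            name = attrs.get("name")
            if name is not None:
                # Attribute values arrive with entities already decoded.
                self.forms[self._current][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def parse_front_matter(html_text):
    """Hypothetical helper: return {form name: {field: value}} for hidden fields."""
    parser = FrontMatterParser()
    parser.feed(html_text)
    parser.close()
    return parser.forms


if __name__ == "__main__":
    sample = """<div class="hiddenfrontmatter">
  <form name="author1">
    <input type="hidden" name="surname" value="Van Beijnum">
    <input type="hidden" name="organization" value="Institute IMDEA Networks">
  </form>
</div>"""
    print(parse_front_matter(sample)["author1"]["organization"])
```

Keeping each author in a separate named form means the tool never has to guess which fields belong together.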


Part 2, the main text. The problem with XML2RFC is that it relies heavily on nested <section> elements: the heading level of a section is deduced from how deeply it is nested. HTML and word processors work differently. There, the heading level is stated explicitly (<h2>, <h3>, etc.) rather than deduced from the nesting. So converting back and forth between HTML and word processor formats is much more convenient and less likely to lead to loss of information. The HTML way of doing this is also much easier on humans, because you don't have to hunt through the entire document for missing </t> elements (yes, I've missed submission deadlines for this reason).

An example:

<h2>Stateless EPRT translation</h2>

<p>If the address specified in the EPRT command is the client's IPv6 address, then the FTP ALG reformats the EPRT command into a PORT command with the IPv4 address that maps to the client's IPv6 address. The port number must be preserved for compatibility with stateless translators.</p>

(Note that the section numbering can be made automatic using CSS but it may be useful to make this explicit/hardcoded during the RFC publication process.)
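If the numbers are hardcoded during publication, the tool only needs the explicit heading levels; it never has to track nesting. A minimal Python sketch follows. It assumes that <h1> is reserved for the document title, so that <h2> is a top-level section as in the example above; HeadingCollector and number_sections are hypothetical names.

```python
from html.parser import HTMLParser

# Assumption: <h1> is the document title, so <h2> is a top-level section.
HEADING_DEPTH = {"h2": 1, "h3": 2, "h4": 3, "h5": 4, "h6": 5}


class HeadingCollector(HTMLParser):
    """Record (depth, title) for every heading, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._depth = None  # depth of the heading we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_DEPTH:
            self._depth = HEADING_DEPTH[tag]
            self._text = []

    def handle_data(self, data):
        if self._depth is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_DEPTH and self._depth is not None:
            self.headings.append((self._depth, "".join(self._text).strip()))
            self._depth = None


def number_sections(headings):
    """Turn (depth, title) pairs into RFC-style '1.', '2.1.' numbered titles."""
    counters = []
    numbered = []
    for depth, title in headings:
        del counters[depth:]          # forget deeper levels from the previous section
        while len(counters) < depth:  # pad if this is the first or a level was skipped
            counters.append(0)
        counters[depth - 1] += 1
        number = ".".join(str(c) for c in counters) + "."
        numbered.append(f"{number} {title}")
    return numbered
```

Because every heading carries its own level, a missing end tag elsewhere in the document cannot shift the numbering of later sections.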

Then there is part 3. This is unbelievably bad in XML2RFC. In order to refer to an RFC, which is by far the easiest type of reference, I need to have incomprehensible incantations at the top and the bottom of my document:

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
  <!ENTITY rfc959 PUBLIC '' "http://xml.resource.org/public/rfc/bibxml/reference.RFC.0959.xml">
]>

<references title="Normative References">
  &rfc959;
</references>

so that I can use this in my text: <xref target="RFC0959" />. I guess the question of whether the leading zero should or shouldn't be there was settled by randomly requiring it to be present in some places and absent in others.

This needs to be made much, much easier. During the writing stage, just linking to the canonical location of a reference should suffice. Afterwards, a tool should find all the links, check whether they exist in the reference database, and if so, add them to the reference section in the appropriate order (by RFC number for RFCs, by order of appearance for non-RFCs, or whatever we decide). Each reference would be just text without any markup, with a link back to the reference database, rather than the carefully marked-up reference format that XML2RFC imposes, which is incredibly hard to generate for non-RFC references.
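A rough Python sketch of such a tool follows. It assumes, purely for illustration, that the reference database is a dictionary mapping a canonical URL to plain reference text, and that RFCs are recognized by URLs of the form https://www.rfc-editor.org/rfc/rfcNNNN; neither is an existing service or agreed format. RFCs are sorted numerically, so "rfc0959" and "rfc959" both end up as 959, and everything else keeps its order of first appearance.

```python
import re
from html.parser import HTMLParser

# Assumption: this is what a canonical RFC link looks like.
RFC_URL = re.compile(r"rfc-editor\.org/rfc/rfc(\d+)", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collect external <a href> targets in order of first appearance."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip in-document links such as "#section-2" and duplicates.
        if href and not href.startswith("#") and href not in self.links:
            self.links.append(href)


def build_references(html_text, reference_db):
    """Hypothetical helper: return (reference lines, links missing from the db)."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()

    rfcs, others, unknown = [], [], []
    for href in collector.links:
        if href not in reference_db:
            unknown.append(href)  # left for the author to fix
            continue
        match = RFC_URL.search(href)
        if match:
            rfcs.append((int(match.group(1)), href))  # int() drops leading zeros
        else:
            others.append(href)

    ordered = [href for _, href in sorted(rfcs)] + others
    lines = [f"{reference_db[href]} <{href}>" for href in ordered]
    return lines, unknown
```

Links that aren't in the database are returned separately rather than silently dropped, so the author sees them before submission.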


I hope I've been able to make a good case that an HTML format is useful as an authoring format. Now on to HTML as a consumption format.

There are currently text, PDF and different HTML versions of RFCs. These are created by running a tool that performs the necessary conversions. However, making such tools isn't all that easy, so it would be very good to remove the dependency on tools where we can. And CSS gives us exactly that. Again, I'm no HTML/CSS expert, but I do know that you can make CSS do almost anything in a browser. And the really good part is that this can all be done by modifying an external CSS file, without making ANY changes to the actual HTML content.

So it's entirely conceivable that the same HTML file can be displayed like this:


or like this:


Or in many other ways. An HTML RFC CSS file would require quite a bit of plumbing to get some things done, but many things, such as whether a TOC is shown, colors, margins and font type/size, should be very easy to change by any user who can edit a text file.


Because all the pertinent information is explicitly marked up, parsing an HTML RFC would be both easy and relatively foolproof. The only thing we need to do is limit the allowed HTML to the subset that we need for RFCs, so tools don't have to implement esoteric browser features but can simply work on basic HTML. The hard part of a browser's job is displaying the HTML, which our tools wouldn't need to do; they only have to parse it.
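Enforcing such a subset is itself a simple parsing job. Here is a minimal Python sketch; the ALLOWED_TAGS set is a hypothetical placeholder, since the actual subset would still have to be agreed on.

```python
from html.parser import HTMLParser

# Hypothetical subset: the real list would be decided by the community.
ALLOWED_TAGS = {
    "html", "head", "title", "body", "div", "form", "input",
    "h1", "h2", "h3", "h4", "h5", "h6", "p", "a", "pre",
    "ul", "ol", "li", "table", "tr", "th", "td", "em", "code",
}


class SubsetChecker(HTMLParser):
    """Record every element that falls outside the allowed subset."""

    def __init__(self):
        super().__init__()
        self.violations = []  # list of (line number, tag name)

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            line, _offset = self.getpos()
            self.violations.append((line, tag))


def check_subset(html_text):
    """Return (line, tag) for each disallowed element; empty means it passes."""
    checker = SubsetChecker()
    checker.feed(html_text)
    checker.close()
    return checker.violations
```

A submission tool could refuse documents for which check_subset() returns anything, which keeps every downstream tool working on the same small subset.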

We can also format the HTML elements and text such that the text sticks relatively close to the current ASCII format, so that simply removing all the HTML leaves a usable ASCII version.
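"Simply removing all the HTML" can be taken quite literally. This Python sketch keeps only the character data (with entities such as &amp;amp; decoded) and drops every tag; it also skips the contents of <style> and <script>, on the assumption that an RFC would never need those in its text. html_to_ascii is a hypothetical name.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep character data, drop all tags; entities are decoded by default."""

    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so "&amp;" becomes "&"
        self.parts = []
        self._skipping = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skipping = True

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skipping = False

    def handle_data(self, data):
        if not self._skipping:
            self.parts.append(data)  # whitespace is kept exactly as written


def html_to_ascii(html_text):
    parser = TextOnly()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

Since whitespace is passed through untouched, the result only looks like the current ASCII format if the HTML source is laid out that way. Hidden front-matter fields carry their data in attributes, so they produce no text here.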


So if we can have a format that is good for writing RFCs, for reading them on screens, and for parsing them with relatively simple tools, such a format would of course also be the perfect format to archive, and at some later point to use as the basis for an update.


The only limitation that we currently have is that printing HTML doesn't always produce the best results. So for printing, it would probably be useful to have one or more alternative PDF versions: two if we don't care about page numbers (one for letter size and one for A4 size), or a single one if we do care about page numbers, laid out so that it prints on both letter and A4 paper.

Note that having alternative versions for printing is less of an imposition than having alternative versions for different types of displays. With printing, "deep linking" is rarely an issue. On screens, however, following a link to a section of an RFC in a format that doesn't fit your screen size can be disruptive, and that is hard to avoid if multiple versions for different screens exist. Printing, on the other hand, is a deliberate process, where extra steps to find the right format are less disruptive.
