[rfc-i] A proposal: HTML

Julian Reschke julian.reschke at gmx.de
Thu May 24 14:38:42 PDT 2012

On 2012-05-24 22:16, Iljitsch van Beijnum wrote:
> It seems to me that the discussion is veering off in unproductive directions. So I'll explain the HTML format I have in mind and why I think HTML is a good direction, better than any of the alternatives.
> Note that I'm not an HTML expert, there are probably better ways to do many of the things that I'm suggesting here.
> A big problem today is that the draft/RFC format can't easily be generated by widely used tools, and the tool(s) that _can_ generate it are not exactly user friendly. There is no magic bullet that will make that problem disappear completely, but HTML gets us at least a good part of the way there. So let's discuss that part first.
> Writing a draft/RFC has three main parts:
> 1. The front matter, with author names, publication dates, titles, etc
> 2. The main body text and headings
> 3. The references
> At this point, I'm going to refrain from complaining about how XML2RFC handles part 1, but rather assume that there is no reasonable way to make this completely painless, and suggest that although the details can be improved, we should probably stick close to how XML2RFC handles this. That could look as follows:
> <p class="hiddenfrontmatter">
>    <form name="rfcattributes">
>      <input type="hidden" name="toc" value="yes">
>      <input type="hidden" name="ipr" value="trust200902">
>      <input type="hidden" name="docname" value="draft-ietf-behave-ftp64-12">
>      <input type="hidden" name="category" value="std">
>    </form>
>    <form name="author1">
>      <input type="hidden" name="surname" value="Van Beijnum">
>      <input type="hidden" name="fullname" value="Iljitsch van Beijnum">
>      <input type="hidden" name="shortfullname" value="I. van Beijnum">
>      <input type="hidden" name="altfullname" value="Ильи́ч van Beijnum">
>      <input type="hidden" name="organization" value="Institute IMDEA Networks">
>    </form>
> etc

Hiding things makes it more likely to break them. There are better ways 
to do this, such as RDFa, microformats or microdata. Unfortunately these 
are multiple, competing ways, so there will be disagreement about which 
to pick. This is why I personally would prefer to stick with what we 
have (RFC2629) and to carefully extend it.
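To make the "hiding things" concern concrete, here is a minimal sketch of
what a tool would have to do to read front matter in the hidden-form style
quoted above, using only Python's standard html.parser. The class name
FrontMatterParser and the sample markup are illustrative assumptions, not
part of the proposal. Note that a misspelled attribute (vaule= instead of
value=) is invisible in a browser and simply comes back as None, with no
error anywhere:

```python
from html.parser import HTMLParser


class FrontMatterParser(HTMLParser):
    """Collects hidden <input> name/value pairs, grouped by the enclosing <form name=...>."""

    def __init__(self):
        super().__init__()
        self.forms = {}       # form name -> {field name: field value}
        self._current = None  # name of the <form> currently open, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # html.parser lowercases tag and attribute names
        if tag == "form":
            self._current = a.get("name")
            self.forms.setdefault(self._current, {})
        elif tag == "input" and self._current is not None and a.get("type") == "hidden":
            # A misspelled attribute (e.g. vaule=) is not an error: .get() just returns None.
            self.forms[self._current][a.get("name")] = a.get("value")

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


# Hypothetical sample in the proposed style, with one deliberate typo.
SAMPLE = """
<form name="rfcattributes">
  <input type="hidden" name="docname" value="draft-ietf-behave-ftp64-12">
  <input type="hidden" name="category" value="std">
</form>
<form name="author1">
  <input type="hidden" name="surname" value="Van Beijnum">
  <input type="hidden" name="organization" vaule="Institute IMDEA Networks">
</form>
"""

parser = FrontMatterParser()
parser.feed(SAMPLE)
print(parser.forms)
```

The organization field of author1 ends up as None; a validating schema for
RFC2629 would reject the equivalent typo instead.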

> Part 2, the main text. The problem with XML2RFC is that it heavily relies on nested <t> elements. This is unlike HTML or the way in which word processors work, where the heading levels are not automatically deduced from the explicit nesting, but rather the heading level is explicit. So converting back and forth between HTML and word processor formats is much more convenient and less likely to lead to loss of information. Also, the HTML way of doing this is much easier on humans, because you don't have to hunt through the entire document for missing </t> elements (yes, I've missed submission deadlines for this reason).
> An example:
> <h2>Stateless EPRT translation</h2>
> <p>
> If the address specified in the EPRT command is the client's IPv6 address, then the FTP ALG reformats the EPRT command into a PORT command with the IPv4 address that maps to the client's IPv6 address. The port number must be preserved for compatibility with stateless translators.
> </p>

What you consider a feature, others (IMHO correctly) see as a defect in 
HTML 4, and HTML 5 happens to have ... <section>.

> (Note that the section numbering can be made automatic using CSS but it may be useful to make this explicit/hardcoded during the RFC publication process.)

Delegating the numbering to CSS makes it impossible to copy and paste 
section numbers from the HTML into plain text (think email), so I think 
that should be avoided.
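If the numbers are instead hardcoded during publication, as suggested, they
become part of the text and survive copy and paste. A minimal sketch of that
step, assuming <h2> is the top section level, headings are not nested inside
each other, and a regex is good enough for the restricted HTML subset. The
function name number_headings and the headings in the sample are
hypothetical:

```python
import re

# Matches <h2>..<h6>, optionally with attributes; assumes headings are not nested.
HEADING = re.compile(r"<h([2-6])(\s[^>]*)?>(.*?)</h\1>", re.DOTALL)


def number_headings(html):
    """Prefix every <h2>..<h6> with a hard-coded hierarchical number such as 2.1."""
    counters = [0] * 5  # one counter per level, h2 through h6

    def renumber(m):
        level, attrs, title = m.group(1), m.group(2) or "", m.group(3)
        depth = int(level) - 2  # 0 for <h2>, 1 for <h3>, ...
        counters[depth] += 1
        for i in range(depth + 1, len(counters)):
            counters[i] = 0  # a new section restarts numbering of its subsections
        number = ".".join(str(c) for c in counters[: depth + 1])
        return f"<h{level}{attrs}>{number}. {title}</h{level}>"

    return HEADING.sub(renumber, html)


doc = ("<h2>Introduction</h2><h2>Stateless EPRT translation</h2>"
       "<h3>Port numbers</h3><h2>Security</h2>")
print(number_headings(doc))
# <h2>1. Introduction</h2><h2>2. Stateless EPRT translation</h2><h3>2.1. Port numbers</h3><h2>3. Security</h2>
```

A heading that skips a level (an <h3> before any <h2>) would come out as
"0.1", which a real tool should report as an error rather than number.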

> Then there is part 3. This is unbelievably bad in XML2RFC. In order to refer to an RFC, which is by far the easiest type of reference, I need to have incomprehensible incantations at the top and the bottom of my document:
> <!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
>    <!ENTITY rfc959 PUBLIC '' "http://xml.resource.org/public/rfc/bibxml/reference.RFC.0959.xml">
> ]>

No, you don't need that at all. You can simply copy & paste the contents 
of <http://xml.resource.org/public/rfc/bibxml/reference.RFC.0959.xml> 
into your document.

> <references title="Normative References">
>        &rfc959;
> </references>
> so I get to use these in my text: <xref target="RFC0959" />. I guess the decision on whether the leading zero should or shouldn't be there was solved by randomly requiring it to be present or requiring it to be absent.

That's a tool problem. The fact that nobody has written that tool could 
mean that the problem isn't as big as you think. I personally have no 
problem maintaining my references by hand.
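As an illustration of how small such a tool could be, here is a sketch that
lists <xref> targets with no matching <reference anchor> and references that
are never cited, using Python's xml.etree.ElementTree. It assumes the
references have been pasted in directly (this parser does not resolve
external entities such as &rfc959;), and the function name reference_report
and the sample document are hypothetical. A citation of RFC959 against a
reference anchored as RFC0959 would show up as missing:

```python
import xml.etree.ElementTree as ET


def reference_report(rfc2629_xml):
    """Return (missing, unused): cited targets with no definition, and references never cited."""
    root = ET.fromstring(rfc2629_xml)
    cited = {x.get("target") for x in root.iter("xref") if x.get("target")}
    defined = {r.get("anchor") for r in root.iter("reference")}
    # <xref> can also point at sections; only section anchors are excluded in this sketch.
    sections = {s.get("anchor") for s in root.iter("section") if s.get("anchor")}
    missing = sorted(cited - defined - sections)
    unused = sorted(defined - cited)
    return missing, unused


# Hypothetical draft with its references pasted in (no DOCTYPE entities).
SAMPLE_XML = """<rfc>
  <middle>
    <section anchor="eprt" title="Stateless EPRT translation">
      <t>See <xref target="RFC0959"/> and <xref target="RFC2428"/>; compare <xref target="eprt"/>.</t>
    </section>
  </middle>
  <back>
    <references title="Normative References">
      <reference anchor="RFC0959">
        <front><title>File Transfer Protocol</title></front>
      </reference>
      <reference anchor="RFC1123">
        <front><title>Requirements for Internet Hosts - Application and Support</title></front>
      </reference>
    </references>
  </back>
</rfc>"""

missing, unused = reference_report(SAMPLE_XML)
print("missing:", missing)  # missing: ['RFC2428']
print("unused:", unused)    # unused: ['RFC1123']
```

Filling in the missing entries from the bibxml directory would be the next
step; the sketch only reports.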

> This needs to be made much, much easier. During the writing stage, just linking to the canonical location of a reference should suffice. Afterwards, a tool should find all the links, check whether they exist in the reference database, and if so, add them to the reference section in the appropriate order (RFC number, order of appearance for non-RFCs, whatever). Each reference would be just text without any markup, with a link back to the reference database, rather than the carefully marked-up reference format that XML2RFC imposes, which is incredibly hard to generate for non-RFC references.
> I hope I've been able to make a good case that an HTML format is useful as an authoring format. Now on to HTML as a consumption format.
> There are currently text, PDF and different HTML versions of RFCs. These are created by running a tool that performs the necessary conversions. However, making such tools isn't all that easy, so it would be very good to remove the dependency on tools where we can. And CSS gives us exactly that. Again, I'm no HTML/CSS expert, but I do know that you can make CSS do almost anything in a browser. And the really good part is that this can all be done by modifying an external CSS file, without making ANY changes to the actual HTML content.
> So it's entirely conceivable that the same HTML file can be displayed like this:
> http://tools.ietf.org/html/rfc2460
> or like this:
> http://pretty-rfc.herokuapp.com/RFC2460

Almost. These two use entirely different HTML formats.

> Or in many other ways. An HTML RFC CSS file would require quite a bit of plumbing to get some stuff done, but lots of things, such as TOC presence, colors, margins and font type/size should be very easy to change by any user who can edit a text file.


But then, rfc2629.xslt has allowed overriding the built-in CSS stylesheet 
for many, many years, and I haven't seen anybody use that. Maybe that 
just means that a good-enough stylesheet is sufficient.

> Because all the pertinent information is explicitly marked up, parsing an HTML RFC would be both easy and relatively foolproof. The only thing we need to do is limit the HTML allowed to the subset that we need for RFCs, so tools don't have to implement esoteric browser features, but can simply work on basic HTML. The hard part of a browser's job is displaying the HTML, which our tools wouldn't need to do; they only have to parse it.

I'll remain skeptical about that until I see a concrete example plus 
code that recovers all the information we have in RFC2629 right now.
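For what it's worth, the structural part of that is not hard. Here is a
minimal sketch that rebuilds nested RFC2629-style <section>/<t> elements
from flat headings, using text from the EPRT example quoted earlier. It
assumes well-formed HTML with explicitly closed <p> and heading tags, and it
deliberately ignores inline markup (links, emphasis), which is exactly the
information a complete converter would still have to recover. The class name
HtmlToRfc2629 and the second and third headings are hypothetical:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

HEADINGS = ("h2", "h3", "h4", "h5", "h6")


class HtmlToRfc2629(HTMLParser):
    """Rebuilds nested <section>/<t> elements from flat <h2>..<h6> and <p> tags."""

    def __init__(self):
        super().__init__()
        self.middle = ET.Element("middle")
        # Stack of (heading level, element); <middle> plays the role of level 1.
        self.stack = [(1, self.middle)]
        self._kind = None  # "p" or "hN" while inside such an element
        self._text = None  # text fragments collected for it

    def handle_starttag(self, tag, attrs):
        if tag == "p" or tag in HEADINGS:
            self._kind, self._text = tag, []
        # Any other tag (e.g. <a>, <em>) is ignored; only its text is kept.

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._kind:
            return
        text = " ".join("".join(self._text).split())  # collapse whitespace
        if tag == "p":
            ET.SubElement(self.stack[-1][1], "t").text = text
        else:
            level = int(tag[1])
            # Close any open sections at the same or a deeper level.
            while self.stack[-1][0] >= level:
                self.stack.pop()
            section = ET.SubElement(self.stack[-1][1], "section", title=text)
            self.stack.append((level, section))
        self._kind, self._text = None, None


html = """
<h2>Stateless EPRT translation</h2>
<p>
  The port number must be preserved for compatibility with stateless translators.
</p>
<h3>Hypothetical subsection</h3>
<p>Nested text.</p>
<h2>Hypothetical next section</h2>
"""

converter = HtmlToRfc2629()
converter.feed(html)
print(ET.tostring(converter.middle, encoding="unicode"))
```

Front matter, references, figures and artwork would each need similar
handling before this recovers everything RFC2629 carries today.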

> We can also format the HTML elements and text such that the text sticks relatively close to the current ASCII format, so that simply removing all the HTML leaves a usable ASCII version.
> So if we can have a format that is good for writing RFCs, for reading them on screens, and for parsing them with relatively simple tools, such a format would of course also be the perfect format to archive, and to use at some later point as the basis for an update.
> The only limitation that we currently have is that printing HTML doesn't always produce the best results. So for printing, it would probably be useful to have one or more alternative PDF versions: several if we don't care about page numbers (a version for letter size and one for A4 size), or a single one if we do care about page numbers, laid out so that it prints on both letter and A4 paper.
> Note that having alternative versions for printing is less of an imposition than having alternative versions for different types of displays, because with printing "deep linking" is rarely an issue, while following a link to a section of an RFC in a format that is not really compatible with your screen size can be disruptive and isn't easy to avoid if multiple versions for different screens exist. Printing on the other hand is a deliberate process where extra steps to find the right format are less disruptive.


And again, this problem has been solved years ago. Just feed rich HTML 
into PrinceXML and get PDF from it. It's a commercial tool, but I'm sure 
the IETF could afford it.

Best regards, Julian
