[rfc-i] A proposal: HTML

Iljitsch van Beijnum iljitsch at muada.com
Tue May 29 02:44:40 PDT 2012


On 24 May 2012, at 23:38 , Julian Reschke wrote:

>> <p class="hiddenfrontmatter">
>>   <form name="rfcattributes">
>>     <input type="hidden" name="toc" value="yes">
>>     <input type="hidden" name="ipr" value="trust200902">
>>     <input type="hidden" name="docname" value="draft-ietf-behave-ftp64-12">
>>     <input type="hidden" name="category" value="std">
>>   </form>

>>   <form name="author1">
>>     <input type="hidden" name="surname" value="Van Beijnum">
>>     <input type="hidden" name="fullname" value="Iljitsch van Beijnum">
>>     <input type="hidden" name="shortfullname" value="I. van Beijnum">
>>     <input type="hidden" name="altfullname" value="Ильи́ч van Beijnum">
>>     <input type="hidden" name="organization" value="Institute IMDEA Networks">
>> </form>

> Hiding things makes it more likely to break them.

I made this hidden because I can't think of a way to make this stuff both human readable and machine readable in a reasonable way. So we should probably hide the machine readable version from humans, and show a nicely laid out but not marked up human readable version of the same information.

> There are better ways to do this, such as RDFa, microformats or microdata. Unfortunately these are multiple, competing ways, so there will be disagreement about which to pick. This is why I personally would prefer to stick with what we have (RFC2629) and to carefully extend it.

Making a new format gives us the opportunity to make a clean break, so we should do that where it makes sense. To me, it certainly makes sense for this kind of stuff, because it is just so painful to author it in XML2RFC format. Because this stuff is specific to RFCs, there are also no opportunities to convert from other formats, so it really has to be easy to do this by hand. I'm now actually starting to think the above is still too difficult, so make it more like this:

rfc-toc: yes
rfc-ipr: trust200902
draft-name: draft-ietf-behave-ftp64-12
rfc-intended-status: standard

rfc-author1-surname: Van Beijnum
rfc-author1-fullname: Iljitsch van Beijnum
rfc-author1-nonlatin-name: Ильи́ч van Beijnum

etc

The advantage of this is that you can write it using a word processor where the styles are converted to CSS classes without the need to create any HTML by hand. So this would be a good submission format, although of course not a good archival/presentation format.
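
To illustrate how little machinery such a submission format would need, here is a minimal sketch of a reader for the "name: value" lines above. The function names (parse_front_matter, authors) and the rule that the first non-matching line ends the front matter are assumptions made for this example, not part of any existing tool or of the proposal itself.

```python
import re

# One "name: value" pair per line; names use letters, digits and hyphens.
FIELD = re.compile(r"^([A-Za-z0-9-]+):\s*(.*)$")
# Hypothetical convention taken from the example: rfc-author<N>-<property>.
AUTHOR_KEY = re.compile(r"^rfc-author(\d+)-(.+)$")


def parse_front_matter(text):
    """Collect the leading 'name: value' lines into a dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate groups
        match = FIELD.match(line)
        if match is None:
            break  # assumption: first non-field line ends the front matter
        fields[match.group(1)] = match.group(2).strip()
    return fields


def authors(fields):
    """Group rfc-author<N>-* fields into one dict per author, ordered by N."""
    grouped = {}
    for key, value in fields.items():
        match = AUTHOR_KEY.match(key)
        if match:
            grouped.setdefault(int(match.group(1)), {})[match.group(2)] = value
    return [grouped[index] for index in sorted(grouped)]


SAMPLE = """\
rfc-toc: yes
rfc-ipr: trust200902
draft-name: draft-ietf-behave-ftp64-12
rfc-intended-status: standard

rfc-author1-surname: Van Beijnum
rfc-author1-fullname: Iljitsch van Beijnum
"""

if __name__ == "__main__":
    parsed = parse_front_matter(SAMPLE)
    print(parsed["draft-name"])
    print(authors(parsed))
```

The same fields could just as well come from paragraphs tagged with a word processor style; only the line-splitting step would change.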

> What you consider a feature others (IMHO correctly) see as a defect in HTML 4, and HTML 5 happens to have ... <section> (<http://dev.w3.org/html5/spec/single-page.html#the-section-element>).

Interestingly, both in older HTML and in word processors, there is no way to differentiate between:

<section l1>
  text
    <section l2>
      text
      text
    </section>
</section>

and:

<section l1>
  text
    <section l2>
      text
    </section>
  text
</section>

Now if you find it important to make such distinctions in your own work, that's fine, but these will be lost on anyone who doesn't program XML parsers.
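
For readers who want to see what the distinction amounts to, the following sketch feeds simplified versions of the two structures above to Python's standard xml.etree.ElementTree parser. The element content ("intro", "a", "b") and the describe helper are made up for this example. The only difference the parser reports is whether the trailing text sits inside the inner section or after it.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for the two structures: the trailing text "b" is
# either inside the inner section or follows it in the outer section.
FIRST = "<section>intro<section>a b</section></section>"
SECOND = "<section>intro<section>a</section>b</section>"


def describe(xml_text):
    """Report where the text ends up relative to the nested section."""
    outer = ET.fromstring(xml_text)
    inner = outer.find("section")
    return {
        "before_inner": (outer.text or "").strip(),
        "inside_inner": (inner.text or "").strip(),
        "after_inner": (inner.tail or "").strip(),
    }


if __name__ == "__main__":
    print(describe(FIRST))
    print(describe(SECOND))
```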

The trouble with requiring such a model in an authoring format for which there are insufficient authoring tools is that people who at best don't care about such things, and probably don't even understand them, now have to write such code by hand. In particular, because arbitrarily nested elements need to be closed explicitly, it is almost impossible for parsers to tell users where they forgot to close an element, which makes debugging a significant part of document creation.

The enormous waste of time that this creates (not to mention the frustration it generates) is absolutely not worth whatever gains (which you haven't really explained so far) may be had from a model with containment.
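
The debugging problem can be reproduced with any strict XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a made-up document in which the closing tag of the "Introduction" section was forgotten (it should have appeared around line 7). The document and the reported_error_line helper are assumptions for illustration only. The parser can only complain once it meets a closing tag that doesn't match, which is further down than the place where the author actually made the mistake.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the </section> for "Introduction" is missing.
# It should have followed the "Terminology" subsection, around line 7.
DOC = """<rfc>
<section title="Introduction">
<t>First paragraph.</t>
<section title="Terminology">
<t>Second paragraph.</t>
</section>
<section title="Security">
<t>Third paragraph.</t>
</section>
</rfc>"""


def reported_error_line(xml_text):
    """Return the line number the parser blames, or None if parsing succeeds."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, _column = err.position
        return line
    return None


if __name__ == "__main__":
    print("parser reports a problem on line", reported_error_line(DOC))
```

In a document of realistic length, the distance between the real mistake and the reported line can be hundreds of lines.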

>> (Note that the section numbering can be made automatic using CSS but it may be useful to make this explicit/hardcoded during the RFC publication process.)

> Delegating to CSS makes it impossible to copy paste from the HTML into plain text (think email), so I think that should be avoided.

Then again, manually taking care of this is extra error-prone work, which isn't good, either.

But this can be solved by allowing people to have automatic numbering in the draft format if they want, and then putting in "hard" section numbers during the publication process.
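
As a rough sketch of what putting in "hard" section numbers during publication could look like, the function below turns a list of (level, title) headings into explicitly numbered headings. The input representation and the name number_sections are assumptions for this example, and it assumes that heading levels are never skipped (a level 3 heading directly under level 1 would get a "0" in the middle).

```python
def number_sections(headings):
    """Turn (level, title) pairs into hard-numbered heading strings."""
    counters = []  # counters[i] is the current number at level i + 1
    numbered = []
    for level, title in headings:
        if level < 1:
            raise ValueError("heading levels start at 1")
        del counters[level:]           # leaving deeper levels resets them
        while len(counters) < level:   # entering a deeper level
            counters.append(0)
        counters[level - 1] += 1
        number = ".".join(str(count) for count in counters)
        numbered.append(f"{number}. {title}")
    return numbered


if __name__ == "__main__":
    draft = [
        (1, "Introduction"),
        (2, "Terminology"),
        (2, "Scope"),
        (1, "Security Considerations"),
    ]
    for heading in number_sections(draft):
        print(heading)
```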

>> In order to refer to an RFC, which is by far the easiest type of reference, I need to have incomprehensible incantations at the top and the bottom of my document:

>> <!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
>>   <!ENTITY rfc959 PUBLIC '' "http://xml.resource.org/public/rfc/bibxml/reference.RFC.0959.xml">
>> ]>

> No, you don't need that at all. You can simply copy & paste the contents of <http://xml.resource.org/public/rfc/bibxml/reference.RFC.0959.xml> into your document.

It doesn't make sense to include extensive metadata for each reference in the source of a draft / RFC document. It just clutters up the document.

Remember that references are basically human-parsable hyperlinks. If we then add a machine-parsable actual hyperlink to either the document itself or its canonical location, there is no need to include any additional data except what is presented on screen / paper for human consumption.

>> so I get to use these in my text:<xref target="RFC0959" />. I guess the decision on whether the leading zero should or shouldn't be there was solved by randomly requiring it to be present or requiring it to be absent.

> That's a tool problem. The fact that nobody has written that tool could mean that the problem isn't as big as you think. I personally have no problem maintaining my references by hand.

In the grand scheme of things this isn't the worst problem ever, but just the fact that the format requires me to write the RFC number with and without the leading zero in different places shows that not much care went into all of this, and if we are going to come up with a new format we need to do better.
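
A new format could accept the number in any spelling and derive the rest. The sketch below is one way to do that, not an existing tool: the helper names are made up, and the four-digit zero padding is inferred only from the "RFC0959" target and the reference.RFC.0959.xml URL quoted above.

```python
import re

# Base of the bibxml URL quoted above.
BIBXML_BASE = "http://xml.resource.org/public/rfc/bibxml/"

# Accept "RFC959", "rfc0959", "RFC 959", and similar spellings.
RFC_LABEL = re.compile(r"\s*rfc[\s.-]*0*(\d+)\s*", re.IGNORECASE)


def rfc_number(label):
    """Extract the RFC number from a label, ignoring case and leading zeros."""
    match = RFC_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not an RFC label: {label!r}")
    return int(match.group(1))


def xref_target(number):
    """Zero-padded target as used in <xref target="RFC0959" />."""
    return f"RFC{number:04d}"


def bibxml_url(number):
    """Zero-padded reference URL in the style of the one quoted above."""
    return f"{BIBXML_BASE}reference.RFC.{number:04d}.xml"


if __name__ == "__main__":
    number = rfc_number("rfc959")
    print(xref_target(number))
    print(bibxml_url(number))
```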

>> So it's entirely conceivable that the same HTML file can be displayed like this:

>> http://tools.ietf.org/html/rfc2460

>> or like this:

>> http://pretty-rfc.herokuapp.com/RFC2460

> Almost. These two use entirely different HTML formats.

Yes. My point is that you can make the eventual output look like many different things by changing just a CSS file rather than the HTML itself. I acknowledge that this isn't what is happening in these two examples.

