[rfc-i] Potential RFC format approach: HTML

Julian Reschke julian.reschke at gmx.de
Sat Mar 24 03:25:20 PDT 2012

On 2012-03-24 11:09, Dave Crocker wrote:
> On 3/24/2012 9:33 AM, Julian Reschke wrote:
>>> - Subset HTML. The subset would be decided upon based on current wide
>>> support, expectation that it will continue to work, repeatability,
>>> stability of output, etc. However, as long as old-ish browsers can
>>> render the output, they might not need to get the full experience.
>> +1 in general. The requirement "expectation that it will continue to
>> work" isn't really helping, because in practice, anything that is
>> widely used today isn't going to go away.
> Oh boy.
> In spite of the fact that you are an experienced and prudent guy, let me
> suggest your above assertion is not terribly prudent.
> The strength of the ASCII base has been its minimal processing
> requirements and its excellent, long-term stability.
> HTML isn't even close to trivial, by that metric. Worse, as soon as you

No, it's not trivial at all.

> say "subset" you mandate specialized software.

For production/checking maybe, for consumption, no.

> "HTML" in the open Internet is spectacularly variable. It works because
> HTML engines know to process quite a bit of that variation. The
> remainder doesn't get processed properly.

That is true for invalid content, but to a far lesser degree for valid
HTML 4, for instance.

With the HTML 5 spec, parsing of any kind of broken input becomes
well-defined (and is being implemented in browsers right now). (And yes,
I don't like the way that spec is being developed, but that doesn't
change the facts about what it does.)

> This establishes an extremely unstable processing base, no matter how
> widely usable it is at any given moment, such as "today".

I disagree that processing of the markup itself is unstable. You seem
to be thinking of broken content, weird scripts, weird embedding
techniques, and broken CSS. We don't have to use any of that (that's why I
said "profile").
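
To make "profile" concrete, here is a minimal sketch of a checker that
flags anything outside an agreed element set. The ALLOWED_TAGS list and
the name check_profile are hypothetical, chosen only for illustration;
no actual profile has been defined in this thread. It only uses Python's
standard html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical profile: this element list is an assumption for
# illustration, not an agreed-upon RFC HTML subset.
ALLOWED_TAGS = {
    "html", "head", "title", "meta", "link", "body",
    "h1", "h2", "h3", "p", "pre", "ul", "ol", "li",
    "dl", "dt", "dd", "a", "em", "strong", "code",
    "table", "tr", "th", "td", "div", "span",
}


class ProfileChecker(HTMLParser):
    """Collects (line, name) pairs for constructs outside the profile."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        line, _col = self.getpos()
        if tag not in ALLOWED_TAGS:
            self.violations.append((line, tag))
        # Inline event handlers (onclick, onload, ...) are scripting too.
        for name, _value in attrs:
            if name.startswith("on"):
                self.violations.append((line, f"{tag}@{name}"))


def check_profile(text):
    """Return the list of profile violations found in an HTML string."""
    checker = ProfileChecker()
    checker.feed(text)
    checker.close()
    return checker.violations
```

A check like this would only be needed on the production side; a reader's
browser does not have to know about the profile at all.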

> In an environment like that, any assurance of future support -- say 30
> or 50 years from now, never mind 100 -- is problematic.
>> I believe what's as important is to profile the document structure,
>> define default CSS (we want some uniformity, right?), and recommend
>> certain ways to use
> css. Excellent. A parallel item to support and hope is compatible
> decades from now, if the item can be found.

It doesn't need to be "separate". And yes, any HTML had better be formatted
so that it stays readable if CSS doesn't work. But I don't think the
conclusion should be not to use it at all.

>>> - Define a very strict structure, using a microformat/semantic style,
>>> that makes it easy to pull out information semantically with a little
>>> bit of jQuery in post-processing tools. Make a choice about XML-style
>>> well-formedness, which is probably not needed.
>> +-0; it's possible to embed all metadata in the HTML, but that makes
>> us depend on conventions (microformats) or specs that are currently a
>> bit in flux (RDFa vs
> flux. exactly the right word.
> For the current exercise, the requirement to attain long-term viability
> that is on a par with what was achieved with the original ASCII base, is
> simplicity and stability.
> It's not enough to be clever with something that can "be made to work".
> It might or might not work tomorrow. Anyone thinking otherwise needs to
> produce a comparable historical basis for their belief.

That's why I recommended picking a reliable profile, and not going as far
as Joe proposed.
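
For comparison, the "strict structure" idea quoted above could look
roughly like the sketch below, which pulls metadata out of class
attributes using only Python's standard html.parser. The "rfc-" class
prefix, the class names in the usage example, and the function name
extract_metadata are assumptions made up for illustration; they come
from no existing convention. For simplicity the sketch also assumes that
every element is explicitly closed.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collects the text of elements whose class starts with a prefix."""

    def __init__(self, prefix="rfc-"):
        super().__init__()
        self.prefix = prefix
        self.metadata = {}   # key -> list of text values, in document order
        self._stack = []     # one entry per open element: its key or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        key = next((c for c in classes if c.startswith(self.prefix)), None)
        self._stack.append(key)
        if key is not None:
            # Start a new value for each element carrying this key.
            self.metadata.setdefault(key, []).append("")

    def handle_endtag(self, tag):
        # Assumes explicitly closed elements, so the stack stays aligned.
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries a key.
        for key in self._stack:
            if key is not None:
                self.metadata[key][-1] += data


def extract_metadata(text, prefix="rfc-"):
    """Return {class name: [text, ...]} for an HTML string."""
    parser = MetadataExtractor(prefix)
    parser.feed(text)
    parser.close()
    return {k: [v.strip() for v in vals] for k, vals in parser.metadata.items()}
```

Used on a fragment such as
'<h1 class="rfc-title">Example <em>Title</em></h1>', this would return
the title text under the key "rfc-title". Whether the keys should be
class names, RDFa properties, or something else is exactly the part that
is still in flux.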

On the other hand, asking for data proving that something will be as
reliable as "text/plain; charset=US-ASCII" 50 years from now essentially
means killing any potential progress, because that's impossible to prove.

Best regards, Julian
