[rfc-i] Potential RFC format approach: HTML
julian.reschke at gmx.de
Sat Mar 24 03:25:20 PDT 2012
On 2012-03-24 11:09, Dave Crocker wrote:
> On 3/24/2012 9:33 AM, Julian Reschke wrote:
>>> - Subset HTML. The subset would be decided upon based on current wide
>>> support, expectation that it will continue to work, repeatability
>>> of output, etc. However, as long as old-ish browsers can render the
>>> document, they might not need to get the full experience.
>> +1 in general. The requirement "expectation that it will continue to
>> work" isn't really helping, because in practice, anything that is
>> widely used today isn't going to go away.
> Oh boy.
> In spite of the fact that you are an experienced and prudent guy, let me
> suggest your above assertion is not terribly prudent.
> The strength of the ASCII base has been its minimal processing
> requirements and its excellent, long-term stability.
> HTML isn't even close to trivial, by that metric. Worse, as soon as you
No, it's not trivial at all.
> say "subset" you mandate specialized software.
For production/checking maybe, for consumption, no.
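To illustrate the production/checking side: below is a minimal sketch of a
profile checker, assuming a purely hypothetical whitelist of elements (the
ALLOWED_TAGS set and the check_profile name are made up for illustration;
no actual subset has been agreed on). It only flags start tags outside the
whitelist, using Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical profile: element names permitted in the subset.
# This list is an assumption for illustration, not a proposal.
ALLOWED_TAGS = {
    "html", "head", "title", "meta", "body",
    "h1", "h2", "h3", "p", "pre", "a",
    "ul", "ol", "li", "table", "tr", "th", "td",
    "em", "strong", "code",
}


class ProfileChecker(HTMLParser):
    """Collects every start tag that is not in ALLOWED_TAGS."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook.
        if tag not in ALLOWED_TAGS:
            line, _offset = self.getpos()
            self.violations.append((line, tag))


def check_profile(html_text):
    """Return a list of (line, tag) pairs for disallowed elements."""
    checker = ProfileChecker()
    checker.feed(html_text)
    checker.close()
    return checker.violations
```

A tool like this would only be needed by whoever produces or checks the
documents; a reader's browser renders the output without knowing anything
about the profile.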
> "HTML" in the open Internet is spectacularly variable. It works because
> HTML engines know to process quite a bit of that variation. The
> remainder doesn't get processed properly.
That is true for invalid content, but to a far lesser degree for valid
HTML 4, for instance.
With the HTML 5 spec, parsing of any kind of broken input gets
well-defined (and is being implemented in browsers right now). (And yes,
I don't like the way that spec is being developed, but that doesn't
change the facts about what it does).
> This establishes an extremely unstable processing base, no matter how
> widely usable it is at any given moment, such as "today".
I disagree that processing of the markup itself is unstable. You seem
to be thinking of broken content, weird scripts, weird embedding
techniques, broken CSS. We don't have to use any of that (that's why I
> In an environment like that, any assurance of future support -- say 30
> or 50 years from now, never mind 100 -- is problematic.
>> I believe what's as important is to profile the document structure,
>> default CSS (we want some uniformity, right?), and recommend certain
>> ways to use
> css. Excellent. A parallel item to support and hope is compatible
> decades from now, if the item can be found.
It doesn't need to be "separate". And yes, any HTML had better be
formatted so that it doesn't become unreadable if CSS doesn't work. But I
don't think the conclusion should be not to use it at all.
>>> - Define a very strict structure, using a microformat/semantic style,
>>> that makes it easy to pull out information semantically with a little
>>> bit of jQuery in post-processing tools. Make a choice about XML-style
>>> well-formedness, which is probably not needed.
>> +-0; it's possible to embed all metadata in the HTML, but that makes us
>> depend on conventions (microformats) or specs that are currently a bit
>> in flux (RDFa vs
> flux. exactly the right word.
> For the current exercise, the requirement for attaining long-term
> viability on a par with what was achieved with the original ASCII base
> is simplicity and stability.
> It's not enough to be clever with something that can "be made to work".
> It might or might not work tomorrow. Anyone thinking otherwise needs to
> produce a comparable historical basis for their belief.
That's why I recommended picking a reliable profile, and not going as
far as Joe proposed.
On the other hand, asking for data proving that something will be as
reliable as "text/plain; charset=US-ASCII" 50 years from now essentially
means killing any potential progress, because that's impossible to prove.
Best regards, Julian