[rfc-i] I-D ACTION:draft-hoffman-utf8-rfcs-02.txt

Keith Moore moore at cs.utk.edu
Sun Sep 28 22:02:28 PDT 2008


Dave CROCKER wrote:
> During one or another of the IDN-or-related bits of working group confusion, I raised a concern that was in the same camp:  we have a long-standing lingua 
> franca, called net-ascii, and the current effort was, in effect, to re-define it to be utf-8.
>
> My obvious question was the same as here:  Has the world changed so much that we can honestly claim that it is universally -- not just "widely" -- supported as the least-common denominator in the character-encoding game?
>   
The UTF-8 charset is widely supported.  What may not be so widely
supported is UTF-8 with FF characters to mark page breaks, and CR LF
sequences to mark ends of lines.

(Note that arguably, net-ascii is not as well supported as it used to
be.  Last I knew, Windows systems could not even print net-ascii
documents correctly.)
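To make the format under discussion concrete (UTF-8 text with FF characters marking page breaks and CR LF marking line ends), here is a minimal Python sketch of reading such a document.  The function name split_rfc_pages is hypothetical, and the sketch assumes each FF sits between CR LF sequences, as is customary in plain-text RFCs:

```python
def split_rfc_pages(data: bytes) -> list[list[str]]:
    """Split a UTF-8 document into pages (on FF) and lines (on CR LF).

    Hypothetical helper: assumes FF page breaks surrounded by CR LF.
    """
    # Decode strictly: invalid UTF-8 raises UnicodeDecodeError.
    text = data.decode("utf-8")
    pages = []
    for page in text.split("\f"):
        # Drop the CR LF sequences that usually surround the FF itself.
        body = page.strip("\r\n")
        pages.append(body.split("\r\n") if body else [])
    return pages


sample = "Intro\r\nMore\r\n\f\r\nPage two \u00e9\r\n".encode("utf-8")
print(split_rfc_pages(sample))
# [['Intro', 'More'], ['Page two é']]
```

A reader that handles only one of these conventions (for example, LF-only line splitting, or ignoring FF) will still display the text, but page structure and line ends may come out wrong, which is the support gap described above.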

An interesting variant on UTF-8 might be very minimal HTML using UTF-8
as the document charset, i.e., the absolute minimal HTML header needed
to declare the document charset as UTF-8, followed by the RFC text
inside a PRE element, along these lines:

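<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
</HEAD>
<BODY>
<PRE>
RFC goes here (with the usual mappings "&" -> &amp;, "<" -> &lt;, etc.)
</PRE>
</BODY>
</HTML>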

I suspect that such documents would be more widely readable, printable,
etc. than UTF-8 with FFs and CRLFs.
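As a rough illustration, the wrapping described above could be done with a few lines of Python.  This is a sketch under stated assumptions, not a proposed tool: the function name rfc_text_to_minimal_html is hypothetical, the input is assumed to be already-decoded RFC text, and the META element is one common (HTML 4) way to declare the charset:

```python
import html


def rfc_text_to_minimal_html(rfc_text: str) -> str:
    """Wrap plain RFC text in a minimal UTF-8 HTML document (hypothetical helper)."""
    # Normalize CR LF to LF; browsers render either inside PRE.
    text = rfc_text.replace("\r\n", "\n")
    # html.escape maps & -> &amp;, < -> &lt;, > -> &gt;.
    # quote=False leaves quote characters alone, which is safe in element content.
    body = html.escape(text, quote=False)
    return (
        "<HTML>\n"
        "<HEAD>\n"
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">\n'
        "</HEAD>\n"
        "<BODY>\n"
        "<PRE>\n"
        f"{body}\n"
        "</PRE>\n"
        "</BODY>\n"
        "</HTML>\n"
    )


doc = rfc_text_to_minimal_html("Use a < b && c > d.\r\n")
print("a &lt; b &amp;&amp; c &gt; d." in doc)  # True
data = doc.encode("utf-8")  # bytes to store or serve as UTF-8
```

Escaping "&" before "<" matters (html.escape does this internally); otherwise the "&" in a generated "&lt;" would itself be re-escaped.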

Of course then we are close to the HTML slippery slope... once we accept
that the document is intended to be viewed or printed by a web browser,
why not add boldface and italic, links, IMG tags, colors, fonts, tables,
CSS, frames, Flash, JavaScript, etc.?  The problem isn't realizing that
a line needs to be drawn somewhere, but getting agreement on where to
draw the line.

Keith
