[rfc-i] I-D ACTION:draft-hoffman-utf8-rfcs-02.txt
moore at cs.utk.edu
Sun Sep 28 22:02:28 PDT 2008
Dave CROCKER wrote:
> During one or another of the IDN-or-related bits of working group
> confusion, I raised a concern that was in the same camp: we have a
> long-standing lingua franca, called net-ascii, and the current effort
> was, in effect, to re-define it to be utf-8.
>
> My obvious question was the same as here: Has the world changed so much
> that we can honestly claim that it is universally -- not just "widely"
> supported as the least-common denominator in the character-encoding game.
The UTF-8 charset is widely supported. What may not be so widely
supported is UTF-8 with FF characters to mark page breaks, and CR LF
sequences to mark ends of lines.
(Note that arguably, net-ascii is not as well supported as it used to
be. Last I knew, Windows systems could not even print net-ascii files
correctly.)
An interesting variant on UTF-8 might be very minimal HTML using UTF-8
as a document charset, i.e., the absolute minimal HTML header to
declare the document charset as being UTF-8, followed by a body
containing the RFC text (with the usual mappings, "<" -> "&lt;" etc.).
I suspect that such documents would be more widely readable, printable,
etc. than UTF-8 with FFs and CRLFs.
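
To make the idea concrete, here is a rough sketch in Python of what such
a wrapper could look like. It is only an illustration under stated
assumptions: the function name is hypothetical, the META declaration is
one common way to declare the charset, wrapping the text in a PRE
element to preserve the fixed layout is my assumption, and so is the
choice to turn each FF page break into a plain line break.

```python
import html


def rfc_text_to_minimal_html(text: str) -> str:
    """Wrap plain RFC text in a minimal HTML document declared as UTF-8.

    Hypothetical helper for illustration; not an agreed-upon RFC format.
    """
    # Assumption: normalize CRLF line ends to LF and replace each FF
    # page break with a line break, since browsers do not paginate on FF.
    body = text.replace("\r\n", "\n").replace("\f", "\n")
    # The usual mappings: "&" -> "&amp;", "<" -> "&lt;", ">" -> "&gt;".
    body = html.escape(body, quote=False)
    return (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "</head><body><pre>\n"
        + body
        + "\n</pre></body></html>\n"
    )
```

The result would be written out as UTF-8 bytes, for example with
rfc_text_to_minimal_html(text).encode("utf-8"), so that the declared
charset matches the actual encoding.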
Of course then we are close to the HTML slippery slope... once we accept
that the document is intended to be viewed or printed by a web browser,
why not add boldface and italic, links, IMG tags, colors, fonts, tables,
and so on? A line needs to be drawn somewhere, but getting agreement on
where to draw the line may be difficult.