[rfc-i] New version: draft-hoffman-utf8-rfcs-04.txt

Keith Moore moore at cs.utk.edu
Thu Nov 6 07:13:28 PST 2008


Tim Bray wrote:
> 1.2 Ability to spell the names of contributors to IETF specifications
> correctly.  Benefit: Dependent on one's world-view. For example, I
> find it unacceptable, verging on bigotry, that in RFC5023, the name of
> one of its editors is spelled incorrectly 
I think this is a significant benefit, though not sufficiently
compelling by itself to justify switching from ASCII to UTF-8 -
especially when support for "plain UTF-8 text" seems so sparse and buggy.
> 1.3 Ability to include accurate, readable examples of the use of
> non-ASCII characters in IETF protocols.  Benefit: Major.  In practice,
> internationalization is observed to be a frequent source of
> interoperability difficulties on the Internet.  Whereas the users of
> IETF specifications, in theory, would be happy to work from the
> normative prose and formal specifications, in practice the usability
> of specifications is found to be increased by the inclusion of
> high-quality examples.  In theory, such examples need not actually
> contain non-ASCII characters; the familiar U+XXXX notation can be used
> to stand in for them.  In practice, readability and terseness are
> observed to improve the usability of specifications.
>   
I have serious doubts about this benefit, for two reasons.  The first is
that in protocol implementation the appearance of the characters is
often less important than the octets that get sent on the wire, and the
U+XXXX notation is far superior for this purpose.  The second is that
there is so much variation in how UTF-8 is displayed in practice that
using UTF-8 to illustrate how something should look when rendered
is unlikely to work well.  Images would be much better.
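
To illustrate the first point, here is a minimal Python sketch that maps
U+XXXX notation to the octets a UTF-8 encoder would put on the wire.  The
helper name utf8_octets and the sample code points are chosen for
illustration only; they do not come from the draft.

```python
def utf8_octets(code_point_notation: str) -> str:
    """Return the UTF-8 octets, in hex, for a code point written as U+XXXX."""
    # Drop the leading "U+" and parse the remainder as a hexadecimal number.
    code_point = int(code_point_notation[2:], 16)
    # Encode the single character and format each octet as two hex digits.
    encoded = chr(code_point).encode("utf-8")
    return " ".join(f"{octet:02X}" for octet in encoded)


# Illustrative code points only: an ASCII letter, a Latin-1 letter,
# and a symbol from the Basic Multilingual Plane.
for notation in ("U+0041", "U+00E9", "U+20AC"):
    print(notation, "->", utf8_octets(notation))
# U+0041 -> 41
# U+00E9 -> C3 A9
# U+20AC -> E2 82 AC
```

The notation names the character unambiguously whatever the reader's
display does with it, and the octets follow mechanically from it.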

Basically, I do not find the benefit of "plain UTF-8 text" sufficiently
compelling to justify the pain of changing and the disruption that this
would cause, given the state of existing tools.  I
can make a much stronger case for use of HTML in RFCs, with UTF-8 as the
character set.  First, it appears that web browsers are much more likely
to provide good support for UTF-8 (display as well as printing) than
tools that handle "plain text".  Second, unlike plain text, HTML has
an integral and well-established mechanism for specifying its character
set.  Third, HTML RFCs would presumably permit use of images to display
examples - including but not limited to examples of how non-ASCII
characters should be rendered.  This is much more reliable than
expecting the reader's tools (even his web browser) to correctly render
UTF-8.
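
As a rough sketch of the second point, the Python below reads the character
set an HTML document declares for itself.  It assumes the mechanism meant
here is the meta declaration in the document head (HTTP headers can carry
the same information).  The names CharsetFinder and declared_charset are
hypothetical, and this is a simplified illustration, not a full
encoding-detection algorithm.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Record the first character set declared in a <meta> element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if attributes.get("charset"):
            # Short form: <meta charset="utf-8">
            self.charset = attributes["charset"].lower()
        elif (attributes.get("http-equiv") or "").lower() == "content-type":
            # Long form: content="text/html; charset=utf-8"
            content = attributes.get("content") or ""
            for part in content.split(";"):
                name, _, value = part.strip().partition("=")
                if name.lower() == "charset" and value:
                    self.charset = value.strip().lower()


def declared_charset(html_text):
    """Return the declared character set in lowercase, or None if absent."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset


sample = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=UTF-8"></head><body></body></html>'
)
print(declared_charset(sample))
```

A plain text file has no comparable place to carry such a declaration, so a
tool reading one has to guess the encoding or be told it out of band.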

Keith


