[rfc-i] New version: draft-hoffman-utf8-rfcs-04.txt

Martin Duerst duerst at it.aoyama.ac.jp
Sun Nov 9 18:05:57 PST 2008

At 00:13 08/11/07, Keith Moore wrote:
>Tim Bray wrote:

>> 1.3 Ability to include accurate, readable examples of the use of
>> non-ASCII characters in IETF protocols.  Benefit: Major.  In practice,
>> internationalization is observed to be a frequent source of
>> interoperability difficulties on the Internet.  Whereas the users of
>> IETF specifications, in theory, would be happy to work from the
>> normative prose and formal specifications, in practice the usability
>> of specifications is found to be increased by the inclusion of
>> high-quality examples.  In theory, such examples need not actually
>> contain non-ASCII characters; the familiar U+XXXX notation can be used
>> to stand in for them.  In practice, readability and terseness are
>> observed to improve the usability of specifications.
>I have serious doubts about this benefit, for two reasons.  The first is
>that in protocol implementation the appearance of the characters is
>often less important than the octets that get sent on the wire, and the
>U+XXXX notation is far superior for this purpose.  The second is that
>there is so much variation in how UTF-8 is displayed in practice, that
>using UTF-8 to illustrate how something should look like when rendered
>is unlikely to work well.   Images would be much better.

In the cases I know best, in particular IRIs and IDNs, the point is
exactly NOT to show what happens when UTF-8 goes over the wire.
IDNs use Punycode over the wire. IRIs appear as characters on the side
of a bus (or wherever). In these cases especially, it's very
helpful to have the real characters in contrast with some
ACE (ASCII-compatible encoding) notation.
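
To make the contrast concrete, here is a minimal sketch in Python that
prints one label in the three forms under discussion: U+XXXX notation,
UTF-8 octets, and the ACE form. The label "bücher" and the helper name
u_notation are illustrative assumptions, not taken from the draft, and
Python's built-in "idna" codec implements the older IDNA 2003 rules, so
treat the output as an illustration rather than a conformance check.

```python
# Illustrative label (an assumption, not from the draft): "bücher"
label = "b\u00fccher"


def u_notation(text):
    """Hypothetical helper: render each character as U+XXXX."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The octets a UTF-8 serialization would carry, as space-separated hex.
utf8_hex = label.encode("utf-8").hex(" ")

# The ACE form a domain label takes on the wire (Punycode with "xn--" prefix).
ace = label.encode("idna").decode("ascii")

print(label)              # the real characters
print(u_notation(label))  # U+0062 U+00FC U+0063 U+0068 U+0065 U+0072
print(utf8_hex)           # 62 c3 bc 63 68 65 72
print(ace)                # xn--bcher-kva
```

None of the three ASCII renderings shows a reader what the label
actually looks like, which is the argument for allowing the real
characters in examples.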

>Basically I find the benefit for "plain UTF-8 text" to not be
>sufficiently compelling to justify the pain of changing and the
>disruption that this would cause given the state of existing tools.  I
>can make a much stronger case for use of HTML in RFCs, with UTF-8 as the
>character set.  First, it appears that web browsers are much more likely
>to provide good support for UTF-8 (display as well as printing) than
>tools that handle "plain text".   Second,  unlike plain text, HTML has
>an integral and well-established mechanism for specifying its character
>set.  Third, HTML RFCs would presumably permit use of images to display
>examples - including but not limited to examples of how non-ASCII
>characters should be rendered.  This is much more reliable than
>expecting the reader's tools (even his web browser) to correctly render

The last point may be helpful in some situations, but it would be
overkill to require .gif (or whatever) equivalents for every non-ASCII

Overall, it's a question of many small steps or one big step.
I'm fine with either, as long as we get this thing off the ground
(or, given that ASCII-only plain text is from the computer stone age,
I should probably say "out of the basement" or some such).

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp
