[rfc-i] New version: draft-hoffman-utf8-rfcs-04.txt

Martin Duerst duerst at it.aoyama.ac.jp
Sun Nov 9 18:05:57 PST 2008


At 00:13 08/11/07, Keith Moore wrote:
>Tim Bray wrote:

>> 1.3 Ability to include accurate, readable examples of the use of
>> non-ASCII characters in IETF protocols.  Benefit: Major.  In practice,
>> internationalization is observed to be a frequent source of
>> interoperability difficulties on the Internet.  Whereas the users of
>> IETF specifications, in theory, would be happy to work from the
>> normative prose and formal specifications, in practice the usability
>> of specifications is found to be increased by the inclusion of
>> high-quality examples.  In theory, such examples need not actually
>> contain non-ASCII characters; the familiar U+XXXX notation can be used
>> to stand in for them.  In practice, readability and terseness are
>> observed to improve the usability of specifications.
>>   
>I have serious doubts about this benefit, for two reasons.  The first is
>that in protocol implementation the appearance of the characters is
>often less important than the octets that get sent on the wire, and the
>U+XXXX notation is far superior for this purpose.  The second is that
>there is so much variation in how UTF-8 is displayed in practice, that
>using UTF-8 to illustrate how something should look like when rendered
>is unlikely to work well.   Images would be much better.

In the cases I know best, in particular IRIs and IDNs, the point is
exactly NOT to show what happens when UTF-8 goes over the wire.
IDNs use Punycode over the wire. IRIs are meant to appear as real
characters on the side of a bus (or wherever). In these cases in
particular, it is very helpful to show the real characters in
contrast with the ACE (ASCII-compatible encoding) form.
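To make that contrast concrete, here is a small Python sketch (not part of the original message) that renders one label three ways: in U+XXXX notation, as the UTF-8 octets that would be sent if the text travelled as UTF-8, and as the ACE form an IDN actually uses in the DNS. The label "bücher" is an arbitrary example chosen for illustration. Python's built-in "idna" codec implements IDNA 2003, so for some inputs its output can differ from IDNA 2008.

```python
# Arbitrary example label (an assumption for illustration only).
label = "bücher"

# U+XXXX notation: one code point per character.
code_points = " ".join(f"U+{ord(c):04X}" for c in label)

# The octets on the wire if the label is sent as UTF-8 (hex, space-separated).
utf8_octets = label.encode("utf-8").hex(" ")

# ACE form: "xn--" prefix plus Punycode, via Python's IDNA 2003 codec.
ace = label.encode("idna").decode("ascii")

print(code_points)  # U+0062 U+00FC U+0063 U+0068 U+0065 U+0072
print(utf8_octets)  # 62 c3 bc 63 68 65 72
print(ace)          # xn--bcher-kva
```

Only the first line of real characters tells a reader what the label looks like. The other two are what an implementer needs, which is why showing both side by side helps.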


>Basically I find the benefit for "plain UTF-8 text" to not be
>sufficiently compelling to justify the pain of changing and the
>disruption that this would cause given the state of existing tools.  I
>can make a much stronger case for use of HTML in RFCs, with UTF-8 as the
>character set.  First, it appears that web browsers are much more likely
>to provide good support for UTF-8 (display as well as printing) than
>tools that handle "plain text".   Second,  unlike plain text, HTML has
>an integral and well-established mechanism for specifying its character
>set.  Third, HTML RFCs would presumably permit use of images to display
>examples - including but not limited to examples of how non-ASCII
>characters should be rendered.  This is much more reliable than
>expecting the reader's tools (even his web browser) to correctly render
>UTF-8.

The last point may be helpful in some situations, but it would be
overkill to require .gif (or whatever) equivalents for every non-ASCII
character.

Overall, it's a question of many small steps or one big step.
I'm fine with either, as long as we get this thing off the ground
(or, given that ASCII-only plain text is from the computer stone age,
I should probably say "off the basement" or some such).

Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


