[rfc-i] Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt

Martin Duerst duerst at it.aoyama.ac.jp
Mon Oct 6 19:49:27 PDT 2008

Hello Paul,

Overall, the RIGHT thing to do! Full support!

Just a few comments:

Somewhere early on, please explicitly say that UTF-8 is
fully and thoroughly upward compatible with US-ASCII. While
a lot of people are very familiar with this, a lot of
others aren't, and so mentioning it explicitly will
help them to worry less.
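
To illustrate the compatibility claim, here is a minimal Python 3 sketch
(not from the draft; the function names are hypothetical). It checks that
pure US-ASCII text yields the same bytes under both encodings, and that
non-ASCII characters only ever produce bytes of 0x80 and above:

```python
def same_bytes_in_both(text):
    """True if text is pure US-ASCII and encodes identically in UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        return False  # text contains at least one non-ASCII character


def high_bytes(text):
    """Return the UTF-8 bytes >= 0x80, i.e. the bytes that are not ASCII."""
    return [b for b in text.encode("utf-8") if b >= 0x80]


print(same_bytes_in_both("Internet-Draft"))
print(high_bytes("D\u00FCrst"))  # only the u-umlaut contributes high bytes
```

An existing ASCII-only document is therefore already a valid UTF-8
document, byte for byte.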

Introduction: "there are not absolute mappings between" ->
   "there are no absolute mappings between"

Introduction: I think using RFC 2119 language would be preferable.
The IAOC can then just point here. But I think doing what the IAOC
prefers is the best choice.

2.1: I think there are other places where Non-ASCII makes sense.
E.g.: Acknowledgment section, in the actual text for document
titles (or names) quoted there, and so on.
(if you want, you can add my name to the Acks section as an example :-)
(you can then add Xiaodong and Patrick there for volunteering
parts of their name :-).

2.1: "Quotations from non-English languages": English occasionally
warrants non-ASCII, too. I'd write: "Quotations where the original
contains non-ASCII characters" or some such.

2.1: "Protocol examples that show": I'd change this to
"Protocol examples that include". Show is the result of actually
using UTF-8.

2.1: Why is IEA spelled with lower-case, but the other two use
upper case?

   UTF-8 is an encoding of the Unicode Character Set and can be used to
   any of its numeric codepoints, from 0 to 0x10FFFF inclusive.
   UTF-8 is an encoding of the Unicode Character Set and can be used to
   encode any of its numeric codepoints, from U+0000 to U+10FFFF inclusive.
   ^^^^^^                                     ^^^^^^    ^^^^^^^^
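
As a small illustration of the U+ notation and the upper bound, a Python 3
sketch (the helper name is hypothetical, not from the draft):

```python
def format_codepoint(cp):
    """Format an integer code point in U+XXXX notation (at least 4 hex digits)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code point range")
    return f"U+{cp:04X}"


print(format_codepoint(0))
print(format_codepoint(0x10FFFF))
# The highest code point needs four bytes in UTF-8.
print(len(chr(0x10FFFF).encode("utf-8")))
```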

   Specifications encoded in UTF-8 should not contain the encodings of
   certain Unicode codepoints.  The codepoint ranges given in this
   section are inclusive:
I read the 'inclusive' as "these are okay". Also, "the encodings of"
looks redundant in this context. What about simply:
   Specifications using UTF-8 must not use the following codepoint
(I changed the should to a must, because your list is extremely

2.2: Add the C1 controls:
   o  The "C1 control characters" in the range U+0080 to U+009F.
      These also lack either visual representations,
      interoperable semantics, or both.
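
A check for this range is easy to automate. The following Python 3 sketch
is only an illustration (the function name is hypothetical, and it covers
just the C1 range proposed above, not the draft's full list):

```python
def find_c1_controls(text):
    """Return (index, 'U+XXXX') pairs for C1 control characters in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if 0x80 <= ord(ch) <= 0x9F
    ]


# U+0085 (NEXT LINE) is a C1 control; U+00FC (u-umlaut) is not.
print(find_c1_controls("a\u0085b"))
print(find_c1_controls("D\u00FCrst"))
```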

2.2: I would add a general sentence saying that there are many other
kinds of codepoints (e.g. unassigned, control- or formatting-like,
compatibility,...) that should only be used with great care if at all.
You have something about compatibility characters, but I think wording
that in a more general fashion is better.

2.3: There is no character "lowercase-a-with-accent". There are many
different accents. What you want here is:
   For example, the character "lowercase-a-with-acute-accent"...

2.3: You should provide a reference for NFC.
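
To make the normalization point concrete, a minimal Python 3 sketch using
the standard unicodedata module (the helper name is hypothetical). It shows
the two ways of writing a-with-acute-accent and that NFC maps the
decomposed form to the precomposed one:

```python
import unicodedata

PRECOMPOSED = "\u00E1"   # LATIN SMALL LETTER A WITH ACUTE
DECOMPOSED = "a\u0301"   # "a" followed by COMBINING ACUTE ACCENT


def is_nfc(text):
    """True if text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


print(PRECOMPOSED == DECOMPOSED)                                 # different code point sequences
print(unicodedata.normalize("NFC", DECOMPOSED) == PRECOMPOSED)   # same after NFC
print(is_nfc(DECOMPOSED))
```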

2.3, or somewhere else: I'd like to see a strong recommendation
against "smart quotes", or otherwise some discussion of them
(we could propose that "smart quotes" are okay for textual
quotation, but not for protocol examples when they are supposed
to represent "'" or '"'). Smart quotes easily get into IETF
documents, and may be rather difficult to detect automatically.
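
A tool can at least flag candidates for a human to look at. The Python 3
sketch below is an assumption about what such a check might look like (the
names are hypothetical). It reports the four most common curly quotation
marks, but it cannot tell a legitimate textual quotation from a damaged
protocol example:

```python
# Curly quotes and the ASCII character each one usually stands in for.
SMART_QUOTES = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201C": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',   # RIGHT DOUBLE QUOTATION MARK
}


def find_smart_quotes(text):
    """Return (line, column, 'U+XXXX') for each smart quote, 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SMART_QUOTES:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits


print(find_smart_quotes("a\u2019b\n\"ok\""))
```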

In Appendix A.4, it says:
   In fact, most such systems have glyphs for rendering unknown
   characters and different glyphs for rendering known characters for
   which the system has no font.
Systems that e.g. use a Last Resort Font (see
http://unicode.org/policies/lastresortfont_eula.html) may essentially
make such a distinction, but many systems just use a single glyph,
so I think the above sentence should be reworded (at least change
'most' to 'some').

Regards,     Martin.

At 06:20 08/10/07, Paul Hoffman wrote:
>>A New Internet-Draft is available from the on-line Internet-Drafts
>>      Title           : Using non-ASCII Characters in RFCs
>>      Author(s)       : T. Bray, P. Hoffman
>>      Filename        : draft-hoffman-utf8-rfcs-03.txt
>>      Pages           : 8
>>      Date            : 2008-10-6
>>This document specifies a change to the IETF process in which
>>   Internet Drafts and RFCs are allowed to contain non-ASCII characters.
>>   The proposed change is to change the encoding of Internet Drafts and
>>   RFCs to UTF-8.
>>A URL for this Internet-Draft is:
>This version has minor changes based on the traffic on this list. There is 
>a change list at the end of the draft.
>In specific, a UTF8ized version of this specific draft can be found at 
><http://www.vpnc.org/temp/draft-hoffman-utf8-rfcs-03.utf8>. I'll do the 
>same for future drafts.
>--Paul Hoffman, Director
>--VPN Consortium

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     
