[rfc-i] New version: draft-hoffman-utf8-rfcs-04.txt

John C Klensin john+rfc at jck.com
Tue Nov 4 07:36:49 PST 2008

--On Monday, 03 November, 2008 18:11:53 -0800 Paul Hoffman
<paul.hoffman at vpnc.org> wrote:

> We have incorporated many changes from the last round of
> discussion.
> Earlier, people asked to see how this draft would look if it
> actually had UTF-8 in it. I put it on a web site, but people
> argued about how the content-type and character set affected
> how they saw it. This time, I'm attaching the UTF-8ized
> version to this message in a way that I'm 99% sure will have
> Content-Type: application/octet-stream. That should be closer
> to how the file would be handled in the real world of FTP
> servers and manual copying and so on. FWIW, the attached file
> starts with a UTF-8 byte order mark.


While I'm still unconvinced that this is a good idea at this
time, the proposal is getting, IMO, much better and much more
realistic.  Let me make a few suggestions that might make it
more palatable to those of us who tend to be very conservative
in these matters.

(1) While either an obviously bogus character or an indicator for
a character that cannot be displayed is clearly superior to just
discarding the problematic code point, we will probably need to
agree to disagree on whether those approaches are adequate given
the information loss they imply.  That disagreement may be
equivalent to a disagreement about how realistic it is to say
"well, maybe that will motivate you to upgrade your software" or
"maybe that will motivate you to install more fonts, a different
printer, or whatever".

(2) Given that, this would feel more realistic to me if you
required that all essential normative material and contact
information include, or be associated with, an ASCII
representation (either as an ASCII transliteration or
transcoding, or by using RFC 5137 escapes).  The examples at the
end of Section 2.4 are quite reasonable in this regard. I'm
suggesting that the "can have alternate spellings" provision be
changed to a requirement and that the RFC Editor be enabled and
advised to specify formatting rules that make it easier for
people (and machines) to find the processes they need.

A different way to put this is that I'm suggesting a middle
ground, one in which UTF-8 can be used in places where it would
improve the accuracy or comprehensibility of an RFC, more or
less with the criteria you suggest, but that sufficient
supplemental information must be present so that there is no
loss of critical information if the UTF-8 cannot be rendered
accurately and read by the reader.
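
As a rough illustration of the kind of ASCII fallback meant in
(2), here is a minimal sketch in Python. It assumes the
backslash-u-with-delimiters form from RFC 5137, i.e. \u'XXXX'
with four to six hex digits. The function names are hypothetical,
and nothing here is proposed as the RFC Editor's actual tooling.

```python
import re

# Assumed escape shape (RFC 5137, backslash-u with delimiters):
# a backslash, "u", an apostrophe, 4 to 6 hex digits, an apostrophe.
_ESCAPE = re.compile(r"\\u'([0-9A-Fa-f]{4,6})'")


def to_rfc5137(text: str) -> str:
    """Replace every non-ASCII code point with a \\u'XXXX' escape."""
    return "".join(
        ch if ord(ch) < 128 else "\\u'{:04X}'".format(ord(ch))
        for ch in text
    )


def from_rfc5137(text: str) -> str:
    """Turn \\u'XXXX' escapes back into the characters they name."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    name = "Z\u00fcrich"
    ascii_form = to_rfc5137(name)
    print(ascii_form)
    print(from_rfc5137(ascii_form) == name)
```

The round trip only holds when the original text does not already
contain something that looks like an escape, and an escape naming
a value above U+10FFFF would make chr() raise. A transliteration
("Zuerich") is a separate matter that no mechanical rule like this
can supply.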

(3) Similarly, if a document contains non-ASCII characters in
postal addresses, there should be a rule that those addresses at
least conform to Universal Postal Union rules for
international mail addressing.   A side-effect of such a
requirement would be an explicit country indication, i.e.,
getting rid of the long-term RFC assumption that an address
without a country specification is in the USA, but that might
not be a bad thing.

Finally, an editorial/procedural suggestion and a nit:

(4) You may have explicitly decided to not do it for some reason
(which would be fine, although a note in the draft would help
prevent this suggestion from recurring), but you could get
around the problem of finding a good reference for NFC and some
of the character/code-point exclusion issues with a reference to
RFC 5198.   If the Unicode-representation rules that are needed
for RFCs and those that are needed for the 5198 interchange
format are significantly different, then 5198 may need to be

(5) You should at least be aware that the attachment you sent
out is being stripped by the digesting mechanism for those who
receive this list as a digest.  Might I suggest that you take
advantage of the ability to post an I-D in both text and PDF
form (getting an AD to request that the Secretariat waive the
rules and manually post the latter if necessitated by the
various checkers rejecting embedded non-ASCII characters) to
illustrate what the UTF-8 format document would look like?

