[rfc-i] Re: Draft: Representation of Unicode and UTF-8 characters

Henning Schulzrinne hgs at cs.columbia.edu
Tue May 18 11:22:21 PDT 2004


> I also share the concern about overengineering this as well as Paul 
> Hoffman's concern about changing mnemonic, but incorrectly-written, 
> names into ones that look really ugly and lose mnemonic value.  For 
> names containing non-ASCII characters, it would seem to me to be 
> appropriate to have, not only "author's choice", but also some 
> convention for representing both "spellings" in the text.
> 

I've added text to the draft to require that all names have an 
ASCII-only spelling:

* Representation of Unicode and UTF-8 characters

IETF documents may need to represent Unicode characters while
obeying the US-ASCII encoding rules. Unicode use cases include
protocol examples and human names in acknowledgments. To
represent Unicode characters in IETF documents, authors should
use the following conventions:

- Unicode strings use the <U+1234 U+2345> notation suggested by the
Unicode specification
(http://www.unicode.org/versions/Unicode4.0.0/Preface.pdf#G1771), for
example "M<U+00FC>nchen" for the Bavarian city of Munich.

- UTF-8 (RFC 3629) strings enumerate the bytes as uppercase hexadecimal
digits in angled brackets, e.g., <C2 A9> for the <U+00A9> (copyright)
character.

- A literal < is written as <U+003C> wherever it could be
misinterpreted, i.e., where the open angle bracket is followed by U+ or
a hex digit.
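
As a rough illustration of the three conventions above, here is a
minimal Python sketch. The function names are hypothetical and not part
of the draft, and treating lowercase a-f as hex digits when escaping
"<" is my own assumption, since the text only says "a hex digit".

```python
import re

def to_unicode_notation(text: str) -> str:
    """Rewrite text as ASCII using the <U+XXXX ...> notation."""
    # Escape a literal "<" that could be mistaken for the notation,
    # i.e. one followed by "U+" or a hex digit (either case: assumption).
    text = re.sub(r"<(?=U\+|[0-9A-Fa-f])", "<U+003C>", text)
    # Group each run of non-ASCII characters into one bracketed list.
    return re.sub(
        r"[^\x00-\x7F]+",
        lambda m: "<" + " ".join(f"U+{ord(c):04X}" for c in m.group()) + ">",
        text,
    )

def to_utf8_notation(text: str) -> str:
    """List the UTF-8 bytes of text as uppercase hex in angle brackets."""
    return "<" + " ".join(f"{b:02X}" for b in text.encode("utf-8")) + ">"

print(to_unicode_notation("M\u00fcnchen"))  # M<U+00FC>nchen
print(to_utf8_notation("\u00a9"))           # <C2 A9>
print(to_unicode_notation("a <b"))          # a <U+003C>b
```

A "<" followed by anything else, such as a space, is left alone, since
it cannot be confused with either notation.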

Author and editor names in the RFC headings should not use this
convention, as these names are included in databases, listings and web
pages, where such usage would likely be confusing.  Other names (e.g.,
in the list of contributors or acknowledgments) must provide the
ASCII-only form first, for example, Wilhelm Conrad Roentgen
(R<U+00F6>ntgen) for the discoverer of X-rays.
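
The ASCII-first rule for such names could be checked mechanically. The
following Python sketch assumes a hypothetical helper, contributor_name,
that takes the ASCII spelling and the native spelling of the part that
differs; neither the helper nor its error message comes from the draft.

```python
import re

def contributor_name(ascii_form: str, native_form: str) -> str:
    """Return 'ASCII spelling (native spelling in <U+XXXX> notation)'."""
    # The draft requires the ASCII-only form to come first.
    if not ascii_form.isascii():
        raise ValueError("the first spelling must be ASCII-only")
    # Render each run of non-ASCII characters in the native spelling
    # as one bracketed list of code points.
    noted = re.sub(
        r"[^\x00-\x7F]+",
        lambda m: "<" + " ".join(f"U+{ord(c):04X}" for c in m.group()) + ">",
        native_form,
    )
    return f"{ascii_form} ({noted})"

print(contributor_name("Wilhelm Conrad Roentgen", "R\u00f6ntgen"))
# Wilhelm Conrad Roentgen (R<U+00F6>ntgen)
```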

Documents may choose different conventions.  Whatever conventions are
used, the document should include an explanation of them.

As this web page is subject to change, documents using these conventions
should either directly incorporate applicable portions of the above
explanation or incorporate by reference the Unicode specifications from
which these conventions are derived.

