[rfc-i] Draft: Representation of Unicode and UTF-8 characters

Henning Schulzrinne hgs at cs.columbia.edu
Fri May 14 18:06:45 PDT 2004


Here's a draft for the guidelines, as requested by Aaron:


* Representation of Unicode and UTF-8 characters

For names in acknowledgements and protocol examples, it is often 
desirable to represent Unicode characters, either abstractly or as the 
character would be encoded in UTF-8 (RFC 3629). To avoid violating the 
US-ASCII-only rule for RFCs, we suggest writing these characters 
using the following textual conventions:

- Unicode strings use the <U+1234,U+1234> notation suggested by the 
Unicode specification 
(http://www.unicode.org/versions/Unicode4.0.0/Preface.pdf#G1771), for 
example "M<U+00FC>nchen" for the Bavarian city of Munich.

- UTF-8 strings enumerate the bytes as pairs of uppercase hexadecimal 
digits in angle brackets, e.g., <C2 A9> for the <U+00A9> (copyright) 
character.

- A literal < is written as <U+003C> wherever it could be misread as 
the start of one of the notations above, i.e., where the open angle 
bracket is followed by U+ or a hex digit.

Documents may choose a different convention, but must then explain 
the notation they use.
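
To make the three conventions concrete, here is a minimal Python sketch 
of how a tool might produce them. It is an illustration only, not part 
of the proposal. The function names are hypothetical, and two readings 
of the text are assumptions: that a run of adjacent non-ASCII 
characters is grouped as <U+1234,U+1234>, and that "hex digit" covers 
both uppercase and lowercase digits.

```python
import string

# Assumption: "hex digit" means 0-9, A-F and a-f.
HEX_DIGITS = set(string.hexdigits)


def _ambiguous_lt(rest: str) -> bool:
    """True if a '<' followed by `rest` could be read as one of the notations."""
    return rest.startswith("U+") or rest[:1] in HEX_DIGITS


def unicode_notation(text: str) -> str:
    """Render non-ASCII characters as <U+XXXX> and escape ambiguous '<'."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ord(ch) > 0x7F:
            # Assumption: group a run of adjacent non-ASCII characters
            # into one bracket pair, separated by commas.
            j = i
            while j < len(text) and ord(text[j]) > 0x7F:
                j += 1
            codes = ",".join(f"U+{ord(c):04X}" for c in text[i:j])
            out.append(f"<{codes}>")
            i = j
        elif ch == "<" and _ambiguous_lt(text[i + 1:]):
            out.append("<U+003C>")
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def utf8_notation(text: str) -> str:
    """Render the UTF-8 encoding of `text` as uppercase hex bytes in angle brackets."""
    return "<" + " ".join(f"{b:02X}" for b in text.encode("utf-8")) + ">"


if __name__ == "__main__":
    print(unicode_notation("M\u00fcnchen"))  # the Munich example
    print(utf8_notation("\u00a9"))           # the copyright example
    print(unicode_notation("a <U+0041"))     # ambiguous literal '<'
```

With these assumptions, the Munich example comes out as 
"M<U+00FC>nchen" and the copyright character as <C2 A9>. A "<" 
followed by a space, as in "x < y", is left untouched.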


