[rfc-i] Draft: Representation of Unicode and UTF-8 characters
Henning Schulzrinne
hgs at cs.columbia.edu
Fri May 14 18:06:45 PDT 2004
Here's a draft for the guidelines, as requested by Aaron:
* Representation of Unicode and UTF-8 characters
For names in acknowledgements and protocol examples, it is often
desirable to represent Unicode characters, either abstractly or as the
character would be encoded in UTF-8 (RFC 3629). To avoid violating the
US-ASCII-only rule for RFCs, authors are encouraged to write these
characters using the following textual conventions:
- Unicode strings use the <U+1234,U+1234> notation suggested by the
Unicode specification
(http://www.unicode.org/versions/Unicode4.0.0/Preface.pdf#G1771), for
example "M<U+00FC>nchen" for the Bavarian city of Munich.
- UTF-8 strings enumerate the bytes as pairs of uppercase hexadecimal
digits, separated by spaces, in angle brackets, e.g., <C2 A9> for the
<U+00A9> (copyright) character.
- A literal < is written as <U+003C> wherever it could be mistaken for
the start of one of the notations above, i.e., where the open angle
bracket is followed by U+ or a hex digit.
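As an illustration only, the three conventions above could be applied
mechanically along the following lines. This is a minimal Python sketch,
not part of the proposed guideline; the function names
to_unicode_notation and to_utf8_notation are hypothetical, and grouping
consecutive non-ASCII characters into one comma-separated bracket is
one reading of the <U+1234,U+1234> form.

```python
import string

# Characters that make a following "<" ambiguous under the third rule.
_HEX_DIGITS = set(string.hexdigits)


def to_unicode_notation(text):
    """Rewrite text in US-ASCII using the <U+XXXX> convention.

    Assumption: runs of adjacent non-ASCII characters share one
    bracket, separated by commas, as in <U+1234,U+1234>.
    """
    out = []
    run = []  # code points of the current run of non-ASCII characters

    def flush():
        if run:
            out.append("<" + ",".join("U+%04X" % cp for cp in run) + ">")
            run.clear()

    for i, ch in enumerate(text):
        if ord(ch) > 0x7F:
            run.append(ord(ch))
            continue
        flush()
        following = text[i + 1:i + 3]
        # A literal "<" is escaped only when followed by "U+" or a hex digit.
        if ch == "<" and (following == "U+" or following[:1] in _HEX_DIGITS):
            out.append("<U+003C>")
        else:
            out.append(ch)
    flush()
    return "".join(out)


def to_utf8_notation(text):
    """List the UTF-8 bytes of text as uppercase hex pairs in angle brackets."""
    return "<" + " ".join("%02X" % b for b in text.encode("utf-8")) + ">"


if __name__ == "__main__":
    print(to_unicode_notation("caf\u00e9"))  # caf<U+00E9>
    print(to_utf8_notation("\u00a9"))        # <C2 A9>
    print(to_unicode_notation("x < y"))      # x < y (no escape needed)
    print(to_unicode_notation("a<b"))        # a<U+003C>b ("b" is a hex digit)
```

The sketch escapes a literal < only in the two ambiguous cases named
above, so ordinary text such as "x < y" passes through unchanged.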
Documents may choose a different convention, but then need to explain
the notation.