[rfc-i] UTF-8 and Unicode examples

Alex Rousskov rousskov at measurement-factory.com
Tue May 4 10:51:52 PDT 2004


On Tue, 4 May 2004, Henning Schulzrinne wrote:

> Thanks for the pointer. I would suggest that the following convention be
> adopted:
>
> - Unicode strings use the <U+1234,U+1234> notation, as in "M<U+00FC>nchen"
>
> - UTF-8 strings use the <xx xx> notation, where xx are hexadecimal digits.
>
> - A literal < is written in its Unicode notation (<U+003C>) in those cases
> where it could be misinterpreted, i.e., where it is followed by U+ or a
> hex digit.
>
> Does this work?

Using [] instead of <> might be a good idea to reduce the number
of confused applications that would try to XML-ify the escape
sequence.

Should tools like xml2rfc accept/interpret raw UTF-8, the escape
sequence above, or both? This matters because these tools produce both
ASCII text and HTML versions of specs.
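
As a rough illustration of what such a tool might emit for its ASCII
output, here is a minimal Python sketch of the two notations quoted above.
The function names, the lowercase hex digits in the UTF-8 form, and the
configurable bracket pair are my assumptions, not part of the proposal.
Escaping of a literal < (the third rule) is not handled.

```python
def _escape(text, encode_char, sep, open_br, close_br):
    """Wrap each run of non-ASCII characters in one bracketed group."""
    out, run = [], []
    for ch in text:
        if ord(ch) > 0x7F:
            run.extend(encode_char(ch))
            continue
        if run:
            out.append(open_br + sep.join(run) + close_br)
            run = []
        out.append(ch)
    if run:
        out.append(open_br + sep.join(run) + close_br)
    return "".join(out)


def to_unicode_notation(text, open_br="<", close_br=">"):
    # One U+XXXX token per code point, comma-separated within a run.
    return _escape(text, lambda ch: ["U+%04X" % ord(ch)], ",", open_br, close_br)


def to_utf8_notation(text, open_br="<", close_br=">"):
    # One two-digit hex token per UTF-8 octet, space-separated within a run.
    # Lowercase hex is an assumption; the proposal does not specify case.
    return _escape(
        text,
        lambda ch: ["%02x" % b for b in ch.encode("utf-8")],
        " ",
        open_br,
        close_br,
    )


word = "M\u00fcnchen"                       # "Munchen" with u-umlaut
print(to_unicode_notation(word))            # M<U+00FC>nchen
print(to_utf8_notation(word))               # M<c3 bc>nchen
print(to_unicode_notation(word, "[", "]"))  # M[U+00FC]nchen
```

Passing "[" and "]" as the bracket pair gives the variant I suggested
above, which an XML-aware application would not mistake for a tag.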

Alex.


