[rfc-i] Unicode or UTF-8

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Tue Mar 27 20:08:54 PDT 2012


This is a followup to 
http://www.ietf.org/proceedings/83/minutes/minutes-83-rfcform.txt and 
http://www.ietf.org/jabber/logs/rfcform (which currently doesn't exist; 
hope this gets fixed).

The minutes note:

8:00 - 18:10 = Questions/Comments
-------------

13) ? from Jabber: let's not talk about encodings. Unicode not UTF-8.


Having a fixed encoding is *extremely* helpful (if you don't believe 
that, just think about how easy US-ASCII is in comparison with the mess 
of encodings we have for non-ASCII text).
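
A small sketch of the point, not taken from the minutes: the sample string and variable names below are my own assumptions. It shows that pure ASCII text comes out as the same bytes in several common encodings, while non-ASCII text does not, so a reader who has to guess the encoding either gets garbage or an error.

```python
# Illustrative sketch; sample strings and variable names are assumptions.
ascii_text = "RFC"
name = "Dürst"

# Pure ASCII text yields identical bytes in all three encodings.
ascii_same = (
    ascii_text.encode("ascii")
    == ascii_text.encode("utf-8")
    == ascii_text.encode("latin-1")
)

# Non-ASCII text yields different bytes depending on the encoding.
utf8 = name.encode("utf-8")         # 'ü' becomes two bytes
latin1 = name.encode("latin-1")     # 'ü' becomes one byte
utf16 = name.encode("utf-16-le")    # every character here becomes two bytes

# Guessing the wrong encoding can garble the text silently...
garbled = utf8.decode("latin-1")

# ...or fail outright.
try:
    latin1.decode("utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

print(ascii_same, len(utf8), len(latin1), len(utf16), garbled, wrong_guess_failed)
```

With one fixed encoding there is nothing to guess, which is the whole point.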

For the IETF, that would be UTF-8, because of RFC 2277 
(http://tools.ietf.org/html/rfc2277) and everything after. That should 
apply to xml2rfc source, the equivalent of the current .txt, and HTML at 
least, as far as these are going to be used.

This may not apply to all formats, though. I'm not sure what .docx uses 
internally. I'm not sure what PDF uses internally if you tell it to keep 
all the text in Unicode. [Leonard?] For these cases, there's of course 
absolutely no point in forcing them to use UTF-8 if they use another 
encoding of Unicode.

So I'd think the conclusion should be:

UTF-8 where there's an (unnecessary!) choice; whatever Unicode encoding 
was chosen for formats where a single choice has already been made.

Regards,    Martin.

