[rfc-i] Unicode or UTF-8
tbray at textuality.com
Tue Mar 27 20:52:47 PDT 2012
The good news is that UTF-8 seems to be advancing inexorably under its
own steam because it, you know, just works.
On Tue, Mar 27, 2012 at 8:08 PM, "Martin J. Dürst"
<duerst at it.aoyama.ac.jp> wrote:
> This is a followup to
> http://www.ietf.org/proceedings/83/minutes/minutes-83-rfcform.txt and
> http://www.ietf.org/jabber/logs/rfcform (which currently doesn't exist; hope
> this gets fixed).
> The minutes note:
> 18:00 - 18:10 = Questions/Comments
> 13) ? from Jabber: let's not talk about encodings. Unicode not UTF-8.
> Having a fixed encoding is *extremely* helpful (if you don't believe that,
> just think about how easy US-ASCII is in comparison with the mess of
> encodings we have for non-ASCII text).
> For the IETF, that would be UTF-8, because of RFC 2277
> (http://tools.ietf.org/html/rfc2277) and everything after. That should apply
> to xml2rfc source, the equivalent of the current .txt, and HTML at least, as
> far as these are going to be used.
> This may not apply to all formats, though. I'm not sure what .docx uses
> internally. I'm not sure what PDF uses internally if you tell it to keep all
> the text in Unicode. [Leonard?] For these cases, there's of course
> absolutely no point in forcing them to use UTF-8 if they use another encoding
> of Unicode.
> So I'd think the conclusion should be:
> UTF-8 where there's an (unnecessary!) choice, whatever Unicode encoding was
> chosen for formats where a single choice already has been made.
> Regards, Martin.
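
The point above about a fixed encoding can be illustrated with a minimal
sketch. The sample string and the choice of Latin-1 as the "wrong guess" are
assumptions made for illustration only; nothing here comes from the minutes
or from any RFC tooling.

```python
# Illustrative sketch: the sample text and the alternative encoding are
# assumptions, chosen only to show what happens when the encoding is not fixed.
text = "Dürst"

# One character sequence, serialized with the encoding the thread recommends.
utf8_bytes = text.encode("utf-8")

# A reader that knows the encoding is UTF-8 recovers the text exactly.
as_utf8 = utf8_bytes.decode("utf-8")

# A reader that has to guess, and guesses Latin-1, gets mojibake instead.
as_latin1 = utf8_bytes.decode("latin-1")

# Pure US-ASCII text yields identical bytes under all three encodings,
# which is why ASCII-only documents never ran into this problem.
ascii_text = "RFC"
same_bytes = (
    ascii_text.encode("ascii")
    == ascii_text.encode("utf-8")
    == ascii_text.encode("latin-1")
)

print(repr(as_utf8), repr(as_latin1), same_bytes)
```

If every producer and consumer agrees on UTF-8 in advance, the guessing step
in the middle disappears.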