[rfc-i] Unicode or UTF-8

Paul Hoffman paul.hoffman at vpnc.org
Wed Mar 28 12:01:24 PDT 2012


On Mar 28, 2012, at 6:57 PM, Tim Bray wrote:

> I confess that I can never resist a chance at character-encoding
> pedantry.  The BOM is actually not there to identify UTF-8, it’s there
> because the BOM character exists to help sort out byte order in other
> encodings that actually have byte-order issues (UTF-8 doesn’t) and
> since it’s a Unicode character, there’s a UTF-8 encoding for it.  The
> issue of how you identify the encoding of a chunk of bytes,
> particularly in the Web context, is a vexed one, particularly with
> XML, which makes the encoding of a document self-identifying; so
> should you believe what the doc says about itself, or the server’s
> opinion as expressed in the Content-type; but I digress... -T


+1. RFC 3629 says that using the BOM in a UTF-8 file "is useless". Let's not go there.
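
For anyone who wants to see the bytes involved, here is a minimal Python sketch of Tim's point: U+FEFF is an ordinary Unicode character, so it has a UTF-8 encoding, but only the UTF-16 forms depend on byte order. The helper name strip_utf8_bom is hypothetical and used only for illustration; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the Python standard library.

```python
import codecs

# U+FEFF is the BOM character. Like any Unicode character,
# it can be encoded in UTF-8, UTF-16BE, or UTF-16LE.
BOM = "\ufeff"

utf8_bom = BOM.encode("utf-8")      # three bytes: EF BB BF
utf16_be = BOM.encode("utf-16-be")  # FE FF
utf16_le = BOM.encode("utf-16-le")  # FF FE (byte order matters here)


def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 BOM if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# The standard "utf-8-sig" codec does the same thing while decoding.
text = b"\xef\xbb\xbfhello".decode("utf-8-sig")
```

The UTF-8 byte sequence is the same on every machine, which is why the BOM carries no byte-order information there; at most it acts as a signature that a consumer may need to strip.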

--Paul Hoffman


