[rfc-i] Byte Order Marks for UTF-8

Phillip Hallam-Baker hallam at gmail.com
Wed Jul 18 09:11:06 PDT 2012

As far as XML is concerned:

Entities encoded in UTF-16 must and entities encoded in UTF-8 may
begin with the Byte Order Mark described by Annex H of [ISO/IEC
10646:2000], section 16.8 of [Unicode] (the ZERO WIDTH NO-BREAK SPACE
character, #xFEFF). This is an encoding signature, not part of either
the markup or the character data of the XML document. XML processors
must be able to use this character to differentiate between UTF-8 and
UTF-16 encoded documents.
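
To make the quoted requirement concrete, here is a minimal Python sketch of telling UTF-8 from UTF-16 by the leading bytes. The function name sniff_bom is hypothetical, and the sketch covers only the signatures for the two encodings named above; a real XML processor also has to handle input with no BOM at all (for example by looking at the XML declaration).

```python
import codecs
from typing import Optional

# Signatures checked in this sketch: only the encodings named in the quoted text.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

A None result just means no signature was found, which by the text above is legal for UTF-8.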

I am not aware of any software that breaks if the BOM is omitted. I
have seen a lot of editors and code tools (including the
xml.resource.org one) that break if one is specified. I suggest, then,
that the tools accept XML with a BOM but not generate one.
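
As a rough illustration of "accept but do not generate", assuming UTF-8 input and Python's standard codecs (the function names read_xml_text and write_xml_bytes are made up for this sketch, not taken from any existing tool):

```python
import codecs


def read_xml_text(data: bytes) -> str:
    """Decode UTF-8 input, tolerating an optional leading BOM.

    The "utf-8-sig" codec drops a leading BOM if one is present and
    otherwise behaves like plain "utf-8".
    """
    return data.decode("utf-8-sig")


def write_xml_bytes(text: str) -> bytes:
    """Encode output as UTF-8 without a BOM.

    The plain "utf-8" codec never emits a signature on encode.
    """
    return text.encode("utf-8")


# Both forms of input decode to the same text; output carries no BOM.
with_bom = codecs.BOM_UTF8 + b"<rfc/>"
without_bom = b"<rfc/>"
assert read_xml_text(with_bom) == read_xml_text(without_bom) == "<rfc/>"
assert not write_xml_bytes("<rfc/>").startswith(codecs.BOM_UTF8)
```

This only covers the UTF-8 case; UTF-16 input would need its own handling.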

On Wed, Jul 18, 2012 at 11:20 AM, Paul Hoffman <paul.hoffman at vpnc.org> wrote:
> On Jul 18, 2012, at 8:11 AM, Phillip Hallam-Baker wrote:
>> I think the tools need to accept a BOM but not require one.
> We aren't talking about "the tools" (or, we shouldn't be): we are talking about the publication format.
> We should test published formats with and without a BOM and see which causes more good, and more bad, results.
> --Paul Hoffman

Website: http://hallambaker.com/
