[rfc-i] Byte Order Marks for UTF-8

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Wed Jul 18 18:04:10 PDT 2012


On 2012/07/19 1:11, Phillip Hallam-Baker wrote:
> As far as XML is concerned:
>
> Entities encoded in UTF-16 must and entities encoded in UTF-8 may
> begin with the Byte Order Mark described by Annex H of [ISO/IEC
> 10646:2000], section 16.8 of [Unicode] (the ZERO WIDTH NO-BREAK SPACE
> character, #xFEFF). This is an encoding signature, not part of either
> the markup or the character data of the XML document. XML processors
> must be able to use this character to differentiate between UTF-8 and
> UTF-16 encoded documents.

That text is in the current (fifth) edition of the XML spec, and in
every earlier edition back to the second (Oct. 2000). The first edition
(Feb. 1998) doesn't have it, which is why some old tools don't grok it.
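
To make the quoted rule concrete, here is a minimal sketch of how a
processor might use the signature to tell UTF-8 from UTF-16. The names
sniff_bom and decode_entity are hypothetical, not taken from any XML
library. The sketch looks only at the BOM; it ignores UTF-32 and the
rest of the autodetection heuristics in the spec, and it assumes UTF-8
when no BOM is present.

```python
import codecs


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0).

    Hypothetical helper: checks only the UTF-8 and UTF-16 signatures.
    """
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8", 3
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be", 2
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le", 2
    return None, 0


def decode_entity(data: bytes) -> str:
    """Decode an entity, dropping the BOM so it never reaches the content.

    Assumption: with no BOM, fall back to UTF-8. A real processor would
    also consult the encoding declaration and any external information.
    """
    encoding, bom_length = sniff_bom(data)
    return data[bom_length:].decode(encoding or "utf-8")
```

A tool written against the first edition would typically lack the UTF-8
branch and so see the three bytes EF BB BF as content before the XML
declaration.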

Regards,    Martin.

