[rfc-i] Byte Order Marks for UTF-8
tbray at textuality.com
Wed Jul 18 09:23:16 PDT 2012
That's probably a good recommendation, if we couple it with a mandate to
never generate UTF-16.
On Jul 18, 2012 9:11 AM, "Phillip Hallam-Baker" <hallam at gmail.com> wrote:
> As far as XML is concerned:
> Entities encoded in UTF-16 must and entities encoded in UTF-8 may
> begin with the Byte Order Mark described by Annex H of [ISO/IEC
> 10646:2000], section 16.8 of [Unicode] (the ZERO WIDTH NO-BREAK SPACE
> character, #xFEFF). This is an encoding signature, not part of either
> the markup or the character data of the XML document. XML processors
> must be able to use this character to differentiate between UTF-8 and
> UTF-16 encoded documents.
> I am not aware of any software that breaks if the BOM is omitted. I
> have seen a lot of editors and code tools (including the
> xml.resource.org one) that break if one is specified. I suggest then
> that the tools accept XML with a BOM but not generate one.
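
The "accept a BOM but do not generate one" suggestion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from any RFC tool: the function names decode_xml_bytes and encode_for_publication are hypothetical, input without a BOM is assumed to be UTF-8, and the XML encoding declaration and UTF-32 are ignored.

```python
import codecs


def decode_xml_bytes(data: bytes) -> str:
    """Hypothetical helper: accept input with or without a BOM.

    The BOM is treated as an encoding signature and is not returned
    as part of the decoded text.
    """
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # Assumption: input with no BOM is UTF-8.
    return data.decode("utf-8")


def encode_for_publication(text: str) -> bytes:
    """Hypothetical helper: always emit UTF-8 and never write a BOM."""
    if text.startswith("\ufeff"):
        text = text[1:]  # drop a stray U+FEFF left at the start of the text
    # Python's "utf-8" codec does not add a BOM ("utf-8-sig" would).
    return text.encode("utf-8")
```

With these helpers, a UTF-8 document with a BOM, a UTF-8 document without one, and a UTF-16 document with a BOM all decode to the same text, and re-encoding always yields BOM-free UTF-8.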
> On Wed, Jul 18, 2012 at 11:20 AM, Paul Hoffman <paul.hoffman at vpnc.org> wrote:
> > On Jul 18, 2012, at 8:11 AM, Phillip Hallam-Baker wrote:
> >> I think the tools need to accept a BOM but not require one.
> > We aren't talking about "the tools" (or, we shouldn't be): we are
> > talking about the publication format.
> > We should test published formats with and without a BOM and see which
> > causes more good, and more bad, results.
> > --Paul Hoffman
> > _______________________________________________
> > rfc-interest mailing list
> > rfc-interest at rfc-editor.org
> > https://www.rfc-editor.org/mailman/listinfo/rfc-interest
> Website: http://hallambaker.com/