[rfc-i] Byte order marks

Joe Touch touch at ISI.EDU
Wed Nov 5 07:24:12 PST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi, Simon,

Simon Josefsson wrote:
> Joe Touch <touch at ISI.EDU> writes:
> 
>> Let me clarify:
>>
>> Documents that describe the selection of the BOM indicate that the
>> BOM value (U+FEFF) is defined to be a zero-width nonbreaking space, and
>> that when this value appears in the middle of a document it must be
>> treated as such:
>>
>> http://unicode.org/faq/utf_bom.html#BOM
>>
>> See also RFC3629
>>
>> If you disagree with the reporting in either of these documents, please
>> indicate so. Yes, I inferred that defining this value as invisible was a
>> primary intent.
>>
>> Finally, if you would like to explain how anything that is a zero-width
>> nonbreaking space should *ever* be visible, please let us know.
> 
> I don't want my editor to hide information from me.  If a particular code
> point does not refer to a particular glyph, I want some visual cue to
> indicate that it is present.
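The kind of visual cue Simon asks for can be illustrated with a short Python sketch (Python is an assumption here; the thread names no language). It confirms that U+FEFF, the code point used as the BOM, is the character Unicode names ZERO WIDTH NO-BREAK SPACE, that its UTF-8 encoding is the three bytes EF BB BF, and then replaces every occurrence with a visible marker. The helper `show_invisible_bom` and the `<U+FEFF>` marker are hypothetical, chosen only for illustration.

```python
import codecs
import unicodedata

# U+FEFF: the code point used as the byte order mark.
BOM = "\ufeff"


def show_invisible_bom(text: str) -> str:
    """Replace every U+FEFF with a visible marker (hypothetical helper).

    The "<U+FEFF>" marker is an arbitrary choice for illustration; an
    editor would typically use a glyph or highlighting instead.
    """
    return text.replace(BOM, "<U+FEFF>")


if __name__ == "__main__":
    print(unicodedata.name(BOM))                    # ZERO WIDTH NO-BREAK SPACE
    print(BOM.encode("utf-8") == codecs.BOM_UTF8)   # True (bytes EF BB BF)
    # Both the leading and the mid-text U+FEFF become visible.
    print(show_invisible_bom("\ufeffHello\ufeffworld"))
```

Note that the replacement does not distinguish a leading BOM from a mid-document U+FEFF; both are the same code point and both are made visible.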

That is an option in some editors (e.g., a setting to show hidden
characters), but it is more typical not to show such characters unless
that option is enabled.

For example, GNU Emacs shows this character, yet it does not show
spaces, carriage returns, or form feeds.

> Looking more carefully at the Emacs 23 implementation, it has several
> coding systems.  The utf-8 coding system is the default, and it does not
> parse or generate a signature BOM.  There is a utf-8-auto coding system
> that will ignore a leading BOM.  There are auto-detect features that
> automatically change the coding system when the input is not detected
> as UTF-8 (e.g., Latin-1).
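The distinction between a coding system that passes a leading BOM through and one that ignores it has a rough analogue in Python's standard codecs: "utf-8" decodes a leading EF BB BF into the character U+FEFF, while "utf-8-sig" drops one leading BOM on decode (and writes one on encode). This is a sketch of Python's behavior only, not of Emacs; the helper names `decode_plain` and `decode_ignoring_leading_bom` are hypothetical.

```python
def decode_plain(data: bytes) -> str:
    """Decode as UTF-8, keeping a leading BOM as U+FEFF in the text."""
    return data.decode("utf-8")


def decode_ignoring_leading_bom(data: bytes) -> str:
    """Decode as UTF-8, dropping a leading BOM if one is present.

    A U+FEFF elsewhere in the data is kept, i.e. treated as an
    ordinary (zero-width no-break space) character.
    """
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    data = b"\xef\xbb\xbfhello"
    print(repr(decode_plain(data)))                 # '\ufeffhello'
    print(repr(decode_ignoring_leading_bom(data)))  # 'hello'
    # Encoding with "utf-8-sig" prepends the signature bytes.
    print("hello".encode("utf-8-sig"))              # b'\xef\xbb\xbfhello'
```

Input without a BOM decodes identically under both codecs, so the choice only matters when a file may begin with the signature.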

Understood. These are all, IMO, good reasons to never include the BOM.

Joe
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkkRsJUACgkQE5f5cImnZrv7TgCePzQmimmK1XLMJhAYT7D3REFh
E/0An3JqNz42NS+RrhSVV+OKm7qZNZ4+
=3G4D
-----END PGP SIGNATURE-----

