[rfc-i] Byte order marks

Joe Touch touch at ISI.EDU
Tue Nov 4 08:49:21 PST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



Simon Josefsson wrote:
...
> 4) It is still not clear that having a UTF-8 BOM does anything useful.
>    Which implementations are affected?  I believe modern versions of
>    Microsoft tools, which used to be the problematic tools, have been
>    fixed.

Although I agree with your other concerns, Microsoft tools still have
problems.

Under Vista with all recent updates, Wordpad (which does understand
formfeeds) detects UTF-8 and displays the characters properly, but it
also changes fonts the instant it hits UTF-8 and doesn't change back
(from the default of Courier New to something called SimSun).

Under Vista with Xemacs 21.4.21 (the most recent), even the BOM is
insufficient to indicate that the file should be interpreted as UTF-8.
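
For reference, the UTF-8 BOM is the three-byte sequence EF BB BF at the
start of the file. A minimal Python sketch of the check a tool could
make is below. The function names are hypothetical and are not taken
from any of the editors discussed here; this only illustrates the idea.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_text(data: bytes) -> str:
    """Decode as UTF-8, dropping a leading BOM if one is present.

    The "utf-8-sig" codec strips the BOM when it is there and otherwise
    behaves like plain "utf-8".
    """
    return data.decode("utf-8-sig")
```

Note that the absence of a BOM says nothing either way; it only means
the file did not announce its own encoding.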

For Windows users editing fixed-width text files, these are the most
prevalent editors. (Notepad also works for other text, but doesn't
interpret either CR/LF or FF.)

A number of these issues are described for Linux at:
http://www.maruko.ca/i18n/

Note the prevalence of explicit commands to get software to understand
UTF-8. So "nearly universally available" means "with some explicit work".

Joe

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkkQfREACgkQE5f5cImnZrulrgCcD02okHF4T1fxzWOeHXHgxFwz
ih4AoPABzV4QdRgup8mUe0VUTYTQIMGX
=z6OT
-----END PGP SIGNATURE-----

