[rfc-i] draft-flanagan-rfc-framework-00 and byte order mark (BOM)

Heather Flanagan (RFC Series Editor) rse at rfc-editor.org
Thu Sep 11 11:34:32 PDT 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 9/11/14, 11:01 AM, Russ Housley wrote:
> 
> In the discussion of plain text files,
> draft-flanagan-rfc-framework-00 says:
> 
> o  A Byte Order Mark (BOM) will be added at the start of each file
> 
> 
> This seems like it will hinder transition because many editors
> will display the BOM as a few nonsensical characters.
> 
> The Unicode Standard permits the BOM in UTF-8; however, it does
> not require or even recommend its use.  So, the Unicode Standard
> does not seem to be the reason to include a BOM.
> 
> I think we should have a UTF-8 file that is most likely to be 
> consumed by widely deployed plaintext editors.
> 

As you might expect, whether to include a BOM was actively discussed
within the design team.  Thanks to testing by Dave Thaler, we
concluded that including a BOM would allow for the widest support
possible for viewing the plain-text files.

His research is included below, with permission:

========
I just ran a test with two UTF-8 files, one with a BOM and one without.

In case you want to try them yourself, they're at

http://research.microsoft.com/~dthaler/Utf8NoBom.txt

http://research.microsoft.com/~dthaler/Utf8WithBom.txt

Each file includes Latin, Greek, and Cyrillic text.
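
For anyone who wants to build a similar pair of test files locally,
here is a minimal sketch.  The sample text and the make_samples name
are hypothetical; they are not the contents of the two files linked
above.  The only difference between the two payloads is the three-byte
UTF-8 BOM (EF BB BF) at the start.

```python
import codecs
from typing import Tuple

# Hypothetical sample text mixing Latin, Greek, and Cyrillic;
# NOT the contents of the files linked in this message.
SAMPLE = "Latin: café\nGreek: αβγ\nCyrillic: где\n"


def make_samples(text: str = SAMPLE) -> Tuple[bytes, bytes]:
    """Return (bytes without a BOM, bytes with a BOM) for the same text."""
    no_bom = text.encode("utf-8")
    with_bom = codecs.BOM_UTF8 + no_bom  # prepend EF BB BF
    return no_bom, with_bom


no_bom, with_bom = make_samples()
print(with_bom[:3].hex())  # efbbbf
```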

I tried opening them with a number of utilities and with browsers
(opening local files, not using HTTP), and used browsershots.org to
get screenshots of HTTP access across many browsers and platforms.

Note that the HTTP server sends no character encoding (charset)
information in its headers, so it's up to the app to detect it.
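
As an illustration of what "up to the app to detect" can mean, here is
a rough sketch of one possible detection strategy: treat the data as
UTF-8 only if it starts with a BOM, and otherwise fall back to a
legacy single-byte encoding.  This is an assumption made for the
example, not a description of how any app named in this message
actually works, and the function name and the latin-1 fallback are
hypothetical.

```python
import codecs


def decode_like_bom_only_app(data: bytes, legacy: str = "latin-1") -> str:
    """Decode as UTF-8 only when a BOM is present; else use a legacy codec.

    The latin-1 fallback is an assumption chosen for the example because
    it can decode any byte sequence without raising.
    """
    if data.startswith(codecs.BOM_UTF8):
        # Strip the three BOM bytes, then decode the rest as UTF-8.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(legacy)


text = "Ελληνικά"
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
without_bom = text.encode("utf-8")

print(decode_like_bom_only_app(with_bom) == text)     # True
print(decode_like_bom_only_app(without_bom) == text)  # False
```

Under this strategy the file without a BOM comes out garbled even
though its bytes are valid UTF-8.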

I just copied the files to a generic web server, and we may expect
others would do the same with their own I-Ds and RFC mirrors.

Results:

1) Some apps worked fine with both files.  These include Notepad,
Outlook, Word, File Explorer, and Visual Studio 2012.

2) Some apps failed with both files (probably written to be ASCII
only).  These include WinDiff, stevie (a vi clone), TextPad, and, on
Ubuntu, the Links and Konqueror browsers.

3) Everything else, including almost all browsers, displayed the
file correctly only with the BOM.

This included:

Windows apps: Wordpad
Windows using local files (no HTTP): IE, Firefox, Chrome
Windows using HTTP: IE, Firefox, Chrome, Navigator
Mac OS X: Safari, Camino
Debian: Opera, Dillo
Ubuntu: Luakit, Iceape

Conclusion: If we want people to use UTF-8 RFCs and I-Ds with existing
tools and browsers today, any UTF-8 text format needs to include a BOM.
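
For illustration, here is a minimal sketch of how a publishing step
might make sure a UTF-8 text file starts with a BOM.  The ensure_bom
helper is hypothetical and is not part of any RFC Editor tooling; it
assumes the input is already valid UTF-8 and avoids adding a second
BOM if one is present.

```python
import codecs


def ensure_bom(data: bytes) -> bytes:
    """Return data with exactly one UTF-8 BOM at the start."""
    # Raises UnicodeDecodeError if the input is not valid UTF-8.
    data.decode("utf-8")
    if data.startswith(codecs.BOM_UTF8):
        return data  # already has a BOM; leave unchanged
    return codecs.BOM_UTF8 + data


once = ensure_bom("RFC text: αβγ".encode("utf-8"))
twice = ensure_bom(once)
print(once[:3].hex())  # efbbbf
print(once == twice)   # True
```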
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJUEes4AAoJEER/xjINbZoGqHkH/1qrUHDoXShnEUWgR3JXWUCt
y6SLjuB+0rWCbP25R8bYKN8WWzH+CZUH/ZcL5sOm9QcshaakVZq9HUM53+YuAAKd
+lWDL9jE9lHYjVVYZNgD3SryCL6t9vbUuZDkKqJiPWYLVxfnK97qTzaRfRHRHWM8
iEcWYk/SB0IB+yUCTzoiuycnF0V/MAWrVNRuWTJfM4YK4/l0Qk/SLmKetl91KiGv
mly1kBKtPvgBztkxnhULJC8oOaMjmmKlmC9Kv1T5ghbky4bU+HQpX4hMFBlkBQKt
VStz7H25SWbDJayvHNqeyLMEWjD2nkIHCALIHcfLmmPJDU5VWS/Fb7prDbBvA14=
=oSFk
-----END PGP SIGNATURE-----

