[rfc-i] draft-flanagan-rfc-framework-00 and byte order mark (BOM)

Andrew G. Malis agmalis at gmail.com
Fri Sep 12 06:27:57 PDT 2014


Out of curiosity, I just tried the two files on my Mac and had results
similar to Dave's: all of the browsers I tried (Safari, Firefox, Chrome)
failed without the BOM but succeeded with the BOM. Interestingly, every
text editor I tried (TextEdit, TextWrangler, TextMate, emacs, and vi)
succeeded with both files.
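
One plausible explanation for the split, sketched in Python. This is only
an assumption about how such apps behave, not something checked against
their source: an app that sniffs for valid UTF-8 handles both files, while
an app that trusts only the BOM falls back to a legacy encoding when the
BOM is missing. The function name decode_unlabelled, the sniff_utf8 switch,
and the latin-1 fallback are all hypothetical choices made for illustration.

```python
import codecs
from typing import Tuple


def decode_unlabelled(data: bytes, sniff_utf8: bool = True,
                      legacy: str = "latin-1") -> Tuple[str, str]:
    """Decode bytes that arrive with no charset label.

    Returns (text, method). Hypothetical model, not any real app's logic.
    """
    # A UTF-8 BOM (bytes EF BB BF) is an unambiguous signal: strip and decode.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8 (BOM)"

    # Assumed "editor-style" heuristic: accept the bytes if they are valid UTF-8.
    if sniff_utf8:
        try:
            return data.decode("utf-8"), "utf-8 (validated)"
        except UnicodeDecodeError:
            pass

    # Assumed "browser-style" fallback: treat the bytes as a legacy encoding.
    return data.decode(legacy, errors="replace"), legacy


sample = "Ελληνικά"              # Greek, as in the test files
raw = sample.encode("utf-8")     # no BOM
with_bom = codecs.BOM_UTF8 + raw

for label, payload in (("no BOM", raw), ("with BOM", with_bom)):
    for sniff in (True, False):
        text, method = decode_unlabelled(payload, sniff_utf8=sniff)
        print(label, "| sniff_utf8 =", sniff, "|", method, "|", text == sample)
```

Under this model, only the combination "no BOM" with sniff_utf8=False
produces garbled text, which would match what I saw in the browsers.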

Cheers,
Andy



On Thu, Sep 11, 2014 at 2:34 PM, Heather Flanagan (RFC Series Editor) <
rse at rfc-editor.org> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 9/11/14, 11:01 AM, Russ Housley wrote:
> >
> > In the discussion of plain text files,
> > draft-flanagan-rfc-framework-00 says:
> >
> > o  A Byte Order Mark (BOM) will be added at the start of each file
> >
> >
> > This seems like it will hinder transition because many editors
> > will display the BOM as a few nonsensical characters.
> >
> > The Unicode Standard permits the BOM in UTF-8; however, it does
> > not require or even recommend its use.  So, the Unicode Standard
> > does not seem to be the reason to include a BOM.
> >
> > I think we should have a UTF-8 file that is most likely to be
> > consumed by widely deployed plaintext editors.
> >
>
> As you might expect, discussion of whether or not to include a BOM was
> an active topic within the design team.  Thanks to testing by Dave
> Thaler, we concluded that including a BOM would allow for the widest
> support possible for viewing the plain-text files.
>
> His research is included below, with permission:
>
> ========
> I just ran a test with two UTF-8 files, one with a BOM and one without.
>
> In case you want to try them yourself, they're at
>
> http://research.microsoft.com/~dthaler/Utf8NoBom.txt
>
> http://research.microsoft.com/~dthaler/Utf8WithBom.txt
>
> They include Latin, Greek, and Cyrillic text.
>
> I tried opening them with a bunch of utilities, and browsers (opening
> local files not using HTTP), and used browsershots.org to get
> screenshots of HTTP access across many browsers and platforms.
>
> Note the HTTP server provides no character encoding (charset)
> information in its headers, so it's up to the app to detect the
> encoding.
>
> I just copied the files to a generic web server, and we may expect
> others would do the same with their own I-Ds and RFC mirrors.
>
> Results:
>
> 1) Some apps worked fine with both files.  These include things like
> Notepad, Outlook, Word, File Explorer, and Visual Studio 2012.
>
> 2) Some apps failed with both files (probably written to be ASCII
> only).  These include things like WinDiff, stevie (a vi clone),
> TextPad, the Links browser (on Ubuntu), and the Konqueror browser
> (on Ubuntu).
>
> 3) Everything else, including almost all browsers, only displayed the
> file correctly with the BOM
>
> This included:
>
> Windows apps: WordPad
> Windows using local files (no HTTP): IE, Firefox, Chrome
> Windows using HTTP: IE, Firefox, Chrome, Navigator
> Mac OS X: Safari, Camino
> Debian: Opera, Dillo
> Ubuntu: Luakit, Iceape
>
> Conclusion: If we want people to use UTF-8 RFCs and I-Ds with existing
> tools and browsers today, any UTF-8 text format needs to include a BOM.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
> Comment: GPGTools - http://gpgtools.org
>
> iQEcBAEBAgAGBQJUEes4AAoJEER/xjINbZoGqHkH/1qrUHDoXShnEUWgR3JXWUCt
> y6SLjuB+0rWCbP25R8bYKN8WWzH+CZUH/ZcL5sOm9QcshaakVZq9HUM53+YuAAKd
> +lWDL9jE9lHYjVVYZNgD3SryCL6t9vbUuZDkKqJiPWYLVxfnK97qTzaRfRHRHWM8
> iEcWYk/SB0IB+yUCTzoiuycnF0V/MAWrVNRuWTJfM4YK4/l0Qk/SLmKetl91KiGv
> mly1kBKtPvgBztkxnhULJC8oOaMjmmKlmC9Kv1T5ghbky4bU+HQpX4hMFBlkBQKt
> VStz7H25SWbDJayvHNqeyLMEWjD2nkIHCALIHcfLmmPJDU5VWS/Fb7prDbBvA14=
> =oSFk
> -----END PGP SIGNATURE-----
> _______________________________________________
> rfc-interest mailing list
> rfc-interest at rfc-editor.org
> https://www.rfc-editor.org/mailman/listinfo/rfc-interest
>