[rfc-i] draft-flanagan-rfc-framework-00 and byte order mark (BOM)
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Thu Sep 11 20:47:42 PDT 2014
On 2014/09/12 04:03, Heather Flanagan (RFC Series Editor) wrote:
> On 9/11/14, 11:59 AM, Russ Housley wrote:
>>>> In the discussion of plain text files,
>>>> draft-flanagan-rfc-framework-00 says:
>>>> o A Byte Order Mark (BOM) will be added at the start of each file
>>>> This seems like it will hinder transition because many editors
>>>> will display the BOM as a few nonsensical characters.
>>> Conclusion: If we want people to use UTF-8 RFCs and I-Ds with existing
>>> tools and browsers today, any UTF-8 text format needs to include a BOM.
>> Thanks. This is a solid analysis.
>> It seems that the MIME type (text/plain vs. text/html with
>> charset=utf-8) becomes quite important. Since ASCII is a subset of
>> UTF-8, maybe the answer is to always include the charset. Otherwise,
>> some database needs to know which charset is appropriate when delivering
>> each .txt file.
> Is there a way I don't know about to include charset in a .txt file? I
> didn't think there was...?
There isn't. One way to help detection (if the encoding is UTF-8) is to
include actual UTF-8 (not just ASCII) data early on. But not everybody
has a name that requires non-ASCII.
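To make the detection idea concrete, here is a minimal sketch in Python of how a tool might classify the start of a .txt file. The function name, the returned labels, and the 1024-byte window are assumptions made for illustration; real editors and browsers apply their own heuristics, which may differ.

```python
import codecs


def sniff_text_encoding(data: bytes, window: int = 1024) -> str:
    """Classify the start of a text file (hypothetical helper, not a real tool's logic)."""
    # Case 1: an explicit UTF-8 BOM (EF BB BF) settles the question at byte 0.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)"

    head = data[:window]  # assumed window size; detectors vary

    # Case 2: only ASCII so far. ASCII is valid UTF-8, but also valid in
    # many legacy encodings, so nothing can be concluded yet.
    if head.isascii():
        return "ambiguous (ASCII only)"

    # Case 3: non-ASCII bytes are present. Check that they form valid UTF-8.
    # final=False tolerates a multi-byte sequence cut off at the window edge.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8 (non-ASCII data)"
```

Under these assumptions, a file that opens with "Martin J. Dürst" encoded as UTF-8 is classified as UTF-8 without a BOM, while a file that opens with only ASCII names stays ambiguous until non-ASCII data appears.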
Essentially, starting with a BOM is the extreme case of "put a non-ASCII