[rfc-i] draft-flanagan-rfc-framework-00 and byte order mark (BOM)

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Thu Sep 11 20:47:42 PDT 2014



On 2014/09/12 04:03, Heather Flanagan (RFC Series Editor) wrote:

> On 9/11/14, 11:59 AM, Russ Housley wrote:
>> Heather:
>>
>>>> In the discussion of plain text files,
>>>> draft-flanagan-rfc-framework-00 says:
>>>>
>>>> o  A Byte Order Mark (BOM) will be added at the start of each file
>>>>
>>>>
>>>> This seems like it will hinder transition because many editors
>>>> will display the BOM as a few nonsensical characters.

>>> Conclusion: If we want people to use UTF-8 RFCs and I-Ds with existing
>>> tools and browsers today, any UTF-8 text format needs to include a BOM.
>>
>> Thanks.  This is a solid analysis.
>>
>> It seems that the MIME type (text/plain vs. text/html with
>> charset=utf-8) becomes quite important.  Since ASCII is a subset of
>> UTF-8, maybe the answer is to always include the charset.  Otherwise,
>> some database needs to know which charset is appropriate when delivering
>> each .txt file.
>>
>>
> Is there a way I don't know about to include charset in a .txt file?  I
> didn't think there was...?

There isn't. One way to help detection (if the encoding is UTF-8) is to 
include actual UTF-8 (not just ASCII) data early on. But not everybody 
has a name that requires non-ASCII.

Essentially, starting with a BOM is the extreme case of "put a non-ASCII 
character early".
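
To make the detection point concrete, here is a minimal Python sketch of
what a consumer of a .txt file might do. It is an illustration only, not
how any particular browser or editor works; the function name
guess_encoding and the 1024-byte window are assumptions made up for this
example.

```python
import codecs


def guess_encoding(data: bytes, window: int = 1024) -> str:
    """Hypothetical helper: guess the encoding from the first bytes only.

    Returns "utf-8-sig" (BOM present), "utf-8", "ascii", or "unknown".
    The window size is an arbitrary assumption, not taken from any tool.
    """
    # A leading BOM (EF BB BF) settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"

    head = data[:window]

    # Only ASCII bytes seen so far: nothing distinguishes UTF-8 from
    # plain ASCII, even if non-ASCII characters appear later in the file.
    if all(b < 0x80 for b in head):
        return "ascii"

    # Non-ASCII bytes are present early; check that they form valid UTF-8.
    # final=False tolerates a multi-byte sequence cut off by the window.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"
```

With this sketch, a file whose first non-ASCII character appears only
after the window is indistinguishable from ASCII, which is the case a
BOM avoids.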

Regards,   Martin.

