[rfc-i] Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt

Martin Duerst duerst at it.aoyama.ac.jp
Mon Oct 20 03:54:33 PDT 2008

At 19:11 08/10/20, Julian Reschke wrote:
>Simon Josefsson wrote:
>> Julian Reschke <julian.reschke at gmx.de> writes:
>>> 1) "text/plain; charset=utf-8" doesn't work well once the file is stored 
>>> locally in the file system, and the encoding information is lost. The 
>>> proper fix for this is to use a UTF-8 BOM, which enables at least the 
>>> standard Windows applications (Notepad/Wordpad) to do the right thing.
>> I think UTF-8 BOM is a poor solution to almost any problem.
>> Does modern Notepad/Wordpad require UTF-8 BOM to parse UTF-8 text?
>If by "modern" you mean "Vista", I don't know. The versions I have 
>default to the system locale if no BOM is present (I think).

My understanding was that if there is significant non-ASCII text upfront
that makes it possible/easy to identify the data as UTF-8, then
a BOM is not needed. That understanding is based on WinXP, not Vista,
and it is not the result of rigorous tests.

In our case, "significant non-ASCII text upfront" will usually
not apply; a single or a few non-ASCII characters in an author's
name don't cut it, and there is too much boilerplate upfront.
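
To make the point concrete, here is a minimal sketch in Python of a
BOM-or-heuristic check. It is not a description of what Notepad or
Wordpad actually do; the function name sniff_utf8, the 1024-byte window
and the threshold of 8 non-ASCII bytes are all assumptions chosen for
illustration.

```python
import codecs


def sniff_utf8(data: bytes, window: int = 1024, min_non_ascii: int = 8) -> str:
    """Classify the leading bytes of a file.

    Hypothetical helper: the window size and threshold are arbitrary
    assumptions, not values taken from any real application.
    """
    # An explicit UTF-8 BOM (EF BB BF) settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)"

    head = data[:window]
    non_ascii = sum(1 for b in head if b >= 0x80)
    if non_ascii == 0:
        # Pure ASCII in the window says nothing about the rest of the file.
        return "ascii-only in window"

    try:
        # final=False tolerates a multi-byte sequence cut off by the window.
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return "not utf-8"

    # Valid UTF-8, but a handful of non-ASCII bytes is only weak evidence.
    if non_ascii >= min_non_ascii:
        return "probably utf-8"
    return "weak evidence for utf-8"
```

Under these assumptions, a draft whose only non-ASCII character is in an
author's name yields "weak evidence for utf-8", and one whose first
non-ASCII character comes after the window of boilerplate yields
"ascii-only in window". Only the BOM case gives a definite answer.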

Regards,   Martin.

>> I think we should just say that people should upgrade to tools that
>> detect and handle UTF-8 text without the need for UTF-8 BOMs.
>That would be guessing based on statistics, right?
>BR, Julian
>rfc-interest mailing list
>rfc-interest at rfc-editor.org

#-#-#  Martin J. Dürst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     
