[rfc-i] Data point [Re: Fwd:I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]

Martin Duerst duerst at it.aoyama.ac.jp
Tue Oct 7 01:06:29 PDT 2008

At 16:31 08/10/07, Julian Reschke wrote (replying to Brian E Carpenter):

>> 2. When saved to disk from Firefox, the resulting file does not display
>> correctly in Wordpad (the UTF-8 characters appear in what I guess is the
>> ISO 8859-1 interpretation). I can't see any options in Wordpad to switch
>> the view to UTF-8, although Wordpad purports to be able to save in Unicode.
>> However, it seems that the file saved by Firefox is not UTF-8.
>The file is saved using UTF-8.
>However, once it's saved to the local filesystem, the character encoding 
>information is lost, thus applications reading the file need to guess.
>A fix for that would be to use a UTF-8 BOM at the start of the file. At 
>least Notepad uses this to decide which encoding to use for decoding.

Yes, this might work. But not all applications like the UTF-8 "BOM".
And it would look ugly on systems that only know how to deal with ASCII,
which I think we really want to avoid in this case.
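
To make the trade-off concrete, here is a minimal Python sketch of BOM-based
detection. It is only an illustration: the function name
guess_encoding_from_bom and the sample text are assumptions made for this
example, not a description of how Notepad or any other application
actually works.

```python
import codecs

def guess_encoding_from_bom(data: bytes) -> str:
    """Hypothetical helper: report UTF-8 only if the UTF-8 BOM is present."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return "unknown"

text = "Dürst"
without_bom = text.encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom

# With a BOM, the reader does not need to guess.
print(guess_encoding_from_bom(with_bom))
# Without one, this helper has nothing to go on.
print(guess_encoding_from_bom(without_bom))

# The "utf-8-sig" codec strips the BOM again when decoding.
print(with_bom.decode("utf-8-sig") == text)

# A reader that treats the bytes as ISO 8859-1 shows the BOM as three
# stray characters at the very start of the file.
print(with_bom.decode("latin-1")[:3])
```

The last line shows the "ugly" case: software that does not understand the
BOM displays its three bytes as visible junk before the first real character.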

I think that one reason many programs may not detect UTF-8 is that
in our example (and in some typical use cases), the percentage of
non-ASCII characters is very low, and they don't appear up front. The
result might have looked different in some applications if, e.g., the
authors list up front had contained some serious non-ASCII.
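
A rough Python sketch of why late non-ASCII can defeat detection follows. It
assumes a detector that only inspects a fixed-size prefix of the file; the
function sniff_prefix, the 64-byte window, and the sample data are all
hypothetical, and real applications may use quite different heuristics.

```python
import codecs

def sniff_prefix(data: bytes, window: int = 64) -> str:
    """Hypothetical detector that only looks at the first `window` bytes."""
    head = data[:window]
    if all(byte < 0x80 for byte in head):
        # Pure ASCII so far: nothing distinguishes UTF-8 from ISO 8859-1.
        return "no evidence"
    # An incremental decoder tolerates a multi-byte sequence that is cut
    # off at the end of the window, but still rejects invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"

name = "Dürst"
late = b"A" * 200 + name.encode("utf-8")        # non-ASCII far from the start
early = name.encode("utf-8") + b"A" * 200       # non-ASCII up front
legacy = name.encode("latin-1") + b"A" * 200    # ISO 8859-1 bytes up front

print(sniff_prefix(late))
print(sniff_prefix(early))
print(sniff_prefix(legacy))
```

In this sketch the file with the late non-ASCII gives the detector no
evidence at all, while the same name placed at the start is recognized.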

One idea to deal with this might be to include a special boiler-plate
sentence very early on, a sentence that contains some heavy non-ASCII

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     
