[rfc-i] Answering Julian [Data point [Re: Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]]

Julian Reschke julian.reschke at gmx.de
Tue Oct 7 16:34:08 PDT 2008

Brian E Carpenter wrote:
> ...
>> The file is saved using UTF-8.
>> However, once it's saved to the local filesystem, the character encoding
>> information is lost, thus applications reading the file need to guess.
>> A fix for that would be to use a UTF-8 BOM at the start of the file. At
>> least Notepad uses this to decide which encoding to use for decoding.
> OK, so that is a defect in the way browsers implement Save As, I guess.

Well, I'm not sure everybody considers this a bug. As a matter of fact, 
the user agent did save what it received over the network.

The root cause is that the file format neither has a fixed character 
encoding nor allows the encoding to be declared inside the file (as is 
the case with XML or HTML).
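
To illustrate the guessing a reader of such a file has to do, here is a 
minimal Python sketch. The function name and the cp1252 fallback are 
assumptions made for illustration only; this is not how Notepad or any 
particular application is known to work.

```python
import codecs


def guess_text_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess the encoding of raw file bytes that carry no external label.

    Hypothetical helper: the name and the fallback are illustrative only.
    """
    # A leading UTF-8 BOM (bytes EF BB BF) is an explicit in-band signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Without a BOM, strict decoding is only a heuristic: most non-UTF-8
    # byte sequences are rejected, but pure ASCII passes as well.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback; a real reader would likely use a locale default.
        return fallback


text = "na\u00efve caf\u00e9"
plain = text.encode("utf-8")          # what "Save As" typically writes
with_bom = codecs.BOM_UTF8 + plain    # same text with the BOM prepended

print(guess_text_encoding(with_bom))
print(guess_text_encoding(plain))
print(guess_text_encoding(text.encode("latin-1")))
```

The first case is the only one where the file itself says how to decode 
it; the other two are guesses that merely happen to be plausible.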

> ...
>>> Thus, using the least proprietary tools I have available on XP,
>>> there only seems to be one path that saves a genuine UTF-8 file:
>>> cut and paste from Firefox into Notepad, and then save as UTF-8.
>>> The other paths I tried result in non-UTF-8 files.
>> That is incorrect. They result in text files encoded using UTF-8, but
>> with the information about the encoding being lost. Thus recipients need
>> to guess.
> OK; of course the practical effect is the same. Anyway, message to
> Redmond: please get the UTF-8 handling 100% right in Windows N+1.
> ...

I don't see what they can do except for making the charset information 
part of the file's filesystem metadata...

BR, Julian
