[rfc-i] Answering Julian [Data point [Re: Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]]
julian.reschke at gmx.de
Tue Oct 7 16:34:08 PDT 2008
Brian E Carpenter wrote:
>> The file is saved using UTF-8.
>> However, once it's saved to the local filesystem, the character encoding
>> information is lost, thus applications reading the file need to guess.
>> A fix for that would be to use a UTF-8 BOM at the start of the file. At
>> least Notepad uses this to decide which encoding to use for decoding.
> OK, so that is a defect in the way browsers implement Save As, I guess.
Well, I'm not sure everybody considers this a bug. As a matter of fact,
the user agent did save what it received over the network.
The root cause is that the file format neither has a fixed character
encoding, nor allows the encoding to be declared inside the file (as is
the case with XML or HTML).
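To make the guessing concrete, here is a rough Python sketch of what a
reading application could do with the saved bytes. The helper name
guess_text_encoding and the cp1252 fallback are my own assumptions for
illustration; this is not how Notepad or any particular browser is
implemented.

```python
import codecs


def guess_text_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess the encoding of raw file bytes (hypothetical helper)."""
    # A UTF-8 BOM (bytes EF BB BF) is an explicit in-band signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # No BOM: check whether the bytes happen to be valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: fall back to a legacy single-byte code page.
        return fallback


# The bytes a browser would write to disk for a UTF-8 response body.
saved = "na\u00efve caf\u00e9".encode("utf-8")

print(guess_text_encoding(saved))                    # no BOM, valid UTF-8
print(guess_text_encoding(codecs.BOM_UTF8 + saved))  # BOM present
print(saved.decode("cp1252"))                        # a wrong guess gives mojibake
```

The bytes on disk are identical UTF-8 in every case; only the reader's
guess differs, which is why a BOM or an embedded declaration helps.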
>>> Thus, using the least proprietary tools I have available on XP,
>>> there only seems to be one path that saves a genuine UTF-8 file:
>>> cut and paste from Firefox into Notepad, and then save as UTF-8.
>>> The other paths I tried result in non-UTF-8 files.
>> That is incorrect. They result in text files encoded using UTF-8, but
>> with the information about the encoding being lost. Thus recipients need
>> to guess.
> OK; of course the practical effect is the same. Anyway, message to
> Redmond: please get the UTF-8 handling 100% right in Windows N+1.
I don't see what they can do except for making the charset information
part of the file's filesystem metadata...