[rfc-i] Answering Julian [Data point [Re: Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]]

Brian E Carpenter brian.e.carpenter at gmail.com
Tue Oct 7 13:54:17 PDT 2008


Julian,

On 2008-10-07 20:31, Julian Reschke wrote:
> Hi,
> 
> just a few comments/questions in-line:
> 
> Brian E Carpenter wrote:
>> Thanks. My results on Windows XP:
>>
>> 1. It displays correctly using Firefox 2.0.0.7 with View/Character
>> Encoding/UTF-8.
> 
> Did you need to select the encoding manually? That shouldn't be
> necessary, as the content is served with the correct content type header:
> 
>   Content-Type: text/plain; charset=utf-8

Yes, the automatic detection worked, but I set the encoding in Firefox
manually to be certain what I was looking at.

> 
>> 2. When saved to disk from Firefox, the resulting file does not display
>> correctly in Wordpad (the UTF-8 characters appear in what I guess is the
>> ISO 8859-1 interpretation). I can't see any options in Wordpad to switch
>> the view to UTF-8, although Wordpad purports to be able to save in
>> Unicode.
>> However, it seems that the file saved by Firefox is not UTF-8.
> 
> The file is saved using UTF-8.
> 
> However, once it's saved to the local filesystem, the character encoding
> information is lost, thus applications reading the file need to guess.
> 
> A fix for that would be to use a UTF-8 BOM at the start of the file. At
> least Notepad uses this to decide which encoding to use for decoding.

OK, so that is a defect in the way browsers implement Save As, I guess.
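
As a rough illustration of the BOM check Julian describes, here is a
minimal Python sketch. The function name sniff_bom is hypothetical, and
this is only an assumption about how an editor might decide; it is not a
description of what Notepad actually does internally.

```python
import codecs
from typing import Optional

def sniff_bom(data: bytes) -> Optional[str]:
    """Return a Python codec name implied by a leading BOM, or None."""
    # UTF-32 is checked before UTF-16 because the UTF-32-LE BOM
    # (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
    candidates = (
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name
    # No BOM: the encoding information is gone and the reader must guess.
    return None
```

A file saved without a BOM returns None here, which is the "recipients
need to guess" case.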

> 
>> 2a. When the file is viewed in Notepad, the UTF-8 characters are correct.
>> (However, since Notepad doesn't understand Unix-style carriage control,
>> the whole draft is displayed as 17 extremely long lines.)
> 
> Remark: same as with ASCII.

Correct, I should have said that. It's the LFCR/CRLF/CR/LF problem.
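
For the line-ending problem, a small Python sketch of one possible
workaround: rewrite every line-break convention as CRLF before opening
the file in Notepad. The name to_crlf is hypothetical, and this assumes
the text has already been decoded to a string.

```python
import re

def to_crlf(text: str) -> str:
    """Rewrite CRLF, LFCR, lone CR and lone LF line breaks as CRLF."""
    # The two-character forms come first in the alternation so that a
    # CRLF or LFCR pair is treated as one line break, not two.
    return re.sub(r"\r\n|\n\r|\r|\n", "\r\n", text)
```

Text that already uses CRLF passes through unchanged.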
> 
>> 3. When I cut and paste from Firefox into Wordpad, the UTF-8 characters
>> display correctly. However, there is a spontaneous change of font
>> from Arial to SimSun at the first UTF-8 character.
>>
>> 3a. Wordpad by default offers to save the file in RTF format, which is
>> proprietary. When I override this and tell Wordpad to save in Unicode,
>> the resulting TXT file does display correctly when re-opened with
>> Wordpad or Notepad. However, Firefox displays scrambled egg when told
>> to decode it as UTF-8; it turns out that Wordpad saved it in UTF-16.
> 
> Over here, Firefox (3) detected the encoding properly. Maybe for some
> reason you have turned off automatic encoding detection in Firefox?

Yes, for that test, because I was trying to be certain what I was looking at.
I could have gone to cygwin/od but that's too much like hard work ;-)
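
A lighter-weight stand-in for od, sketched in Python: dump the first few
bytes of the file as hex, which is enough to tell a BOM, UTF-8 and
UTF-16 apart by eye. The name hex_head and the default of 16 bytes are
arbitrary choices for illustration.

```python
def hex_head(data: bytes, count: int = 16) -> str:
    """Return the first `count` bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data[:count])

# The same character looks quite different on disk:
print(hex_head("é".encode("utf-8")))      # c3 a9
print(hex_head("é".encode("utf-16-le")))  # e9 00
print(hex_head("é".encode("utf-8-sig")))  # ef bb bf c3 a9
```

In practice the bytes would come from open(path, "rb").read(), with
path being whichever saved file is under suspicion.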

> 
>> 4. When I cut and paste from Firefox into Notepad, the UTF-8 characters
>> display correctly (and the font is Courier throughout).
>>
>> 4a. Notepad by default offers to save the file in ASCII-only. When I
>> tell Notepad to save as UTF-8, the resulting file displays correctly
>> with Wordpad, Notepad and Firefox/UTF-8.
> 
> ...because Notepad *does* insert a BOM.

Ah ha.
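
For comparison, a Python sketch of producing the same kind of file, that
is, UTF-8 with a leading BOM. The function name is hypothetical;
"utf-8-sig" is Python's own name for this variant.

```python
def encode_utf8_with_bom(text: str) -> bytes:
    """Encode text as UTF-8 preceded by the BOM bytes EF BB BF."""
    # The "utf-8-sig" codec writes the BOM on encoding and strips it
    # again on decoding.
    return text.encode("utf-8-sig")
```

Decoding the result with "utf-8-sig" gives back the original text with
the BOM removed.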

> 
>> Thus, using the least proprietary tools I have available on XP,
>> there only seems to be one path that saves a genuine UTF-8 file:
>> cut and paste from Firefox into Notepad, and then save as UTF-8.
>> The other paths I tried result in non-UTF-8 files.
> 
> That is incorrect. They result in text files encoded using UTF-8, but
> with the information about the encoding being lost. Thus recipients need
> to guess.

OK; of course the practical effect is the same. Anyway, message to
Redmond: please get the UTF-8 handling 100% right in Windows N+1.
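
To separate "not UTF-8" from "UTF-8 with the label lost", one option is
a strict decode, sketched below in Python. The name is_valid_utf8 is
hypothetical. A pass does not prove the author intended UTF-8, but
ISO 8859-1 or UTF-16 text containing non-ASCII characters will normally
fail it.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

Pure ASCII input also passes, since ASCII is a subset of UTF-8.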

Thanks

    Brian

