[rfc-i] Data point [Re: Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]

Julian Reschke julian.reschke at gmx.de
Tue Oct 7 00:31:42 PDT 2008


Hi,

just a few comments/questions in-line:

Brian E Carpenter wrote:
> Thanks. My results on Windows XP:
> 
> 1. It displays correctly using Firefox 2.0.0.7 with View/Character Encoding/UTF-8.

Did you need to select the encoding manually? That shouldn't be 
necessary, as the content is served with the correct content type header:

   Content-Type: text/plain; charset=utf-8

> 2. When saved to disk from Firefox, the resulting file does not display
> correctly in Wordpad (the UTF-8 characters appear in what I guess is the
> ISO 8859-1 interpretation). I can't see any options in Wordpad to switch
> the view to UTF-8, although Wordpad purports to be able to save in Unicode.
> However, it seems that the file saved by Firefox is not UTF-8.

The file is saved using UTF-8.

However, once it's saved to the local filesystem, the character encoding 
information is lost, thus applications reading the file need to guess.

A fix for that would be to put a UTF-8 BOM at the start of the file. At 
least Notepad uses this to decide which encoding to use for decoding.
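
To illustrate the BOM idea, here is a minimal Python sketch. The helper 
names (add_utf8_bom, has_utf8_bom) and the sample string are made up for 
this example; the only fact relied on is that the UTF-8 BOM is the byte 
sequence EF BB BF.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Prefix UTF-8 bytes with a BOM unless one is already present."""
    if has_utf8_bom(data):
        return data
    return codecs.BOM_UTF8 + data


# Hypothetical sample text containing non-ASCII characters.
sample = "Zo\u00eb M\u00fcller".encode("utf-8")
marked = add_utf8_bom(sample)

# The "utf-8-sig" codec strips a leading BOM when decoding.
decoded = marked.decode("utf-8-sig")
```

A reader that checks for the BOM first, as Notepad apparently does, no 
longer has to guess; the text itself is unchanged once the BOM is 
stripped.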

> 2a. When the file is viewed in Notepad, the UTF-8 characters are correct.
> (However, since Notepad doesn't understand Unix-style carriage control, the
> whole draft is displayed as 17 extremely long lines.)

Remark: the same happens with plain ASCII files.

> 3. When I cut and paste from Firefox into Wordpad, the UTF-8 characters
> display correctly. However, there is a spontaneous change of font
> from Arial to SimSun at the first UTF-8 character.
> 
> 3a. Wordpad by default offers to save the file in RTF format, which is
> proprietary. When I override this and tell Wordpad to save in Unicode,
> the resulting TXT file does display correctly when re-opened with
> Wordpad or Notepad. However, Firefox displays scrambled egg when told
> to decode it as UTF-8; it turns out that Wordpad saved it in UTF-16.

Over here, Firefox (3) detected the encoding properly. Maybe you have 
turned off automatic encoding detection in Firefox for some reason?
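
One way a reader can tell UTF-16 from UTF-8 is by looking at the first 
bytes of the file. The Python sketch below assumes the file starts with 
a BOM, which is not confirmed for the Wordpad output discussed here, and 
it is not a description of what Firefox actually does. The function name 
sniff_bom is made up, and UTF-32 is deliberately ignored.

```python
import codecs
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Guess a Python codec name from a leading BOM, or return None.

    Only UTF-8 and UTF-16 are considered; UTF-32 is out of scope here.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order.
        return "utf-16"
    return None


# Python's "utf-16" codec writes a BOM when encoding.
utf16_bytes = "caf\u00e9".encode("utf-16")
guess = sniff_bom(utf16_bytes)
text = utf16_bytes.decode(guess) if guess else None
```

Without a BOM the function returns None, and the reader is back to 
guessing.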

> 4. When I cut and paste from Firefox into Notepad, the UTF-8 characters
> display correctly (and the font is Courier throughout).
> 
> 4a. Notepad by default offers to save the file in ASCII-only. When I
> tell Notepad to save as UTF-8, the resulting file displays correctly
> with Wordpad, Notepad and Firefox/UTF-8.

...because Notepad *does* insert a BOM.

> Thus, using the least proprietary tools I have available on XP,
> there only seems to be one path that saves a genuine UTF-8 file:
> cut and paste from Firefox into Notepad, and then save as UTF-8.
> The other paths I tried result in non-UTF-8 files.

That is incorrect. They result in text files encoded using UTF-8, but 
with the information about the encoding being lost. Thus recipients need 
to guess.
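
A small Python sketch of why a wrong guess looks like a broken file even 
though the bytes are fine. The sample character is arbitrary; ISO 8859-1 
is used because that is the interpretation suspected above.

```python
# One non-ASCII character, encoded as UTF-8: two bytes, C3 A9.
original = "\u00e9"          # LATIN SMALL LETTER E WITH ACUTE
raw = original.encode("utf-8")

# Correct guess: the reader decodes the bytes as UTF-8.
as_utf8 = raw.decode("utf-8")

# Wrong guess: the same bytes read as ISO 8859-1 give two characters.
as_latin1 = raw.decode("iso-8859-1")

# The bytes on disk are identical in both cases; re-encoding the
# mis-decoded text as ISO 8859-1 recovers them exactly.
roundtrip = as_latin1.encode("iso-8859-1")
```

The file content never changes here; only the reader's assumption about 
the encoding does.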

> ...

BR, Julian

