[rfc-i] Data point [Re: Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]
Brian E Carpenter
brian.e.carpenter at gmail.com
Mon Oct 6 16:19:46 PDT 2008
> In specific, a UTF8ized version of this specific draft can be found at <http://www.vpnc.org/temp/draft-hoffman-utf8-rfcs-03.utf8>. I'll do the same for future drafts.
Thanks. My results on Windows XP:
1. It displays correctly using Firefox 18.104.22.168 with View/Character Encoding/UTF-8.
2. When saved to disk from Firefox, the resulting file does not display
correctly in Wordpad (the UTF-8 characters appear in what I guess is the
ISO 8859-1 interpretation). I can't see any options in Wordpad to switch
the view to UTF-8, although Wordpad purports to be able to save in Unicode.
However, it seems that the file saved by Firefox is not UTF-8.
2a. When the file is viewed in Notepad, the UTF-8 characters are correct.
(However, since Notepad doesn't understand Unix-style carriage control, the
whole draft is displayed as 17 extremely long lines.)
3. When I cut and paste from Firefox into Wordpad, the UTF-8 characters
display correctly. However, there is a spontaneous change of font
from Arial to SimSun at the first UTF-8 character.
3a. Wordpad by default offers to save the file in RTF format, which is
proprietary. When I override this and tell Wordpad to save in Unicode,
the resulting TXT file does display correctly when re-opened with
Wordpad or Notepad. However, Firefox displays scrambled egg when told
to decode it as UTF-8; it turns out that Wordpad saved it in UTF-16.
4. When I cut and paste from Firefox into Notepad, the UTF-8 characters
display correctly (and the font is Courier throughout).
4a. Notepad by default offers to save the file in ASCII-only. When I
tell Notepad to save as UTF-8, the resulting file displays correctly
with Wordpad, Notepad and Firefox/UTF-8.
Thus, using the least proprietary tools I have available on XP,
there only seems to be one path that saves a genuine UTF-8 file:
cut and paste from Firefox into Notepad, and then save as UTF-8.
The other paths I tried result in non-UTF-8 files.
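For anyone who wants to check a saved file without relying on what an editor displays, here is a minimal Python sketch of the test implied above: look for a byte order mark, then try a strict UTF-8 decode. The function name guess_encoding is made up for illustration, and the classification is deliberately rough (for example, it does not separate UTF-32 from UTF-16).

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Roughly classify raw file bytes (hypothetical helper, not a full detector)."""
    # A UTF-8 byte order mark at the start of the file.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (with BOM)"
    # A UTF-16 byte order mark, little- or big-endian.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    # Pure ASCII is valid UTF-8 too, but report it separately.
    return "ascii" if data.isascii() else "utf-8"
```

To use it, read the file in binary mode, e.g. guess_encoding(open(path, "rb").read()), where path is whichever saved copy is under test.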
5. I then mailed the good UTF-8 file to myself as an attachment,
from a gmail account to a university account. It survived the trip.
6. Just for fun, I ran it through idnits.
  ** There are 5 instances of lines with non-ascii characters in the document.
  ** There is 1 instance of too long lines in the document, the longest one
     being 2 characters in excess of 72.
The second warning is interesting - how do you compute the line length when
there are non-Latin characters around?
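To make the question concrete, here is a small Python sketch of three different answers to "how long is this line?": UTF-8 bytes, Unicode code points, and an approximate count of display columns. I have not checked which of these idnits uses, so treat any link to its warning as an assumption. The helper name line_lengths is hypothetical, and the column rule (wide East Asian characters count 2, combining marks count 0) is a simplification.

```python
import unicodedata

def line_lengths(line: str) -> dict:
    """Measure one line three ways (illustrative helper only)."""
    columns = 0
    for ch in line:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        # Wide ("W") and fullwidth ("F") characters take two columns.
        columns += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return {
        "bytes": len(line.encode("utf-8")),  # what a byte-counting tool sees
        "code_points": len(line),            # Unicode scalar values
        "columns": columns,                  # rough fixed-width display size
    }
```

A tool that counts bytes would see a line of non-ASCII text as longer than a reader does, so a line that fits in 72 columns on screen could still be flagged as too long.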