[rfc-i] Data point [Re: Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]

Joe Touch touch at ISI.EDU
Mon Oct 6 21:24:11 PDT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi, all,

I tried this on Vista, and here's what I found:

Brian E Carpenter wrote:
> Paul
> 
>> Specifically, a UTF8ized version of this draft can be found at <http://www.vpnc.org/temp/draft-hoffman-utf8-rfcs-03.utf8>. I'll do the same for future drafts.
> 
> Thanks. My results on Windows XP:
> 
> 1. It displays correctly using Firefox 2.0.0.7 with View/Character Encoding/UTF-8.

They look OK in Firefox 3.03, IE 7.06, and Safari 3.1.2.

> 2. When saved to disk from Firefox, the resulting file does not display
> correctly in Wordpad (the UTF-8 characters appear in what I guess is the
> ISO 8859-1 interpretation). I can't see any options in Wordpad to switch
> the view to UTF-8, although Wordpad purports to be able to save in Unicode.
> However, it seems that the file saved by Firefox is not UTF-8.

Wordpad: fails on all three saved files (from Firefox, IE, and Safari)
Xemacs 21.4.21 (Dec 2006 version): fails on all three saved files
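Brian's guess in item 2 is that the UTF-8 bytes are being shown under an ISO 8859-1 interpretation. A minimal sketch of that failure mode, assuming a reader that decodes everything as Latin-1 (the function name and sample word are illustrative, not from the draft): each two-byte UTF-8 sequence shows up as two unrelated Latin-1 characters.

```python
# Sketch: UTF-8 bytes displayed by a reader that assumes ISO 8859-1 (Latin-1).
# The sample word is illustrative only; it does not come from the draft.

def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


sample = "Z\u00fcrich"               # "Zürich": 6 characters, 7 UTF-8 bytes
garbled = misread_as_latin1(sample)
print(garbled)                       # ZÃ¼rich

# The damage is reversible as long as no bytes were lost or rewritten:
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == sample)           # True
```

The reversal only works while the bytes are untouched; once a tool re-saves the garbled text in some other encoding, the original characters are harder to recover.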

> 2a. When the file is viewed in Notepad, the UTF-8 characters are correct.
> (However, since Notepad doesn't understand Unix-style carriage control, the
> whole draft is displayed as 17 extremely long lines.)

Notepad: displays the Firefox- and Safari-saved files correctly as UTF-8,
	but not the IE-saved file
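Brian's item 2a notes that Notepad does not handle Unix-style (LF-only) line endings. A minimal sketch of a workaround, assuming the file is UTF-8 or plain ASCII (the function name is hypothetical): rewrite bare LF as CRLF. Working on raw bytes is safe here because the bytes 0x0A and 0x0D never occur inside a multi-byte UTF-8 sequence; the same byte-level trick would corrupt a UTF-16 file.

```python
# Sketch: convert Unix (LF) line endings to DOS/Windows (CRLF) for Notepad.
# Assumes the input is UTF-8 (or ASCII); do not use this on UTF-16 data.

def unix_to_dos(data: bytes) -> bytes:
    """Turn every LF or CRLF line ending into CRLF without doubling CRs."""
    # Collapse any existing CRLF to LF first, then expand every LF to CRLF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


original = "Line one\nZ\u00fcrich\n".encode("utf-8")
converted = unix_to_dos(original)
print(converted)                            # b'Line one\r\nZ\xc3\xbcrich\r\n'
print(unix_to_dos(converted) == converted)  # True: running it twice is harmless
```

Normalizing to LF before expanding keeps the conversion idempotent, so a file with mixed endings comes out uniformly CRLF.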

> 3. When I cut and paste from Firefox into Wordpad, the UTF-8 characters
> display correctly. However, there is a spontaneous change of font
> from Arial to SimSun at the first UTF-8 character.

I get the same kind of font-changing behavior (I didn't verify which
fonts, but it doesn't matter to me) for cut/paste from both Firefox and
Safari into Wordpad.

Interestingly, copy/paste from IE into Wordpad worked fine.

> 3a. Wordpad by default offers to save the file in RTF format, which is
> proprietary. When I override this and tell Wordpad to save in Unicode,
> the resulting TXT file does display correctly when re-opened with
> Wordpad or Notepad. However, Firefox displays scrambled egg when told
> to decode it as UTF-8; it turns out that Wordpad saved it in UTF-16.
> 
> 4. When I cut and paste from Firefox into Notepad, the UTF-8 characters
> display correctly (and the font is Courier throughout).

I get the same behavior (works fine) for Firefox, Safari, and IE pasted
into Notepad.

When I cut/paste into Xemacs, none worked: "??" is pasted where the
UTF-8 characters were.

They all cut/paste fine into Word.
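Item 3a found that Wordpad's "Unicode" save produced UTF-16, and item 2 found that the Firefox-saved file was not UTF-8. A rough sketch for telling such files apart, assuming only the standard byte-order marks (BOMs) and a trial UTF-8 decode; the function name and labels are hypothetical, and this is a heuristic rather than a reliable charset detector.

```python
import codecs

# Sketch: rough classification of a saved text file by BOM and a trial decode.
# Heuristic only: a BOM-less file that happens to be valid UTF-8 is reported as
# UTF-8 even if it was written in some other encoding. UTF-32 is not handled.


def classify_encoding(data: bytes) -> str:
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE with BOM"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE with BOM"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not UTF-8 (perhaps a legacy 8-bit encoding)"
    if all(byte < 0x80 for byte in data):
        return "ASCII"
    return "UTF-8 without BOM"


word = "Z\u00fcrich"
print(classify_encoding(codecs.BOM_UTF16_LE + word.encode("utf-16-le")))  # UTF-16LE with BOM
print(classify_encoding(word.encode("utf-8")))    # UTF-8 without BOM
print(classify_encoding(word.encode("latin-1")))  # not UTF-8 (perhaps a legacy 8-bit encoding)
```

To check a saved file, pass the function the bytes read in binary mode ("rb"); opening in text mode would already apply some decoding and hide the problem.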

> 4a. Notepad by default offers to save the file in ASCII-only. When I
> tell Notepad to save as UTF-8, the resulting file displays correctly
> with Wordpad, Notepad and Firefox/UTF-8.

For the "save as" test I used my Word template procedure: paste the
text into Word, then print using the "Generic/Text-Only" printer. That
printer outputs a single "." for each UTF-8 character position (it
correctly treats each multi-byte character as a single position, but
won't print the character itself).
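A small sketch of the substitution Joe describes, assuming the printer replaces each non-ASCII code point with exactly one "." (the function name is hypothetical, and a character built from a combining sequence would produce more than one dot here). Because the replacement is per character rather than per byte, column positions are preserved.

```python
# Sketch of the Generic/Text-Only substitution described above, under the
# assumption that each non-ASCII code point becomes exactly one ".".

def generic_text_only(text: str) -> str:
    """Replace every non-ASCII code point with a single '.'."""
    return "".join(ch if ord(ch) < 0x80 else "." for ch in text)


line = "Z\u00fcrich \u65e5\u672c"    # "Zürich 日本"
out = generic_text_only(line)
print(out)                           # Z.rich ..
print(len(out) == len(line))         # True: one output position per character
print(len(line.encode("utf-8")))     # 14 bytes in UTF-8
```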

> Thus, using the least proprietary tools I have available on XP,
> there only seems to be one path that saves a genuine UTF-8 file:
> cut and paste from Firefox into Notepad, and then save as UTF-8.
> The other paths I tried result in non-UTF-8 files.
> 
> 5. I then mailed the good UTF-8 file to myself as an attachment,
> from a gmail account to a university account. It survived the trip.
> 
> 6. Just for fun, I ran it through idnits.

I did not run it through 2-Word-post-v2.0.pl, the post-processor for the
Word template; I suspect the results would be similar to the idnits
output below. It presumably would not be difficult to extend that script
to understand UTF-8 and count characters correctly, but given the above,
there's no point: since Generic/Text-Only won't emit UTF-8, the script
will never receive it as input...
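Brian's question quoted below (how to compute line length when non-Latin characters are present) has at least three plausible answers: UTF-8 bytes, Unicode code points, or display columns. A minimal sketch that reports all three and runs an idnits-style check with a pluggable measure; the names are hypothetical, the width rule is an approximation based on the Unicode East Asian Width property, and nothing here claims to reproduce what idnits actually does.

```python
import unicodedata
from typing import Callable, List, Tuple

MAX_LINE = 72  # the limit cited in the idnits output quoted below


def utf8_bytes(line: str) -> int:
    """Length in UTF-8 bytes (what a byte-oriented tool would count)."""
    return len(line.encode("utf-8"))


def display_width(line: str) -> int:
    """Approximate column count.

    Wide/fullwidth East Asian characters count as 2, combining marks as 0,
    everything else as 1. Ambiguous-width characters are treated as narrow.
    """
    width = 0
    for ch in line:
        if unicodedata.combining(ch):
            continue  # combining marks attach to the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def check_lines(text: str, limit: int = MAX_LINE,
                measure: Callable[[str], int] = len) -> List[Tuple[int, str]]:
    """Report non-ASCII lines and lines longer than `limit` under `measure`.

    The default measure, len, counts Unicode code points.
    """
    report = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(ord(ch) > 0x7F for ch in line):
            report.append((number, "non-ascii"))
        length = measure(line)
        if length > limit:
            report.append((number, f"too long by {length - limit}"))
    return report


line = "x" * 71 + "\u00fc"  # 72 code points ending in "ü"
print(len(line), utf8_bytes(line), display_width(line))  # 72 73 72
print(check_lines(line))                         # [(1, 'non-ascii')]
print(check_lines(line, measure=utf8_bytes))     # [(1, 'non-ascii'), (1, 'too long by 1')]
```

With the default measure, a 72-character line ending in "ü" passes; measured in UTF-8 bytes, the same line is one over the limit. Which measure is "right" is a policy question for the RFC format, not something the code can settle.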

Given the problems with tools like Wordpad, Xemacs, and the
Generic/Text-Only printer (none of which are all that ancient), I share
Dave's concerns about the suitability of UTF-8 as a replacement for ASCII
in RFCs and IDs.

Joe

> 
> ** There are 5 instances of lines with non-ascii characters in the document.
> 
> ** There is 1 instance of too long lines in the document, the longest one
>    being 2 characters in excess of 72.
> 
> The second warning is interesting - how do you compute the line length when
> there are non-Latin characters around?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjq5GsACgkQE5f5cImnZrtC8ACg8ZqKFqGY6hzpc2G7XIi+QVI3
8uQAoJx9u1Cah6yBXXSOy+0ku+E0LoKR
=VA+R
-----END PGP SIGNATURE-----

