[rfc-i] Data point [Re: Fwd:I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]
touch at ISI.EDU
Tue Oct 7 08:37:09 PDT 2008
Julian Reschke wrote:
> Joe Touch wrote:
>>> A UTF-8 form feed *is* an ASCII form feed, so I'm really not sure what
>>> you are testing...
>> Formfeed is 0x0c
>> UTF-8 formfeed is U+000C (though prohibited by draft-hoffman as per
>> Section 2.2; let's assume it's allowed).
> UTF-8 is an encoding of the Unicode character repertoire into an octet
> sequence. The UTF-8 representation of the Unicode form feed character
> *is* the octet 0x0c, just like in ASCII (and that applies to all Unicode
> code points < 128).
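
The point above can be checked directly. A minimal Python sketch, using only the standard library (variable names are illustrative, not from the thread), showing that the UTF-8 encoding of U+000C is the single octet 0x0c and that the same holds for every code point below 128:

```python
# Encode the form feed character (U+000C) as UTF-8 and as ASCII.
form_feed = "\u000c"
utf8_bytes = form_feed.encode("utf-8")
ascii_bytes = form_feed.encode("ascii")

# For every code point below 128, the UTF-8 encoding is one octet
# whose value equals the code point, exactly as in ASCII.
all_match = all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(128))

print(utf8_bytes, ascii_bytes, all_match)
```
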
>> It appears that Wordpad doesn't recognize U+000C as a page break.
>>>> To type unicode, I typed the following:
>>>> 0 1 2 <alt-x>
>>>> Am I missing something? (I'm new to UTF)...
>>> Hard to say. Can you produce a hex dump of the file? (od -cx)
>> That's a Unix-ism, but yes - I can do that on Windows:
>> 0000000 377 376   a  \0   b  \0   c  \0  \r  \0  \n  \0 022  \0  \r  \0
>>           65279      97      98      99      13      10      18      13
>> 0000020  \n  \0   d  \0   e  \0   f  \0  \r  \0  \n  \0
>>              10     100     101     102      13      10
>> Is that correct UTF-8 for what I typed?
> First of all, it looks like UTF-16 (two octets per code point).
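
The UTF-16 reading can be checked against the dump. A small Python sketch, assuming the octets are exactly those shown in the dump quoted in this message (the name dump_octets is illustrative): decoding them as UTF-16 consumes the leading 377 376 (0xFF 0xFE) byte order mark and yields one code point per pair of octets. Re-encoding the same text as UTF-8 shows what a UTF-8 file with this content would contain instead.

```python
# Octets transcribed from the dump in this message (octal and character values).
dump_octets = bytes([
    0o377, 0o376,                       # byte order mark, little-endian
    ord("a"), 0, ord("b"), 0, ord("c"), 0,
    0x0D, 0, 0x0A, 0,                   # CR LF
    0o022, 0,                           # the typed character
    0x0D, 0, 0x0A, 0,                   # CR LF
    ord("d"), 0, ord("e"), 0, ord("f"), 0,
    0x0D, 0, 0x0A, 0,                   # CR LF
])

# The "utf-16" codec reads the byte order mark to pick the byte order.
text = dump_octets.decode("utf-16")
code_points = [ord(ch) for ch in text]

# The same text as UTF-8: one octet per character, no NUL padding.
as_utf8 = text.encode("utf-8")

print(code_points)
print(as_utf8)
```
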
OK; that means that Wordpad doesn't generate UTF-8. That was the one
editor that appeared to support at least some of the cut/paste tests.
If that doesn't work, then we don't have a viable UTF-8 editor
identified for Windows.
> Between the two CR/LF pairs I see something with code point 18; so your
> method of entry apparently didn't produce the desired result.
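
One possible explanation for code point 18, offered as an assumption rather than something established in this thread: if the Alt+X shortcut interprets the preceding digits as hexadecimal, then typing 0 1 2 yields U+0012 (decimal 18), not the form feed at decimal 12. A short Python sketch of that arithmetic (names are illustrative):

```python
# Digits typed before Alt+X, as described in the message.
typed = "012"

# Assumption: the editor parses the digits as hexadecimal.
code_point = int(typed, 16)

# Under the same assumption, the digits needed for form feed (U+000C).
form_feed_hex = format(ord("\f"), "04X")

print(code_point, form_feed_hex)
```
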
Hmm. I did what the web pages for UTF-8 said:
(see "how to type in Windows")
It appears that this is incorrect, but further points to the immaturity
of this format.