[rfc-i] Data point [Re: Fwd:I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]
Joe Touch
touch at ISI.EDU
Tue Oct 7 08:37:09 PDT 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Julian Reschke wrote:
> Joe Touch wrote:
>>> A UTF-8 form feed *is* an ASCII form feed, so I'm really not sure what
>>> you are testing...
>>
>> Formfeed is 0x0c
>>
>> UTF-8 formfeed is U+000C (though prohibited by draft-hoffman as per
>> Section 2.2; let's assume it's allowed).
>
> UTF-8 is an encoding of the Unicode character repertoire into an octet
> sequence. The UTF-8 representation of the Unicode form feed character
> *is* the octet 0x0c, just like in ASCII (and that applies to all Unicode
> code points < 128).
>
>> It appears that Wordpad doesn't recognize U+000C as a page break.
>>
>>>> To type unicode, I typed the following:
>>>>
>>>> 0 1 2 <alt-x>
>>>>
>>>> Am I missing something? (I'm new to UTF)...
>>>> ...
>>> Hard to say. Can you produce an hex dump of the file? (od -cx)
>>
>> That's a Unix-ism, but yes - I can do that on Windows:
>>
>> 0000000 377 376   a  \0   b  \0   c  \0  \r  \0  \n  \0 022  \0  \r  \0
>>           65279      97      98      99      13      10      18      13
>> 0000020  \n  \0   d  \0   e  \0   f  \0  \r  \0  \n  \0
>>              10     100     101     102      13      10
>> 0000034
>>
>> Is that correct UTF-8 for what I typed?
>
> First of all, it looks like UTF-16 (two octets per code point).
OK; that means that Wordpad doesn't generate UTF-8. That was the one
editor that appeared to support at least some of the cut/paste tests.
If that doesn't work, then we don't have a viable UTF-8 editor
identified for Windows.
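
For reference, here is a minimal Python sketch of what the dump above
appears to contain. The bytes are transcribed by hand from the od output,
and sniff_bom is a hypothetical helper name (not a standard function), so
treat this as an illustration rather than a definitive check:

```python
import codecs

# Bytes transcribed from the od dump quoted above (octal 377 376 = FF FE).
dump = bytes.fromhex(
    "fffe 6100 6200 6300 0d00 0a00 1200 0d00 0a00"
    " 6400 6500 6600 0d00 0a00"
)

def sniff_bom(data):
    """Hypothetical helper: guess the encoding from a leading byte order mark."""
    # UTF-32 LE is tested before UTF-16 LE because its BOM also starts FF FE.
    for bom, name in (
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ):
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

encoding, bom_len = sniff_bom(dump)
text = dump[bom_len:].decode(encoding)
code_points = [ord(ch) for ch in text]

print(encoding)                    # which BOM was found
print(code_points)                 # decoded code points, in decimal
print(text.encode("utf-8").hex())  # the same text as UTF-8 octets

# A form feed is the single octet 0x0c in both ASCII and UTF-8.
print("\f".encode("utf-8") == "\f".encode("ascii") == b"\x0c")
```

Under these assumptions the file starts with a UTF-16 little-endian byte
order mark, and the same text re-encoded as UTF-8 takes one octet per
character, since every code point in it is below 128.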
> Between the two CR/LF pairs I see something with code point 18; so your
> method of entry apparently didn't produce the desired result.
Hmm. I did what the web pages for UTF-8 said:
http://www.fileformat.info/info/unicode/char/000c/index.htm
(see "how to type in Windows")
It appears that this is incorrect, but it further points to the
immaturity of this format.
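
One possible explanation for the 18, assuming (not confirmed here) that
Alt+X reads the digits typed before it as hexadecimal rather than decimal.
The variable names below are illustrative only:

```python
typed = "012"       # the digits entered before Alt+X
form_feed = "000C"  # the code point named in the URL above (U+000C)

as_hex = int(typed, 16)      # reading "012" as hexadecimal
as_decimal = int(typed, 10)  # reading "012" as decimal
wanted = int(form_feed, 16)  # the form feed code point

print(as_hex, as_decimal, wanted)
```

If that assumption holds, the hexadecimal reading matches the code point
18 seen in the dump, while the decimal reading would have given the
intended form feed (12).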
Joe
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkjrgiUACgkQE5f5cImnZrvgcQCfXVbRQ7E1qWtuLyVkWQr2O++h
JnsAni8tRdBCNBKDCTb3qsr9gs4rlYiT
=dnIu
-----END PGP SIGNATURE-----