[rfc-i] Data point [Re: Fwd:I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]
Julian Reschke
julian.reschke at gmx.de
Tue Oct 7 08:27:25 PDT 2008
Joe Touch wrote:
>> A UTF-8 form feed *is* an ASCII form feed, so I'm really not sure what
>> you are testing...
>
> Formfeed is 0x0c
>
> UTF-8 formfeed is U+000C (though prohibited by draft-hoffman as per
> Section 2.2; let's assume it's allowed).
UTF-8 is an encoding of the Unicode character repertoire into an octet
sequence. The UTF-8 representation of the Unicode form feed character
*is* the octet 0x0c, just like in ASCII (and the same holds for every
Unicode code point < 128: each is encoded as the single octet with that
value).
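The claim is easy to check. Below is a minimal sketch in Python (an arbitrary choice; any UTF-8 encoder behaves the same). It encodes every code point below 128 and compares the result with ASCII. It then shows that U+0080, the first code point outside that range, already needs two octets.

```python
# For every code point below 128, UTF-8 and ASCII produce the same single octet.
for cp in range(128):
    ch = chr(cp)
    assert ch.encode("utf-8") == ch.encode("ascii") == bytes([cp])

# The form feed in particular: U+000C encodes to the single octet 0x0c.
form_feed = "\u000c"
print(form_feed.encode("utf-8"))  # b'\x0c'

# From U+0080 upwards, UTF-8 needs more than one octet (and ASCII has no mapping).
print(chr(0x80).encode("utf-8"))  # b'\xc2\x80'
```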
> It appears that Wordpad doesn't recognize U+000C as a page break.
>
>>> To type unicode, I typed the following:
>>>
>>> 0 1 2 <alt-x>
>>>
>>> Am I missing something? (I'm new to UTF)...
>>> ...
>> Hard to say. Can you produce a hex dump of the file? (od -cx)
>
> That's a Unix-ism, but yes - I can do that on Windows:
>
> 0000000 377 376 a \0 b \0 c \0 \r \0 \n \0 022 \0 \r \0
> 65279 97 98 99 13 10 18 13
> 0000020 \n \0 d \0 e \0 f \0 \r \0 \n \0
> 10 100 101 102 13 10
> 0000034
>
> Is that correct UTF-8 for what I typed?
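First of all, it looks like UTF-16, not UTF-8 (two octets per code unit;
the leading 377 376, i.e. 0xFF 0xFE, is the little-endian byte order
mark U+FEFF, shown as 65279).
Between the two CR/LF pairs I see code point 18 (U+0012), not the form
feed's 12 (U+000C), so your method of entry apparently didn't produce
the desired result.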
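The dump can also be checked mechanically. The Python sketch below works on octets transcribed by hand from the od listing above. It confirms that the data is not valid UTF-8, detects the FF FE little-endian UTF-16 byte order mark, and decodes the code points. The third line comes out as U+0012 (decimal 18) rather than U+000C. One plausible explanation is that WordPad's Alt-X reads the typed digits as hexadecimal, so "012" becomes U+0012 and typing "c" would give U+000C. That is an assumption, not something established in this thread.

```python
import codecs

# Octets transcribed by hand from the od -c listing above
# (od prints them in octal: 377 376 = 0xFF 0xFE, 022 = 0x12).
dump = bytes([
    0xFF, 0xFE,                          # byte order mark
    0x61, 0x00, 0x62, 0x00, 0x63, 0x00,  # "abc"
    0x0D, 0x00, 0x0A, 0x00,              # CR LF
    0x12, 0x00,                          # the character meant to be a form feed
    0x0D, 0x00, 0x0A, 0x00,              # CR LF
    0x64, 0x00, 0x65, 0x00, 0x66, 0x00,  # "def"
    0x0D, 0x00, 0x0A, 0x00,              # CR LF
])
assert len(dump) == 0o34  # od's final offset, 0000034, is octal (28 octets)

# 0xFF can never appear in UTF-8, so decoding as UTF-8 must fail.
try:
    dump.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# FF FE at the start is the little-endian UTF-16 byte order mark (U+FEFF = 65279).
is_utf16_le = dump.startswith(codecs.BOM_UTF16_LE)

# The "utf-16" codec reads the BOM, picks the byte order, and drops the BOM.
text = dump.decode("utf-16")
code_points = [ord(ch) for ch in text]
print(code_points)  # [97, 98, 99, 13, 10, 18, 13, 10, 100, 101, 102, 13, 10]

# Code point 18 is U+0012, not the form feed U+000C (decimal 12).
print(hex(code_points[5]))  # 0x12

# Assumption: Alt-X interprets the preceding digits as a hexadecimal code point.
print(int("012", 16), int("c", 16))  # 18 12

# For comparison: the intended text in UTF-8 (no BOM, one octet per character).
intended_utf8 = "abc\r\n\f\r\ndef\r\n".encode("utf-8")
print(intended_utf8.hex(" "))  # 61 62 63 0d 0a 0c 0d 0a 64 65 66 0d 0a
```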
BR, Julian
More information about the rfc-interest mailing list