[rfc-i] Data point [Re: Fwd:I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]
julian.reschke at gmx.de
Tue Oct 7 08:27:25 PDT 2008
Joe Touch wrote:
>> A UTF-8 form feed *is* an ASCII form feed, so I'm really not sure what
>> you are testing...
> Formfeed is 0x0c
> UTF-8 formfeed is U+000C (though prohibited by draft-hoffman as per
> Section 2.2; let's assume it's allowed).
UTF-8 is an encoding of the Unicode character repertoire into an octet
sequence. The UTF-8 representation of the Unicode form feed character
*is* the octet 0x0c, just like in ASCII (and the same holds for every
Unicode code point below 128, i.e. U+0000 through U+007F).
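A minimal Python sketch of this point, using only the built-in codecs: it encodes U+000C as UTF-8 and as ASCII, then checks that the two agree for every code point below 128. The last line encodes the same character as UTF-16 little-endian, only to contrast it with the two-octet form that shows up in the dump.

```python
# U+000C FORM FEED
FORM_FEED = "\u000c"

utf8_bytes = FORM_FEED.encode("utf-8")
ascii_bytes = FORM_FEED.encode("ascii")

print(utf8_bytes.hex())           # 0c
print(utf8_bytes == ascii_bytes)  # True

# Every code point in U+0000..U+007F encodes to the same single octet
# in UTF-8 and in ASCII.
all_match = all(
    chr(cp).encode("utf-8") == bytes([cp]) == chr(cp).encode("ascii")
    for cp in range(128)
)
print(all_match)  # True

# For contrast: UTF-16 little-endian spends two octets on the same character.
print(FORM_FEED.encode("utf-16-le").hex())  # 0c00
```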
> It appears that Wordpad doesn't recognize U+000C as a page break.
>>> To type unicode, I typed the following:
>>> 0 1 2 <alt-x>
>>> Am I missing something? (I'm new to UTF)...
>> Hard to say. Can you produce a hex dump of the file? (od -cx)
> That's a Unix-ism, but yes - I can do that on Windows:
> 0000000 377 376 a \0 b \0 c \0 \r \0 \n \0 022 \0 \r \0
> 65279 97 98 99 13 10 18 13
> 0000020 \n \0 d \0 e \0 f \0 \r \0 \n \0
> 10 100 101 102 13 10
> Is that correct UTF-8 for what I typed?
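First of all, it looks like UTF-16 (two octets per code point), not
UTF-8: the leading octets 377 376 (0xFF 0xFE) are the UTF-16
little-endian byte order mark, U+FEFF (65279).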
Between the two CR/LF pairs I see code point 18 (octal 022, hex 0x12)
rather than 12 (0x0C, form feed), so your method of entry apparently
didn't produce the desired result.
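One way to double-check a dump like this is to decode the octets directly. The Python sketch below rebuilds the 28 octets by hand from the od -c output quoted above (so the byte string is a transcription, not the actual file), confirms they are not valid UTF-8, strips the UTF-16LE byte order mark and lists the code points. The final line assumes that Alt-X reads the typed digits as hexadecimal; that is consistent with the result, but the thread does not confirm it.

```python
import codecs

# The 28 octets from the od -c dump quoted above, transcribed by hand
# (377 376 = 0xFF 0xFE, 022 = 0x12, \0 = 0x00).
dump = (
    b"\xff\xfe"
    b"a\x00b\x00c\x00\r\x00\n\x00"
    b"\x12\x00\r\x00\n\x00"
    b"d\x00e\x00f\x00\r\x00\n\x00"
)

# The octet 0xFF never occurs in UTF-8, so a strict UTF-8 decode fails.
try:
    dump.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# FF FE at the start is the UTF-16 little-endian byte order mark (U+FEFF).
has_utf16le_bom = dump.startswith(codecs.BOM_UTF16_LE)
text = dump[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
code_points = [ord(ch) for ch in text]

print(is_utf8)          # False
print(has_utf16le_bom)  # True
print(code_points)      # [97, 98, 99, 13, 10, 18, 13, 10, 100, 101, 102, 13, 10]

# The character between the CR/LF pairs is U+0012, not U+000C (form feed).
# Assumption: "012" typed before Alt-X was read as hex, which gives 0x12 = 18.
print(int("012", 16), 0x0C)  # 18 12
```

The decoded list matches the decimal row od printed after 65279 (the BOM), which is why the entered character comes out as 18 rather than 12.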