[rfc-i] Data point [Re: Fwd:I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]

Julian Reschke julian.reschke at gmx.de
Tue Oct 7 08:27:25 PDT 2008


Joe Touch wrote:
>> A UTF-8 form feed *is* an ASCII form feed, so I'm really not sure what
>> you are testing...
> 
> Formfeed is 0x0c
> 
> UTF-8 formfeed is U+000C (though prohibited by draft-hoffman as per
> Section 2.2; let's assume it's allowed).

UTF-8 is an encoding of the Unicode character repertoire into an octet 
sequence. The UTF-8 representation of the Unicode form feed character 
*is* the octet 0x0c, just like in ASCII (and that applies to all Unicode 
code points < 128).
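
A minimal Python sketch of this point, assuming nothing beyond the
standard library (the variable names are only illustrative):

```python
# Form feed is the Unicode code point U+000C.
form_feed = "\u000c"

# Encode it both ways and compare the resulting octets.
utf8_bytes = form_feed.encode("utf-8")
ascii_bytes = form_feed.encode("ascii")
print(utf8_bytes)                 # b'\x0c'
print(utf8_bytes == ascii_bytes)  # True

# The same holds for every code point below 128: one octet, same value.
all_same = all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(128))
print(all_same)                   # True

# From code point 128 upwards, UTF-8 needs more than one octet.
two_octets = chr(128).encode("utf-8")
print(two_octets)                 # b'\xc2\x80'
```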

> It appears that Wordpad doesn't recognize U+000C as a page break.
> 
>>> To type unicode, I typed the following:
>>>
>>>     0 1 2 <alt-x>
>>>
>>> Am I missing something? (I'm new to UTF)...
>>> ...
>> Hard to say. Can you produce a hex dump of the file? (od -cx)
> 
> That's a Unix-ism, but yes - I can do that on Windows:
> 
> 0000000 377 376   a  \0   b  \0   c  \0  \r  \0  \n  \0 022  \0  \r  \0
>         65279    97    98    99    13    10    18    13
> 0000020  \n  \0   d  \0   e  \0   f  \0  \r  \0  \n  \0
>            10   100   101   102    13    10
> 0000034
> 
> Is that correct UTF-8 for what I typed?

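First of all, it looks like UTF-16, not UTF-8: the file starts with the 
octets 377 376 (octal; 0xFF 0xFE, the little-endian byte order mark), and 
each character occupies two octets.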

Between the two CR/LF pairs I see a character with code point 18 (octal 
022, i.e. U+0012) rather than the form feed (U+000C, decimal 12); so your 
method of entry apparently didn't produce the desired result.
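
The dump can be checked with a short Python sketch. The byte string below
is transcribed by hand from the od output above, so treat it as an
assumption about the file's contents rather than the file itself:

```python
# The 28 octets from the dump, transcribed from the od -c columns.
raw = (
    b"\xff\xfe"                  # 377 376: byte order mark
    b"a\x00b\x00c\x00\r\x00\n\x00"
    b"\x12\x00"                  # 022: the character in question
    b"\r\x00\n\x00"
    b"d\x00e\x00f\x00\r\x00\n\x00"
)
assert len(raw) == 0o34          # matches the final offset 0000034

# Decoding as UTF-16 consumes the BOM and yields one character per pair.
text = raw.decode("utf-16")
code_points = [ord(ch) for ch in text]
print(code_points)
# [97, 98, 99, 13, 10, 18, 13, 10, 100, 101, 102, 13, 10]

# No form feed (U+000C) is present; the stray character is U+0012.
has_form_feed = "\u000c" in text
print(has_form_feed)             # False

# The same octets are not valid UTF-8: 0xFF can never occur in UTF-8.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)                # False
```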

BR, Julian


