[rfc-i] Data point [Re: Fwd: I-D ACTION:draft-hoffman-utf8-rfcs-03.txt]
Martin Duerst
duerst at it.aoyama.ac.jp
Mon Oct 6 21:59:43 PDT 2008
At 08:19 08/10/07, Brian E Carpenter wrote:
>Paul
>
>> In specific, a UTF8ized version of this specific draft can be found at
>> <http://www.vpnc.org/temp/draft-hoffman-utf8-rfcs-03.utf8>. I'll do the
>> same for future drafts.
>6. Just for fun, I ran it through idnits.
>
>** There are 5 instances of lines with non-ascii characters in the document.
>
>** There is 1 instance of too long lines in the document, the longest one
> being 2 characters in excess of 72.
>
>The second warning is interesting - how do you compute the line length when
>there are non-Latin characters around?
Three possibilities (at least), in order of complexity:
- Count bytes
- Count codepoints. For (correct) UTF-8, simply count all bytes in the
  ranges 0x00-0x7F and 0xC0-0xFF, i.e. every byte that is not a
  continuation byte (0x80-0xBF).
- Count display columns, using something like
  http://www.unicode.org/unicode/reports/tr11/
  (Unicode Standard Annex #11, East Asian Width).
Regards, Martin.
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp