[rfc-i] Data point [Re: Fwd: I-DACTION:draft-hoffman-utf8-rfcs-03.txt]

Martin Duerst duerst at it.aoyama.ac.jp
Mon Oct 6 21:59:43 PDT 2008

At 08:19 08/10/07, Brian E Carpenter wrote:
>> In specific, a UTF8ized version of this specific draft can be found at 
><http://www.vpnc.org/temp/draft-hoffman-utf8-rfcs-03.utf8>. I'll do the 
>same for future drafts.

>6. Just for fun, I ran it through idnits.
>** There are 5 instances of lines with non-ascii characters in the document.
>** There is 1 instance of too long lines in the document, the longest one
>   being 2 characters in excess of 72.
>The second warning is interesting - how do you compute the line length when
>there are non-Latin characters around?

Three possibilities (at least), in order of increasing complexity:
- Count bytes
- Count codepoints. For (correct) UTF-8, simply count all bytes in the
  ranges 0x00-0x7F and 0xC0-0xFF, i.e. every byte that is not a
  continuation byte (0x80-0xBF).
- Use something like http://www.unicode.org/unicode/reports/tr11/
  (Unicode Standard Annex #11 East Asian Width).
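
To make the three options concrete, here is a rough Python sketch. The
function names are made up for illustration, and the third function rests
on a simplifying assumption that is not part of UAX #11 itself: characters
classed as Wide (W) or Fullwidth (F) are taken to occupy two columns and
everything else one, with ambiguous-width and combining characters
treated as one column each.

```python
import unicodedata


def byte_length(line: str) -> int:
    """Option 1: number of bytes in the UTF-8 encoding of the line."""
    return len(line.encode("utf-8"))


def codepoint_length(raw: bytes) -> int:
    """Option 2: count code points in (valid) UTF-8 by counting every byte
    in 0x00-0x7F or 0xC0-0xFF, i.e. skipping continuation bytes."""
    return sum(1 for b in raw if b <= 0x7F or b >= 0xC0)


def display_width(line: str) -> int:
    """Option 3: approximate column width from the East Asian Width property.
    Assumption: W and F count as two columns, everything else as one."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in line
    )


if __name__ == "__main__":
    sample = "D\u00fcrst \u65e5\u672c\u8a9e"  # Latin text followed by three kanji
    print(byte_length(sample))
    print(codepoint_length(sample.encode("utf-8")))
    print(display_width(sample))
```

For a line mixing Latin and CJK text the three numbers generally all
differ, which is why a "72 characters" limit needs to say which of them
it means.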

Regards,    Martin.

