[rfc-i] Character sets, was Comments on draft-iab-rfcformat

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Wed Dec 19 01:22:11 PST 2012

On 2012/12/19 10:25, John R Levine wrote:

> While I agree that it would probably not be a great idea to fill our
> RFCs with glyphs used only in classical Tibetan, it wouldn't be hard to
> pick a profile of commonly used Unicode characters, e.g. the ones that
> IDNA2008 allows, and tell people that if their display device doesn't
> handle them, it's time to upgrade, or if they just can't, to use the
> ugly downgraded version of the documents.

IDNA2008 probably allows too many characters, in the sense that even
very recently encoded characters are allowed, but on the other hand not
enough, because it doesn't include any symbols or punctuation, which
may be important in some examples.

In general, I'd strongly suggest not trying to come up with a complete
rule that can be checked mechanically, because we'll either spend too
much time working everything out or not spend enough time and get it
wrong. It's much easier to develop general usage guidelines, work on the
assumption that the average author won't include more non-ASCII code
points than necessary, and have a tool or two to allow quick and easy
checks.

I could, for example, imagine nit-checking warnings listing all the
non-ASCII Unicode code points, potentially with their "age" (time since
encoding), with text saying something like "check if this is used in
an example or in a name, otherwise fix". I even volunteer to write such
a tool/component if I can do it in Ruby, or to help somebody who wants
to write it in Python.
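
To make the idea a bit more concrete, here is a rough Python sketch of
what such a check might look like. It is only an illustration under
some assumptions, not a finished tool: the function names
(parse_derived_age, age_of, non_ascii_report) are made up for this
example, and "age" is approximated by the Unicode version in which a
code point was first assigned. As far as I know, Python's unicodedata
module does not expose that property, so the sketch assumes the caller
supplies text in the format of the Unicode Character Database file
DerivedAge.txt.

```python
import unicodedata
from collections import defaultdict


def parse_derived_age(text):
    """Parse text in the UCD DerivedAge.txt format into (first, last, version) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, version = (field.strip() for field in line.split(";"))
        first, _, last = span.partition("..")  # "00A0..00FF" or a single "20AC"
        ranges.append((int(first, 16), int(last or first, 16), version))
    return ranges


def age_of(cp, ranges):
    """Return the Unicode version that first assigned cp, or None if not listed."""
    for first, last, version in ranges:
        if first <= cp <= last:
            return version
    return None


def non_ascii_report(text, ranges=()):
    """Return one warning per distinct non-ASCII code point found in text."""
    seen = defaultdict(list)  # character -> line numbers where it occurs
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 0x7F and lineno not in seen[ch]:
                seen[ch].append(lineno)
    report = []
    for ch in sorted(seen):
        cp = ord(ch)
        name = unicodedata.name(ch, "<unnamed>")
        age = age_of(cp, ranges) or "unknown"
        lines = ", ".join(str(n) for n in seen[ch])
        report.append(
            f"U+{cp:04X} {name} (Unicode {age}), line(s) {lines}: "
            "check if this is used in an example or in a name, otherwise fix"
        )
    return report


if __name__ == "__main__":
    # Tiny hand-written excerpt in DerivedAge.txt format, for illustration only.
    sample_ages = "00A0..00FF ; 1.1 # Latin-1 letters and symbols\n20AC ; 2.1 # EURO SIGN\n"
    draft = "Martin J. D\u00fcrst\nplain ASCII line\nprice: 5 \u20ac\n"
    for warning in non_ascii_report(draft, parse_derived_age(sample_ages)):
        print(warning)
```

A real nit checker would load the full DerivedAge.txt and could turn
the version into an approximate number of years since encoding; the
point here is only that a warning list plus a human looking at it is
cheap to build, whereas a complete mechanical rule is not.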

Regards,    Martin.
