[rfc-i] Character sets, was Comments on draft-iab-rfcformat
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Wed Dec 19 01:22:11 PST 2012
On 2012/12/19 10:25, John R Levine wrote:
> While I agree that it would probably not be a great idea to fill our
> RFCs with glyphs used only in classical Tibetan, it wouldn't be hard to
> pick a profile of commonly used Unicode characters, e.g. the ones that
> IDNA2008 allows, and tell people that if their display device doesn't
> handle them, it's time to upgrade, or if they just can't, to use the
> ugly downgraded version of the documents.
IDNA2008 probably allows too many characters, in the sense that even
very recently encoded characters are allowed, but on the other hand not
enough, because it doesn't include any symbols or punctuation, which
may be important in some examples.
In general, I'd strongly suggest not trying to come up with a complete
rule that can be checked mechanically, because we'll either spend too
much time working everything out or not spend enough time and get it
wrong. It's much easier to develop general usage guidelines, work on the
assumption that the average author won't include more than the necessary
amount of non-ASCII code points, and have a tool or two to allow quick
and easy checks.
I could, for example, imagine nit-checking warnings that list all the
non-ASCII Unicode code points, potentially with their "age" (time since
encoding), with text saying something like "check if this is used in
an example or in a name, otherwise fix". I even volunteer to write such
a tool/component if I can do it in Ruby, or to help somebody who wants
to write it in Python.
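
To make the idea concrete, here is a minimal sketch of such a check in
Python. It is only an illustration under some assumptions: the function
names (parse_derived_age, find_non_ascii, format_warnings) are made up
for this example, and "age" is approximated by the Unicode Age property
(the version in which a code point was first assigned) rather than by
elapsed time. Python's standard unicodedata module does not expose that
property, so the sketch assumes the caller supplies lines in the format
of the Unicode Character Database file DerivedAge.txt.

```python
import unicodedata


def parse_derived_age(lines):
    """Parse lines in the DerivedAge.txt format into (first, last, version).

    Assumed line shapes: "00A0..00FF ; 1.1 # comment" or "20AC ; 2.1 # comment".
    """
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        cp_field, version = (part.strip() for part in data.split(";"))
        first, _, last = cp_field.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), version))
    return ranges


def lookup_age(ranges, cp):
    """Return the version string for code point cp, or None if unknown."""
    for first, last, version in ranges:
        if first <= cp <= last:
            return version
    return None


def find_non_ascii(text, age_ranges=()):
    """Return one entry per distinct non-ASCII code point, in code point order."""
    found = {}
    # Split on "\n" only: str.splitlines() would swallow U+0085 and U+2028,
    # which are exactly the kind of characters this check should report.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for ch in line:
            cp = ord(ch)
            if cp < 0x80:
                continue
            entry = found.setdefault(cp, {
                "code_point": "U+%04X" % cp,
                "name": unicodedata.name(ch, "<unnamed>"),
                "age": lookup_age(age_ranges, cp),
                "lines": [],
            })
            if line_no not in entry["lines"]:
                entry["lines"].append(line_no)
    return [found[cp] for cp in sorted(found)]


def format_warnings(entries):
    """Turn entries into nit-checker style warning strings."""
    warnings = []
    for e in entries:
        age = e["age"] if e["age"] is not None else "unknown"
        lines = ", ".join(str(n) for n in e["lines"])
        warnings.append(
            "%s %s (age: %s; line(s) %s): check if this is used in an "
            "example or in a name, otherwise fix"
            % (e["code_point"], e["name"], age, lines)
        )
    return warnings


if __name__ == "__main__":
    sample_ages = parse_derived_age([
        "00A0..00FF ; 1.1 # Latin-1 Supplement (excerpt)",
        "20AC ; 2.1 # EURO SIGN",
    ])
    sample_text = "Martin J. Dürst\nplain ASCII line\nprice: 5 €\n"
    for warning in format_warnings(find_non_ascii(sample_text, sample_ages)):
        print(warning)
```

The sketch deliberately only reports and never rejects anything; whether
a given code point is acceptable is left to the author and the
guidelines.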