[rfc-i] open issues: character encoding of names

Andrew Sullivan ajs at anvilwalrusden.com
Thu May 31 11:54:13 PDT 2012

On Thu, May 31, 2012 at 11:14:30AM -0700, Joe Touch wrote:
> I would have preferred if the series adds features sparingly and of
> absolute necessity - in general.

That position is reasonable, but we probably need to make a
distinction between what we might call "protocol" and "policy".

That is, the current format of RFCs is encoded in ASCII, period.  This
is a protocol limitation.  The suggestion here is to permit Unicode --
specifically UTF-8 -- as the encoding.  

Such a change might be made while yet adopting a conservative policy
about where code points outside the ASCII range may be employed.  I
would regard that as a decision to be made by the RSE, but as near as
I can tell she is sensible and conservative about these things, and
not apt to make changes that are not easy to justify.
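
To make the protocol/policy split concrete, here is a minimal sketch in
Python.  The helper name non_ascii_positions is hypothetical and is not
part of any RFC tooling; it only illustrates the idea.  Decoding the
bytes as UTF-8 stands in for the "protocol" step, and reporting where
code points outside the ASCII range occur stands in for the input to a
"policy" decision about whether each one is acceptable.

```python
def non_ascii_positions(data):
    """Decode data as UTF-8 and list (line, column, char) for each
    code point outside the ASCII range.

    Hypothetical helper for illustration only.
    """
    # "Protocol": the document must be valid UTF-8.
    # Invalid input raises UnicodeDecodeError here.
    text = data.decode("utf-8")

    # "Policy" input: where do non-ASCII code points actually occur?
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                found.append((line_no, col_no, ch))
    return found
```

A pure-ASCII document yields an empty list, since ASCII is a subset of
UTF-8; what to do with a non-empty list is the policy question, and the
sketch deliberately does not answer it.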

> I hope that internationalized email addresses are provided in two
> forms - internationalized and their ASCII equivalents (e.g., IDNA),
> at least for the foreseeable future.

EAI, of course, does not provide the same 1:1 mapping that IDNA does,
so this stricture might be harder to require (though I expect that in
the immediate future someone who doesn't have an ASCII address will,
for practical purposes, not really have an email address at all).
It's always interesting to me, however, that people argue that the
A-label form of IDNA labels is somehow more usable.  I find the two
labels 台湾 and xn--kprw13d to have almost exactly the same
comprehensibility.  I am personally more able to tell the first of
those apart from 中国 at a glance than I am able to tell apart the
second and xn--fiqs8s.  I do not speak Chinese, and so this is
probably an accident of the particular examples available.
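
For anyone who wants to reproduce the label pairs above, a small sketch
using Python's built-in "idna" codec follows.  The function names
to_a_label and to_u_label are my own for illustration.  Note the
assumption: the standard-library codec implements the older IDNA 2003
rules rather than IDNA2008, which makes no difference for these two
labels but may for others.

```python
def to_a_label(u_label):
    """Convert a single Unicode label to its ASCII (xn--) form."""
    return u_label.encode("idna").decode("ascii")


def to_u_label(a_label):
    """Convert a single ASCII (xn--) label back to its Unicode form."""
    return a_label.encode("ascii").decode("idna")


if __name__ == "__main__":
    for label in ("台湾", "中国"):
        a_label = to_a_label(label)
        # The round trip illustrates the 1:1 mapping that IDNA provides.
        print(label, a_label, to_u_label(a_label) == label)
```

The round trip in both directions is what the 1:1 mapping means in
practice; EAI local parts have no comparable conversion.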



Andrew Sullivan
ajs at anvilwalrusden.com
