[rfc-i] Do we need romanization metadata? (Re: [IAB Trac] #270: Resources associated with allowing UTF-8 in RFCs (Section 3.3))

Nico Williams nico at cryptonector.com
Mon Mar 4 11:31:07 PST 2013


On Mon, Mar 4, 2013 at 1:18 PM, Heather Flanagan (RFC Series Editor)
<rse at rfc-editor.org> wrote:
> On 2/27/13 12:51 AM, "Martin J. Dürst" wrote:
>> 3) The distinction between ASCII and non-ASCII is a remainder from the
>> old format, not what's actually needed here. No decent publisher (e.g.
>> ACM, IEEE) or other standards organization (e.g. W3) would do things
>> such as:
>>   Patrick Fältström (Patrick Faltstrom)
>>   Martin Dürst (Martin Duerst)

Speaking of which...  for non-Latin-derived scripts it'd be nice to
have [optional] romanization of author names in xml2rfc and related
tools.  That way names could appear both in scripts that many readers
might not understand and in Latin script, so that readers who don't
know the script of an author's name could still address the author
in correspondence.

(Romanization of Chinese, Japanese, and other names is quite common.
Heck, they need not be romanized as such -- they can simply be Latin
aliases for the author, something that I've seen to be quite common
among some of my colleagues.)
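
As a rough illustration of what such optional metadata could look
like, here is a minimal Python sketch.  The "latinName" attribute, the
helper functions, and the sample name are all hypothetical
placeholders; none of them are part of xml2rfc.

```python
import xml.etree.ElementTree as ET


def make_author(fullname, latin_name=None):
    """Build an <author> element; the Latin-script form is optional."""
    author = ET.Element("author", {"fullname": fullname})
    if latin_name is not None:
        # "latinName" is a hypothetical attribute name, not an xml2rfc one.
        # It may hold a romanization or simply a Latin alias.
        author.set("latinName", latin_name)
    return author


def display_name(author):
    """Render the native name, followed by the Latin form when present."""
    native = author.get("fullname")
    latin = author.get("latinName")
    return f"{native} ({latin})" if latin else native


# Placeholder author name, used only for illustration.
author = make_author("山田太郎", "Taro Yamada")
xml_text = ET.tostring(author, encoding="unicode")
```

An author who supplies no Latin form is simply rendered with the
native name alone, so the extra metadata stays optional.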

The same goes for postal addresses, not just author names.

Internationalized e-mail addresses are fine: they can be copy-pasted,
or clicked on.  But it might be desirable to allow for alternative
e-mail addresses.

If I may be so bold, links to audio indicating how to pronounce author
(and place) names would be nice to have as well!

> We've discussed this somewhat over the last year, whether the discussion
> should be around Latin and non-Latin characters or whether this should
> be about the inclusion of additional characters from the UTF-8 character
> encoding.  My understanding is that the Latin-script alphabet is based
> on ASCII, and non-Latin is broader than saying "UTF-8".  I think
> Latin/non-Latin is too broad a topic, whereas ASCII and UTF-8 more
> clearly bound what we should be able to handle at this point.

How does speaking of ASCII and UTF-8 bound anything?  UTF-8 implies
Unicode, and lacking constraints on our side it implies all of the
power of Unicode.  Mind you, I don't think we need explicit
constraints here.  What is reasonable in informative text is best
determined on a case-by-case basis, IMO.  The main problem will be in
the author metadata; see above.
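
To make the "UTF-8 implies Unicode" point concrete, here is a small
Python sketch.  The sample strings and the is_ascii_only() helper are
my own illustrations, not anything proposed for the RFC format.

```python
# UTF-8 is an encoding of the whole Unicode code space, so saying
# "UTF-8" does not by itself restrict which characters may appear.
samples = {
    "ASCII": "Duerst",
    "Latin with diacritics": "Dürst",
    "Han": "山田",
    "highest code point": chr(0x10FFFF),
}

for label, text in samples.items():
    encoded = text.encode("utf-8")
    # Every sample round-trips through UTF-8 unchanged.
    assert encoded.decode("utf-8") == text
    print(label, len(text), "characters,", len(encoded), "bytes")


def is_ascii_only(text):
    """Hypothetical repertoire check: any real bound needs a rule like this."""
    return all(ord(ch) < 128 for ch in text)
```

Any actual bound on the repertoire would have to come from a separate
rule of this kind, not from the choice of encoding.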

Nico
--

