Martin J. Dürst
duerst at it.aoyama.ac.jp
Mon Mar 7 02:27:53 PST 2016
Trying to tie some loose ends; if they are already tied up, please ignore.
On 2016/02/24 09:16, Paul Hoffman wrote:
> On 23 Feb 2016, at 16:00, Heather Flanagan (RFC Series Editor) wrote:
>> For your example, it would be pretty simple:
>> P. Hoffman
>> Éxämple Corp.
>> See draft-iab-rfc-nonascii-00.txt, Section 3.2.
>> " Person names may appear in several places within an RFC. In all
>> cases, valid Unicode is required. For names that include characters
>> outside of the Unicode Latin and Latin Extended script,
There's no such thing as "Latin Extended script". This should change to
e.g. "outside of the Unicode Latin script" or "outside of the Unicode
>> an author-
>> provided, ASCII-only identifier is required to assist in search and
>> indexing of the document."
> Good catch, but I gave a bad example. How would you propose that the
> display of organization names be for:
> <author initials="P." surname="Hoffman" fullname="Paul Hoffman">
> <organization ascii="Example Corp.">している Corp.</organization>
> Your text above says the ASCII-only identifier "is required", but Joe's
> top-level question is "how are these things rendered in the output
My understanding (from a usability/desirability point of view) would be
- If the 'original' is in all-ASCII or all-Latin, then render just that.
- If the 'original' contains non-Latin, then render that, followed by an
  ASCII/Latin fallback in parentheses.
- In the 'author's address' section, potentially not only list the Latin
  fallback, but also the ASCII-only fallback.
- If all/many items in a single location are in non-Latin/non-ASCII, then
  group them. If it's only individual items, don't repeat unnecessarily.
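
As a rough illustration of the first two rules above, here is a minimal
Python sketch. The names looks_latin and render_name are hypothetical, and
the Latin test is only a heuristic based on Unicode character names
(Python's standard library does not expose the Unicode Script property), so
this is an assumption about how a tool might behave, not what any RFC
tooling actually does.

```python
import unicodedata


def looks_latin(text):
    """Heuristic: every letter's Unicode name starts with 'LATIN'.

    Non-letters (spaces, punctuation, digits) are ignored. This only
    approximates the Unicode Script property.
    """
    for ch in unicodedata.normalize("NFC", text):
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return False
    return True


def render_name(original, fallback=None):
    """Render 'original' alone if it is all-ASCII or all-Latin; otherwise
    append the ASCII/Latin fallback in parentheses (if one was supplied)."""
    if looks_latin(original) or not fallback:
        return original
    return f"{original} ({fallback})"


print(render_name("している Corp.", "Example Corp."))
print(render_name("Ño One", "No One"))
```

With these assumptions, the first call appends the fallback in parentheses
and the second renders the Latin name on its own.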
So for your example above, it should be something like (suitably
している Corp. (Example Corp.)
Or if the actual Romanized name is indeed "Éxämple Corp.", then it
している Corp. (Éxämple Corp.)
For a case such as:
<author fullname="王伟" asciiFullname="English Name" asciiInitials="E."
it should be:
王伟 (E. Name)
and for this one:
<author initials="Ñ." asciiInitials="N." surname="One" fullname="Ño One"
it should be:
Ño One
(and the ASCII should only appear internally in XML, or in the
ASCII-only version (if that's still part of the plan)).
It gets more complicated if we have somebody whose name is, let's say,
王伟, but who writes their name as Ño Öne in Latin (and No One in ASCII).
Such a situation may be rare, because people usually go all the way to
ASCII when they Romanize their name. Situations where I can imagine the
earliest needs are e.g. pinyin (the 'standard' way to write Chinese in
Latin script these days), which uses some diacritics, or somebody from
North Africa using Arabic script primarily and French accents on their
It may be that we don't have the necessary attributes to handle such a
situation. It may also be that we need some logic to check whether a
given string is Latin or not (NOT ASCII or not, which is trivial). But
probably not; we may be able to use the presence/absence of some type of
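
To make the "Latin or not" versus "ASCII or not" distinction concrete, here
is a small Python sketch. The function name classify_script is hypothetical,
and the Latin check is a heuristic on character names rather than a real
lookup of the Unicode Script property, which Python's standard library does
not provide. Combining marks are assumed to belong to the script of their
base character.

```python
import unicodedata


def classify_script(text):
    """Return 'ascii', 'latin', or 'non-latin' for a string (heuristic)."""
    # The ASCII test is the trivial part.
    if text.isascii():
        return "ascii"
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("M"):
            # Combining marks: assume they inherit the base character's script.
            continue
        if category.startswith("L") and not unicodedata.name(ch, "").startswith("LATIN"):
            return "non-latin"
    # Non-ASCII, but every letter found was Latin.
    return "latin"


for sample in ("No One", "Ño Öne", "Wáng Wěi", "王伟"):
    print(sample, "->", classify_script(sample))
```

A real implementation would more likely consult the Script property data
from the Unicode Character Database; the name-prefix test here misclassifies
some characters (for example, fullwidth Latin letters).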