[rfc-i] [IAB Trac] #270: Resources associated with allowing UTF-8 in RFCs (Section 3.3)

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Fri Mar 8 02:01:36 PST 2013


Hello Heather,

[I'm sorry this is late (I was traveling). I saw that you already closed
the issue, but I think this is important for the style guide.]

On 2013/03/05 4:18, Heather Flanagan (RFC Series Editor) wrote:
> On 2/27/13 12:51 AM, "Martin J. Dürst" wrote:

[snip]

>> 3) The distinction between ASCII and non-ASCII is a remnant from the
>> old format, not what's actually needed here. No decent publisher (e.g.
>> ACM, IEEE) or other standards organization (e.g. W3C) would do things
>> such as:
>>    Patrick Fältström (Patrick Faltstrom)
>>    Martin Dürst (Martin Duerst)
>> On the other hand, most publishers wouldn't use non-Latin names in
>> prominent places at all (I very much think non-Latin names and addresses
>> should be allowed, but with Latin equivalents close by. This would be
>> what W3C is doing for quite a while, see e.g.
>> http://www.w3.org/TR/2003/REC-SVG11-20030114/). So what we need for
>> author names and addresses is a distinction between Latin and non-Latin.
>
> We've discussed this somewhat over the last year, whether the discussion
> should be around Latin and non-Latin characters or whether this should
> be about the inclusion of additional characters from the UTF-8 character
> encoding.

This is not an either-or. Scripts such as Latin are related to encodings 
such as ASCII or UTF-8, but they are on a different level.
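
To illustrate the two levels, here is a minimal Python sketch. It is an
illustration only, not part of any RFC tooling; the helper name try_encode
is made up for this example. It takes one character and shows that its
byte representation depends on the encoding, while the script it belongs
to (read off here from its Unicode character name) does not.

```python
import unicodedata

ch = "\u00fc"  # LATIN SMALL LETTER U WITH DIAERESIS, i.e. u-umlaut


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Encoding level: same character, different (or no) byte representation.
utf8_bytes = try_encode(ch, "utf-8")      # two bytes
latin1_bytes = try_encode(ch, "latin-1")  # one byte
ascii_bytes = try_encode(ch, "ascii")     # None: not representable in ASCII

# Script level: the character is a Latin letter regardless of encoding.
char_name = unicodedata.name(ch)
```

Whether the character can be stored in a given encoding and which script
it belongs to are answered by different lines of the sketch, which is the
distinction made above.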

> My understanding is that the Latin-script alphabet is based
> on ASCII,

Please let's be careful with terminology. There is no single
Latin-script alphabet. Different languages may use different alphabets,
which may add letters to, or omit letters from, the "basic Latin"
alphabet (A-Z as we all know it).

Because ASCII can represent the letters of the basic Latin alphabet, it 
is possible to say that all Latin alphabets are based on ASCII.

> and non-Latin is broader than saying "UTF-8".

In theory, yes, because there are characters that are not (yet?) in 
Unicode (and can therefore not be represented in UTF-8). An example 
would be Klingon. But in practice, because Unicode coverage these days 
is extremely broad, this is irrelevant (if we ever get invaded by 
Klingons, I'm sure Unicode will react quickly, and the IETF doesn't have 
to worry :-).

> I think
> Latin/non-Latin is too broad a topic, whereas ASCII and UTF-8 more
> clearly bound what we should be able to handle at this point.

What is too broad about proposing that we avoid:

     Patrick Fältström (Patrick Faltstrom)
     Martin Dürst (Martin Duerst)

and instead simply write:

     Patrick Fältström
     Martin Dürst

but on the other hand that we write:

     Yui Naruse (成瀬ゆい)
     Adil Allawi (عادل علاوي)

rather than

     成瀬ゆい (Yui Naruse)
     عادل علاوي (Adil Allawi)

or maybe even just:

     成瀬ゆい
     عادل علاوي

'ä', 'ö', and 'ü' (and some others) are Latin characters, but they can't 
be represented in ASCII. What I'm proposing is that we need to make a 
distinction, *among the characters that can be represented in UTF-8 but 
not in ASCII*, between Latin characters and non-Latin characters, in 
order to catch up to how every decent publisher has handled this or
similar issues for way, way longer than we have had RFCs.
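
The three-way distinction can be sketched in a few lines of Python. This
is a rough heuristic under stated assumptions, not a proposal for actual
tooling: the function name classify_name is made up, and "Latin" is
approximated by checking whether a letter's Unicode character name starts
with "LATIN", because the standard library does not expose the Unicode
Script property directly. Compatibility forms such as fullwidth letters
would be misclassified by this shortcut.

```python
import unicodedata


def classify_name(name):
    """Classify a name as 'ascii', 'latin', or 'non-latin'.

    'latin' means: contains non-ASCII characters, but every letter is
    a Latin-script letter (heuristic: character name starts with LATIN).
    """
    if name.isascii():
        return "ascii"
    for ch in name:
        # Only letters decide the script; spaces and punctuation are skipped.
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return "non-latin"
    return "latin"
```

With this sketch, "Patrick Fältström" and "Martin Dürst" come out as
'latin' and would need no ASCII fallback, while "成瀬ゆい" and
"عادل علاوي" come out as 'non-latin' and would be accompanied by a Latin
equivalent.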

I hope I have made my point clear enough. If anybody is still confused 
between encodings (ASCII, UTF-8,...) and scripts (Latin, Cyrillic, 
Arabic,...), I'll try again.

Regards,   Martin.

