[rfc-i] [IAB Trac] #270: Resources associated with allowing UTF-8 in RFCs (Section 3.3)
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Fri Mar 8 02:01:36 PST 2013
[I'm sorry this is late (I was traveling). I saw that you already closed
the issue, but I think this is important for the style guide.]
On 2013/03/05 4:18, Heather Flanagan (RFC Series Editor) wrote:
> On 2/27/13 12:51 AM, "Martin J. Dürst" wrote:
>> 3) The distinction between ASCII and non-ASCII is a remnant of the
>> old format, not what's actually needed here. No decent publisher (e.g.
>> ACM, IEEE) or other standards organization (e.g. W3C) would do things
>> such as:
>> Patrick Fältström (Patrick Faltstrom)
>> Martin Dürst (Martin Duerst)
>> On the other hand, most publishers wouldn't use non-Latin names in
>> prominent places at all (I very much think non-Latin names and addresses
>> should be allowed, but with Latin equivalents close by. This is
>> what W3C has been doing for quite a while, see e.g.
>> http://www.w3.org/TR/2003/REC-SVG11-20030114/). So what we need for
>> author names and addresses is a distinction between Latin and non-Latin.
> We've discussed this somewhat over the last year, whether the discussion
> should be around Latin and non-Latin characters or whether this should
> be about the inclusion of additional characters from the UTF-8 character
This is not an either-or. Scripts such as Latin are related to encodings
such as ASCII or UTF-8, but they are on a different level.
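A minimal sketch of the two levels, assuming Python 3 and using the name "Dürst" as sample data (the helper name `fits_ascii` is made up for this illustration): the same Latin-script text comes out as different bytes under different encodings, and one encoding cannot represent it at all, yet the script never changes.

```python
# The script (Latin) is a property of the characters; the encoding is
# only how those characters are turned into bytes.
name = "Dürst"

utf8_bytes = name.encode("utf-8")         # 'ü' becomes two bytes
latin1_bytes = name.encode("iso-8859-1")  # 'ü' becomes one byte

# Different byte sequences, identical text after decoding.
assert utf8_bytes != latin1_bytes
assert utf8_bytes.decode("utf-8") == latin1_bytes.decode("iso-8859-1") == name


def fits_ascii(text: str) -> bool:
    """Return True if every character of text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False
```

Here `fits_ascii("Duerst")` is true and `fits_ascii("Dürst")` is false, although both strings are written in Latin script.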
> My understanding is that the Latin-script alphabet is based
> on ASCII,
Please let's be careful with terminology. There is no single
Latin-script alphabet. Different languages may use different alphabets,
which may add letters to, or omit letters from, the "basic Latin"
alphabet (A-Z as we all know it).
Because ASCII can represent the letters of the basic Latin alphabet, it
is possible to say that all Latin alphabets are based on ASCII.
> and non-Latin is broader than saying "UTF-8".
In theory, yes, because there are characters that are not (yet?) in
Unicode (and can therefore not be represented in UTF-8). An example
would be Klingon. But in practice, because Unicode coverage these days
is extremely broad, this is irrelevant (if we ever get invaded by
Klingons, I'm sure Unicode will react quickly, and the IETF doesn't have
to worry :-).
> I think
> Latin/non-Latin is too broad a topic, whereas ASCII and UTF-8 more
> clearly bound what we should be able to handle at this point.
What is too broad about proposing that we avoid:
Patrick Fältström (Patrick Faltstrom)
Martin Dürst (Martin Duerst)
and instead simply write:
Patrick Fältström
Martin Dürst
but on the other hand that we write:
Yui Naruse (成瀬ゆい)
Adil Allawi (عادل علاوي)
or:
成瀬ゆい (Yui Naruse)
عادل علاوي (Adil Allawi)
or maybe even just:
'ä', 'ö', and 'ü' (and some others) are Latin characters, but they can't
be represented in ASCII. What I'm proposing is that we need to make a
distinction, *among the characters that can be represented in UTF-8 but
not in ASCII*, between Latin characters and non-Latin characters, in
order to catch up with how every decent publisher has handled this or
similar issues for way, way longer than we have had RFCs.
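The three-way split can be sketched roughly as follows, assuming Python 3. The function names `is_ascii`, `is_latin`, and `classify` are hypothetical, and the Latin test is only a heuristic: the standard library does not expose the Unicode Script property, so the sketch checks whether each letter's Unicode character name starts with "LATIN".

```python
import unicodedata


def is_ascii(text: str) -> bool:
    """True if every character is in the ASCII range (U+0000..U+007F)."""
    return all(ord(ch) < 128 for ch in text)


def is_latin(text: str) -> bool:
    """Heuristic: every letter has a Unicode name starting with 'LATIN'.

    Non-letters (spaces, punctuation, combining marks) are ignored.
    """
    for ch in text:
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return False
    return True


def classify(name: str) -> str:
    """Sort a name into one of the three groups discussed above."""
    if is_ascii(name):
        return "ascii"
    if is_latin(name):
        return "latin-non-ascii"
    return "non-latin"


for sample in ["Martin Duerst", "Martin Dürst", "成瀬ゆい", "عادل علاوي"]:
    print(sample, "->", classify(sample))
```

Under this heuristic, only names in the "non-latin" group would need a Latin equivalent next to them; names in the "latin-non-ascii" group would be written as they are.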
I hope I have made my point clear enough. If anybody is still confused
about the difference between encodings (ASCII, UTF-8,...) and scripts
(Latin, Cyrillic, Arabic,...), I'll try again.