[rfc-i] [IAB Trac] #270: Resources associated with allowing UTF-8 in RFCs (Section 3.3)

Heather Flanagan (RFC Series Editor) rse at rfc-editor.org
Tue Mar 12 09:24:41 PDT 2013


On 3/8/13 5:01 AM, "Martin J. Dürst" wrote:
> Hello Heather,
>
> [I'm sorry this is late (traveling), I saw that you already closed the
> issue, but I think this is important for the style guide.]
I very much agree; this is very important to the Style Guide and will
not be lost as I work on that next.  The RFC format and the Style Guide
are very closely linked, and as we change the format, those changes must
inform the changes to the Style Guide.  What you listed below is one
example.  Another will be the guidance required if/when we have
multiple publication formats - each may require special information in
the Style Guide.

So, agreed, and thank you for the input; it will be part of what feeds
the revision of the Style Guide.

-Heather

>
> On 2013/03/05 4:18, Heather Flanagan (RFC Series Editor) wrote:
>> On 2/27/13 12:51 AM, "Martin J. Dürst" wrote:
>
> [snip]
>
>>> 3) The distinction between ASCII and non-ASCII is a remnant of the
>>> old format, not what's actually needed here. No decent publisher (e.g.
>>> ACM, IEEE) or other standards organization (e.g. W3C) would do things
>>> such as:
>>>    Patrick Fältström (Patrick Faltstrom)
>>>    Martin Dürst (Martin Duerst)
>>> On the other hand, most publishers wouldn't use non-Latin names in
>>> prominent places at all (I very much think non-Latin names and
>>> addresses should be allowed, but with Latin equivalents close by. This
>>> is what W3C has been doing for quite a while, see e.g.
>>> http://www.w3.org/TR/2003/REC-SVG11-20030114/). So what we need for
>>> author names and addresses is a distinction between Latin and
>>> non-Latin.
>>
>> We've discussed this somewhat over the last year, whether the discussion
>> should be around Latin and non-Latin characters or whether this should
>> be about the inclusion of additional characters from the UTF-8 character
>> encoding.
>
> This is not an either-or. Scripts such as Latin are related to
> encodings such as ASCII or UTF-8, but they are on a different level.
>
>> My understanding is that the Latin-script alphabet is based
>> on ASCII,
>
> Please let's be careful with terminology. There is no single
> Latin-script alphabet. Different languages may use different
> alphabets, which may add or omit letters to the "basic Latin" alphabet
> (A-Z as we all know it).
>
> Because ASCII can represent the letters of the basic Latin alphabet,
> it is possible to say that all Latin alphabets are based on ASCII.
>
>> and non-Latin is broader than saying "UTF-8".
>
> In theory, yes, because there are characters that are not (yet?) in
> Unicode (and can therefore not be represented in UTF-8). An example
> would be Klingon. But in practice, because Unicode coverage these days
> is extremely broad, this is irrelevant (if we ever get invaded by
> Klingons, I'm sure Unicode will react quickly, and the IETF doesn't
> have to worry :-).
>
>> I think
>> Latin/non-Latin is too broad a topic, whereas ASCII and UTF-8 more
>> clearly bound what we should be able to handle at this point.
>
> What is too broad a topic when I propose that we avoid:
>
>     Patrick Fältström (Patrick Faltstrom)
>     Martin Dürst (Martin Duerst)
>
> and instead simply write:
>
>     Patrick Fältström
>     Martin Dürst
>
> but on the other hand that we write:
>
>     Yui Naruse (成瀬ゆい)
>     Adil Allawi (عادل علاوي)
>
> rather than
>
>     成瀬ゆい (Yui Naruse)
>     عادل علاوي (Adil Allawi)
>
> or maybe even just:
>
>     成瀬ゆい
>     عادل علاوي
>
> 'ä', 'ö', and 'ü' (and some others) are Latin characters, but they
> can't be represented in ASCII. What I'm proposing is that we need to
> make a distinction, *among the characters that can be represented in
> UTF-8 but not in ASCII*, between Latin characters and non-Latin
> characters, in order to catch up to how every decent publisher has
> handled this or similar issues for way, way longer than we have had RFCs.
>
> I hope I have made my point clear enough. If anybody is still confused
> between encodings (ASCII, UTF-8,...) and scripts (Latin, Cyrillic,
> Arabic,...), I'll try again.
>
> Regards,   Martin.
>
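
The distinction drawn in the quoted message, between encodings (ASCII,
UTF-8) and scripts (Latin, Arabic, ...), can be illustrated with a small
sketch. It is only a heuristic under stated assumptions: Python's
standard unicodedata module does not expose the Unicode Script property,
so the code treats a letter as Latin when its Unicode character name
begins with "LATIN". The function names classify_char and classify_name
and the three category labels are hypothetical, chosen for this
illustration; nothing here is part of any RFC Editor tooling.

```python
import unicodedata


def classify_char(ch: str) -> str:
    """Classify one character as 'ascii', 'latin-non-ascii', 'non-latin', or 'other'.

    Heuristic (an assumption of this sketch): a letter counts as Latin when
    its Unicode name starts with "LATIN". Non-letters are reported as 'other'.
    """
    if not ch.isalpha():
        return "other"
    if unicodedata.name(ch, "").startswith("LATIN"):
        # Same script, different encodability: ASCII covers only code points < 128.
        return "ascii" if ord(ch) < 128 else "latin-non-ascii"
    return "non-latin"


def classify_name(name: str) -> str:
    """Return the broadest category needed to write `name`.

    The name is normalized to NFC first so that a decomposed sequence such as
    'a' + COMBINING DIAERESIS is seen as the single Latin letter U+00E4.
    """
    categories = {classify_char(ch) for ch in unicodedata.normalize("NFC", name)}
    if "non-latin" in categories:
        return "non-latin"
    if "latin-non-ascii" in categories:
        return "latin-non-ascii"
    return "ascii"


if __name__ == "__main__":
    for author in ["Martin Duerst", "Martin Dürst", "成瀬ゆい", "عادل علاوي"]:
        print(author, "->", classify_name(author))
```

Under the proposal in the quoted message, a name in the "latin-non-ascii"
category would be printed as-is with no ASCII fallback, while a
"non-latin" name would be accompanied by a Latin equivalent. All three
categories are equally representable in UTF-8, which is why the encoding
alone cannot make that distinction.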


