[rfc-i] [IAB Trac] #270: Resources associated with allowing UTF-8 in RFCs (Section 3.3)

Heather Flanagan (RFC Series Editor) rse at rfc-editor.org
Mon Mar 4 11:18:08 PST 2013


On 2/27/13 12:51 AM, "Martin J. Dürst" wrote:
> Hello Heather, others,
> 
> On 2013/02/27 8:50, IAB issue tracker wrote:
>> #270: Resources associated with allowing UTF-8 in RFCs (Section 3.3)
>>
>>
>> Comment (by hlflanagan at gmail.com):
>>
>>   After some discussion, I have revised the ASCII/UTF-8 requirement in
>>   Section 3.2 to read:
>>
>>   The official language of the RFC Series is English.  ASCII is
>>   required for all "normative" text, i.e., text that must be read to
>>   understand or implement the technology described in the RFC.
>>   UTF-8/Unicode text will be allowed for Author names and addresses
>>   and non-normative text within an RFC.  Author names and addresses
>>   will require an ASCII equivalent for indexing purposes.
> 
> I'm not sure that "Author names and addresses will require an ASCII
> equivalent for indexing purposes." is exactly the right wording. There
> are three issues:
> 
> 1) "for indexing purposes" does not say whether the equivalent will be
> part of the published text, or will only be part of the metadata (maybe
> hidden somewhere in some versions of the published document).

I expect it to appear both in the text (in the Authors' Addresses
section) and in the metadata.

> 
> 2) "for indexing purposes" does not provide a justification. Sorting of
> Unicode text (for indexing) can be done in various ways these days.
> Search engines and the like take care of quite a bit of variability.

Yes, but I am also considering indexing done by mirrors and by
individuals using basic grep, where matching names that contain
non-ASCII characters is less trivial.
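
To illustrate the grep concern, here is a minimal Python sketch. It
assumes a byte-oriented search (as basic grep would do in a plain C
locale) and uses a hypothetical helper, ascii_fold, that is not part of
any RFC tooling. The same visible name can be stored in two Unicode
normalization forms with different UTF-8 bytes, so a search typed in one
form can miss text stored in the other.

```python
import unicodedata


def ascii_fold(name: str) -> str:
    """Hypothetical helper: drop combining marks after NFD decomposition.

    This is lossy and language-blind ("ü" becomes "u", never "ue"), which
    is one reason an author-supplied ASCII equivalent may still be needed.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Same visible name, two normalization forms.
name_nfc = unicodedata.normalize("NFC", "Martin Dürst")  # precomposed "ü"
name_nfd = unicodedata.normalize("NFD", "Martin Dürst")  # "u" + combining mark

bytes_nfc = name_nfc.encode("utf-8")
bytes_nfd = name_nfd.encode("utf-8")

# The byte sequences differ, so a byte-level search for one form
# does not find the other.
forms_match = bytes_nfc == bytes_nfd
found = bytes_nfc in (b"Author: " + bytes_nfd)

print(forms_match)                      # False
print(found)                            # False
print(ascii_fold("Patrick Fältström"))  # Patrick Faltstrom
print(ascii_fold("Martin Dürst"))       # Martin Durst
```

Note that mechanical folding gives "Durst", not the "Duerst" spelling
quoted in this thread, so a computed fallback is not a substitute for
the equivalent the author chooses.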

> 
> 3) The distinction between ASCII and non-ASCII is a remnant from the
> old format, not what's actually needed here. No decent publisher (e.g.
> ACM, IEEE) or other standards organization (e.g. W3C) would do things
> such as:
>   Patrick Fältström (Patrick Faltstrom)
>   Martin Dürst (Martin Duerst)
> On the other hand, most publishers wouldn't use non-Latin names in
> prominent places at all (I very much think non-Latin names and addresses
> should be allowed, but with Latin equivalents close by. This is
> what W3C has been doing for quite a while, see e.g.
> http://www.w3.org/TR/2003/REC-SVG11-20030114/). So what we need for
> author names and addresses is a distinction between Latin and non-Latin.

We've discussed this somewhat over the last year: whether the
distinction should be drawn between Latin and non-Latin characters, or
whether it should be about including additional characters from the
UTF-8 character encoding.  My understanding is that the Latin-script
alphabet is based on ASCII, and non-Latin is broader than saying
"UTF-8".  I think Latin/non-Latin is too broad a topic, whereas ASCII
and UTF-8 more clearly bound what we should be able to handle at this
point.
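
For reference, the categories in this discussion can be told apart
mechanically. The following is only a rough Python sketch: the standard
library has no script property, so the hypothetical helper
looks_latin_script approximates "Latin script" by checking that each
letter's Unicode character name starts with "LATIN". That heuristic is
an assumption made for illustration, not a definition anyone in this
thread has proposed.

```python
import unicodedata


def looks_latin_script(text: str) -> bool:
    """Hypothetical heuristic: every letter has a Unicode name that
    starts with "LATIN". Non-letters (spaces, combining marks,
    punctuation) are ignored."""
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
        if ch.isalpha()
    )


def classify(text: str) -> str:
    """Sort a name into one of the three categories under discussion."""
    if text.isascii():
        return "ASCII"
    if looks_latin_script(text):
        return "non-ASCII, Latin script"
    return "non-Latin script"


print(classify("Patrick Faltstrom"))  # ASCII
print(classify("Patrick Fältström"))  # non-ASCII, Latin script
print(classify("山田太郎"))            # non-Latin script
```

The middle category is the one where the two framings differ: such
names are non-ASCII but still written in Latin script.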


-Heather

