[rfc-i] draft-iab-rfc-nonascii-00

Julian Reschke julian.reschke at gmx.de
Tue Mar 1 10:10:40 PST 2016


On 2016-03-01 18:58, Heather Flanagan (RFC Series Editor) wrote:
>>> People expect search engines to be able to perform searches such that
>>> searching on "GEANT", for example, will return matches for both "GEANT"
>>> and "GÉANT". The reverse would also be true. I expect this is
>>> established enough behavior that we do not need to define it in more
>>> detail (insert implied question here).
>>
>> OK, but how exactly does that affect the vocabulary?
>
> The XML vocabulary? I don't see why it would affect the vocabulary
> beyond what we have already anticipated; we have the ascii attribute to
> help clarify where necessary. Or do you mean something else?

I don't think the requirement above actually justifies the complexity we 
added to the vocabulary. Search engines and databases have been able to 
deal with these things without having an explicit ASCII alternative.
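
To illustrate the kind of matching meant here, a minimal Python sketch
follows. It assumes the search side folds diacritics itself, using
Unicode NFKD decomposition; the function names are hypothetical, and
real search engines may well use different (and more complete) folding
rules.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return a case-folded copy of text with combining marks removed."""
    # NFKD splits precomposed letters such as U+00C9 into a base letter
    # plus a combining mark (here "E" followed by U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks; base letters are kept unchanged.
    base_only = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return base_only.casefold()


def search_matches(query: str, indexed: str) -> bool:
    """Compare two strings after folding (hypothetical helper)."""
    return fold_diacritics(query) == fold_diacritics(indexed)


# Both directions from the example in the quoted text.
print(search_matches("GEANT", "GÉANT"))
print(search_matches("GÉANT", "GEANT"))
```

Note that this only covers letters that have a canonical decomposition.
A letter such as "ø" (U+00F8) has none and would pass through unchanged,
so it would need an explicit mapping.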

So what I'm trying to understand is: who is the audience for this
requirement? If it were removed, what effect would that have?

>>> ...
>>>> "For names that include characters outside of the Unicode Latin and
>>>> Latin Extended script, an author-provided, ASCII-only identifier is
>>>> required to assist in search and indexing of the document."
>>>>
>>>> It would be good to be more precise about what non-ASCII characters are
>>>> allowed (range?).
>>>>
>>>> <http://greenbytes.de/tech/webdav/draft-iab-rfc-nonascii-00.html#rfc.section.3.4.p.12>:
>>>>
>>>>
>>>
>>> My understanding is that "Latin Extended" is a reasonable way to capture
>>> Basic Latin (ASCII)
>>> Latin-1 Supplement
>>> Latin Extended-A
>>> Latin Extended-B
>>> Latin Extended-C
>>> Latin Extended-D
>>> Latin Extended-E
>>> Latin Extended Additional
>>
>> OK, so the code ranges as per <http://www.unicode.org/charts/>, we may
>> want to include those over here.
>>
>> (I also note that there's an "IPA Extensions" code page I'll have to
>> look into...)
>>
>
> Does the following change seem reasonable?
>
> OLD:
> Person names may appear in several places within an RFC. In both the
> front page header and the references section, when a non-Latin script is
> used, the fullname of the author is required. Initials are supported and
> encouraged if available. In all cases, valid Unicode is required. For
> names that include characters outside of the Unicode Latin and Latin
> Extended script, an author-provided, ASCII-only identifier is required to
> assist in improving general readability as well as the searchability and
> indexing of the document.
>
> PROPOSED:
> Person names may appear in several places within an RFC. In both the
> front page header and the references section, when a non-Latin script is
> used, the fullname of the author is required. Initials are supported and
> encouraged if available. In all cases, valid Unicode is required. For
> names that include characters outside of the Unicode Latin and Latin
> Extended script (Basic Latin (ASCII), Latin-1 Supplement, Latin Extended-A,
> Latin Extended-B, Latin Extended-C, Latin Extended-D, Latin Extended-E,
> Latin Extended Additional) an author-provided, ASCII-only identifier is
> required to assist in improving general readability as well as the
> searchability and indexing of the document <xref target="UNICODE-CHART"/>.

I'd move the <xref> closer to the text it refers to, that is, directly
after the script enumeration.

This is better, but I think it would be even better to have a table that 
people can look at to see what *exact* character ranges are covered.

(I can make a proposal if you like)
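
As a rough illustration of what such a table could look like, here is a
Python sketch. The code point ranges are the block boundaries as I read
them from the Unicode charts and should be checked against the current
charts before they go into the draft; the function names are
hypothetical.

```python
# (block name, first code point, last code point), inclusive ranges.
LATIN_BLOCKS = [
    ("Basic Latin (ASCII)", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
    ("Latin Extended Additional", 0x1E00, 0x1EFF),
    ("Latin Extended-C", 0x2C60, 0x2C7F),
    ("Latin Extended-D", 0xA720, 0xA7FF),
    ("Latin Extended-E", 0xAB30, 0xAB6F),
]


def latin_block_of(ch: str):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in LATIN_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def needs_ascii_identifier(name: str) -> bool:
    """True if any character of name lies outside the listed blocks."""
    return any(latin_block_of(ch) is None for ch in name)


# Print the table itself.
for name, first, last in LATIN_BLOCKS:
    print(f"U+{first:04X}..U+{last:04X}  {name}")
```

With these ranges, "GÉANT" would not need an ASCII identifier, while a
name written in, say, Han characters would. Characters from the "IPA
Extensions" block (U+0250..U+02AF) are treated as outside the set here,
because that block is not in the enumeration above. Also note that block
membership is a coarse test: blocks contain non-letters and unassigned
code points as well.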

>>>   ...
>>>> "Keywords and citation tags must be ASCII only."
>>>>
>>>> What does "Keywords" refer to? The things we put into the xml2rfc
>>>> <keyword> element?
>>>
>>> Yes.
>>
>> Ok. Maybe state that, as the keywords currently are invisible in the
>> specs, so people might not get what this is about...
>
> OLD:
> Keywords and citation tags must be ASCII only.
>
> NEW:
> Keywords, as tagged with the <keyword> element in XML, and citation tags
> must be ASCII only.

OK, maybe

"Keywords (as tagged with the <keyword> element in XML) and citation
tags (as defined in the anchor attributes of <reference> elements) must
be ASCII only."
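
For what it's worth, a rule phrased like this is easy to check
mechanically. A minimal Python sketch, assuming the source is parsed
with the standard library's ElementTree; the sample document is made up
and heavily simplified (it is not valid xml2rfc input), and the function
name is hypothetical:

```python
import xml.etree.ElementTree as ET

# Made-up, simplified fragment: one keyword and one anchor are non-ASCII.
SAMPLE = """<rfc>
  <front>
    <keyword>unicode</keyword>
    <keyword>GÉANT</keyword>
  </front>
  <back>
    <references>
      <reference anchor="RFC2119"/>
      <reference anchor="Müller2016"/>
    </references>
  </back>
</rfc>"""


def non_ascii_keywords_and_anchors(xml_text: str):
    """Return (kind, value) pairs for keywords/anchors that are not ASCII."""
    root = ET.fromstring(xml_text)
    offenders = []
    # Keywords: the text content of every <keyword> element.
    for keyword in root.iter("keyword"):
        text = keyword.text or ""
        if not text.isascii():
            offenders.append(("keyword", text))
    # Citation tags: the anchor attribute of every <reference> element.
    for reference in root.iter("reference"):
        anchor = reference.get("anchor", "")
        if not anchor.isascii():
            offenders.append(("anchor", anchor))
    return offenders


for kind, value in non_ascii_keywords_and_anchors(SAMPLE):
    print(f"non-ASCII {kind}: {value}")
```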

Best regards, Julian

