[rfc-i] draft-iab-rfc-nonascii-00

Heather Flanagan (RFC Series Editor) rse at rfc-editor.org
Tue Mar 1 09:58:32 PST 2016


On 2/29/16 11:01 AM, Julian Reschke wrote:
> On 2016-02-29 19:46, Heather Flanagan (RFC Series Editor) wrote:
>> ...
>>> "Searches against RFC indexes and database tables need to return
>>> expected results and support appropriate Unicode string matching
>>> behaviors;"
>>>
>>> It's not clear what that means, in particular unless we define expected
>>> results.
>>
>> People expect search engines to be able to perform searches such that
>> searching on "GEANT", for example, will return matches for both "GEANT"
>> and "GÉANT". The reverse would also be true. I expect this is
>> established enough behavior that we do not need to define it in more
>> detail (insert implied question here).
> 
> OK, but how exactly does that affect the vocabulary?

The XML vocabulary? I don't see why it would affect the vocabulary
beyond what we have already anticipated; we have the ascii attribute to
help clarify where necessary. Or do you mean something else?
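
For reference, the search behavior quoted above (a query for "GEANT" also matching "GÉANT", and the reverse) can be approximated by folding both strings before comparing them. The following is only a sketch of one common approach, not a description of how any RFC Editor search tool works; the function names fold and matches are made up for illustration. It only handles characters that decompose into a base letter plus combining marks, so a letter such as "ø" is left unchanged.

```python
import unicodedata


def fold(text: str) -> str:
    """Return a case-folded copy of text with combining marks removed."""
    # NFD splits a precomposed letter such as "É" into "E" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, candidate: str) -> bool:
    """Hypothetical helper: accent- and case-insensitive substring match."""
    return fold(query) in fold(candidate)
```

With this, matches("GEANT", "The GÉANT network") and matches("GÉANT", "geant") both hold.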

> 
>> ...
>>> "For names that include characters outside of the Unicode Latin and
>>> Latin Extended script, an author-provided, ASCII-only identifier is
>>> required to assist in search and indexing of the document."
>>>
>>> It would be good to be more precise about what non-ASCII characters are
>>> allowed (range?).
>>>
>>> <http://greenbytes.de/tech/webdav/draft-iab-rfc-nonascii-00.html#rfc.section.3.4.p.12>:
>>>
>>>
>>
>> My understanding is that "Latin Extended" is a reasonable way to capture
>> Basic Latin (ASCII)
>> Latin-1 Supplement
>> Latin Extended-A
>> Latin Extended-B
>> Latin Extended-C
>> Latin Extended-D
>> Latin Extended-E
>> Latin Extended Additional
> 
> OK, so the code ranges as per <http://www.unicode.org/charts/>, we may
> want to include those over here.
> 
> (I also note that there's an "IPA Extensions" code page I'll have to
> look into...)
> 

Does the following change seem reasonable?

OLD:
Person names may appear in several places within an RFC. In both the
front page header and the references section, when a non-Latin script is
used, the fullname of the author is required. Initials are supported and
encouraged if available. In all cases, valid Unicode is required. For
names that include characters outside of the Unicode Latin and Latin
Extended script, an author-provided, ASCII-only identifier is required to
assist in improving general readability as well as the searchability and
indexing of the document.

PROPOSED:
Person names may appear in several places within an RFC. In both the
front page header and the references section, when a non-Latin script is
used, the fullname of the author is required. Initials are supported and
encouraged if available. In all cases, valid Unicode is required. For
names that include characters outside of the Unicode Latin and Latin
Extended script (Basic Latin (ASCII), Latin-1 Supplement, Latin Extended-A,
Latin Extended-B, Latin Extended-C, Latin Extended-D, Latin Extended-E,
Latin Extended Additional), an author-provided, ASCII-only identifier is
required to assist in improving general readability as well as the
searchability and indexing of the document <xref target="UNICODE-CHART"/>.
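
To make the block list in the proposed text concrete, here is a rough sketch of how a tool might decide whether a name needs the ASCII-only identifier. The code point ranges are taken from the Unicode code charts for the blocks named above; the function name needs_ascii_identifier and the choice to normalize to NFC first are assumptions for illustration, not something the draft specifies.

```python
import unicodedata

# Code point ranges of the blocks listed in the proposed text.
LATIN_BLOCKS = [
    (0x0000, 0x007F),  # Basic Latin (ASCII)
    (0x0080, 0x00FF),  # Latin-1 Supplement
    (0x0100, 0x017F),  # Latin Extended-A
    (0x0180, 0x024F),  # Latin Extended-B
    (0x1E00, 0x1EFF),  # Latin Extended Additional
    (0x2C60, 0x2C7F),  # Latin Extended-C
    (0xA720, 0xA7FF),  # Latin Extended-D
    (0xAB30, 0xAB6F),  # Latin Extended-E
]


def in_latin_blocks(ch: str) -> bool:
    """True if the character falls inside one of the listed blocks."""
    return any(lo <= ord(ch) <= hi for lo, hi in LATIN_BLOCKS)


def needs_ascii_identifier(name: str) -> bool:
    """Hypothetical check: does any character fall outside the listed blocks?"""
    # Assumption: compose first, so "e" + U+0301 is treated like "é".
    composed = unicodedata.normalize("NFC", name)
    return any(not in_latin_blocks(ch) for ch in composed)
```

Under these assumptions, "Julian Reschke" and "José" would not need the extra identifier, while a name written in kanji or Cyrillic would.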

>>  ...
>>> "Keywords and citation tags must be ASCII only."
>>>
>>> What does "Keywords" refer to? The things we put into the xml2rfc
>>> <keyword> element?
>>
>> Yes.
> 
> Ok. Maybe state that, as the keywords currently are invisible in the
> specs, so people might not get what this is about...

OLD:
Keywords and citation tags must be ASCII only.

NEW:
Keywords, as tagged with the <keyword> element in XML, and citation tags
must be ASCII only.
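
As a sketch of how the keyword rule could be checked mechanically, the snippet below scans an XML source for <keyword> elements and reports any whose text is not pure ASCII. It assumes the source parses as well-formed XML; the function name non_ascii_keywords is hypothetical and this is not part of any existing tool.

```python
import xml.etree.ElementTree as ET


def non_ascii_keywords(xml_text: str) -> list[str]:
    """Return the text of every <keyword> element that contains non-ASCII."""
    root = ET.fromstring(xml_text)
    return [
        kw.text
        for kw in root.iter("keyword")
        if kw.text and not kw.text.isascii()
    ]
```

An empty list means every keyword in the document satisfies the ASCII-only requirement.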

-Heather

