[rfc-i] draft-iab-rfc-nonascii-00

Julian Reschke julian.reschke at gmx.de
Mon Feb 29 11:01:54 PST 2016


On 2016-02-29 19:46, Heather Flanagan (RFC Series Editor) wrote:
> ...
>> "Searches against RFC indexes and database tables need to return
>> expected results and support appropriate Unicode string matching
>> behaviors;"
>>
>> It's not clear what that means, in particular unless we define expected
>> results.
>
> People expect search engines to be able to perform searches such that
> searching on "GEANT", for example, will return matches for both "GEANT"
> and "GÉANT". The reverse would also be true. I expect this is
> established enough behavior that we do not need to define it in more
> detail (insert implied question here).

OK, but how exactly does that affect the vocabulary?
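
For illustration, one way to get the "GEANT" / "GÉANT" matching described
above is to fold accents and case before comparing. This is only a minimal
sketch in Python, assuming NFD decomposition followed by dropping combining
marks; the draft does not prescribe any algorithm, and the names
fold_for_search and matches are hypothetical:

```python
import unicodedata


def fold_for_search(text: str) -> str:
    """Build a case- and accent-insensitive search key (hypothetical helper)."""
    # NFD splits "É" into "E" plus U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (non-zero canonical combining class).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is the Unicode-aware counterpart of lower().
    return stripped.casefold()


def matches(query: str, candidate: str) -> bool:
    """True if both strings are equal after folding (assumed matching rule)."""
    return fold_for_search(query) == fold_for_search(candidate)
```

Note that this only covers letters with a canonical decomposition; a letter
such as "ø" has none and is left as-is, which is part of why "expected
results" may need a definition.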

> ...
>> "For names that include characters outside of the Unicode Latin and
>> Latin Extended script, an author-provided, ASCII-only identifier is
>> required to assist in search and indexing of the document."
>>
>> It would be good to be more precise about what non-ASCII characters are
>> allowed (range?).
>>
>> <http://greenbytes.de/tech/webdav/draft-iab-rfc-nonascii-00.html#rfc.section.3.4.p.12>:
>>
>
> My understanding is that "Latin Extended" is a reasonable way to capture
> Basic Latin (ASCII)
> Latin-1 Supplement
> Latin Extended-A
> Latin Extended-B
> Latin Extended-C
> Latin Extended-D
> Latin Extended-E
> Latin Extended Additional

OK, so those are the code ranges as per <http://www.unicode.org/charts/>; we
may want to list them in the draft.

(I also note that there's an "IPA Extensions" block I'll have to
look into...)
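
To make the ranges concrete, here is a rough Python sketch of a check for
whether a name stays inside the blocks listed above. The code point ranges
are the ones given in the Unicode charts; the names LATIN_BLOCKS, block_of
and needs_ascii_identifier are hypothetical, and "IPA Extensions"
(U+0250..U+02AF) is deliberately left out because its status is still open:

```python
# Blocks from the list above, with ranges as per the Unicode charts.
LATIN_BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
    ("Latin Extended Additional", 0x1E00, 0x1EFF),
    ("Latin Extended-C", 0x2C60, 0x2C7F),
    ("Latin Extended-D", 0xA720, 0xA7FF),
    ("Latin Extended-E", 0xAB30, 0xAB6F),
]


def block_of(ch: str):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in LATIN_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def needs_ascii_identifier(name: str) -> bool:
    """True if any character falls outside the listed blocks (assumed rule)."""
    return any(block_of(ch) is None for ch in name)
```

With this reading, "GÉANT" would not need an ASCII-only identifier, while a
name written in Cyrillic would.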

> ...
>> "Keywords and citation tags must be ASCII only."
>>
>> What does "Keywords" refer to? The things we put into the xml2rfc
>> <keyword> element?
>
> Yes.

OK. Maybe state that explicitly: the keywords are currently invisible in the
specs, so people might not get what this is about...

Best regards, Julian




