[rfc-i] Feedback on Section 3.4 in draft-iab-rfc-nonascii-02, U+ syntax

Sean Leonard dev+ietf at seantek.com
Thu Sep 1 16:29:36 PDT 2016


On 9/1/2016 3:18 AM, Martin J. Dürst wrote:
> P.S.: While I'm at it, in the sentence:
>                               BCP 137, "ASCII Escaping of Unicode
>    Character" describes the pros and cons of different options for
>    identifying Unicode characters in an ASCII document BCP137 [BCP137].
> there are just a few too many "BCP 137"s for my (and I hope everybody 
> else's) taste. (Unless this is an error produced by the html tools 
> version.)

I agree with Martin's assessment.

Suggested rewrite:

    How the Unicode character, code point, and name or name
    alias are written in the body may
    depend on context and the specific character(s) in question.
    [BCP137] and Appendix A of
    [UnicodeCurrent] provide alternatives and suggestions.
    All reasonable variations are acceptable within an RFC.
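
For illustration, the Appendix A convention is easy to produce
mechanically. The following is only a sketch in Python, assuming the
standard unicodedata module. The helper name u_plus and its show_glyph
option are hypothetical, and placing a combining mark on U+25CC DOTTED
CIRCLE is just one of the display choices raised in the quoted message
below, not a recommendation.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"  # base glyph commonly used to display combining marks


def u_plus(ch: str, show_glyph: bool = False) -> str:
    """Format one code point as 'U+XXXX NAME' (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    # At least four uppercase hex digits, more when the code point needs them.
    code = f"U+{ord(ch):04X}"
    # Not every code point has a name (controls, unassigned), so use a default.
    name = unicodedata.name(ch, None)
    parts = [code]
    if show_glyph:
        # Assumption: show a combining mark on a dotted circle, in curly quotes.
        glyph = DOTTED_CIRCLE + ch if unicodedata.combining(ch) else ch
        parts.append(f"\u201c{glyph}\u201d")
    if name is not None:
        parts.append(name)
    return " ".join(parts)


if __name__ == "__main__":
    for ch in ("\u2206", "\U0001F631", "\u0308"):
        print(u_plus(ch))
        print(u_plus(ch, show_glyph=True))
```

Whether the glyph is quoted, and whether a combining mark gets a base
character, stays a per-document choice; the sketch only shows that all
the variants derive from the same code point and name.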


Regards,

Sean

>
>
> On 2016/09/01 04:25, Paul Hoffman wrote:
>> On 31 Aug 2016, at 10:02, Sean Leonard wrote:
>>
>>> /(Sent this to the authors, and the suggestion was that this is the
>>> right mailing list for public discussion.)/
>>>
>>> **********
>>> Hello draft-iab-rfc-nonascii-02 people, here is feedback on
>>> draft-iab-rfc-nonascii-02.
>>>
>>> Section 3.4 of draft-iab-rfc-nonascii-02 provides no fewer than six
>>> preferred alternatives for how to represent a single Unicode character
>>> or code point. They all pretty much say “the ___ character (___)” in
>>> various permutations. None of these are inherently wrong.
>>>
>>> However, The Unicode Standard itself (9.0.0 and prior versions)
>>> provides a specific convention in Appendix A:
>>> “U+[x][x]xxxx NAME OF CHARACTER”
>>>
>>> Notably, the convention does not use the “the ___ character”
>>> formulation. Grammatically, the whole expression stands for the
>>> character itself, so no article is used. A conforming example would be:
>>>
>>>  1.  Temperature changes in the Temperature Control Protocol are
>>>      indicated by U+2206 INCREMENT.
>>>
>>> I would like to propose that this be listed as at least a preferred
>>> alternative.
>>
>> Disagree. That formulation is harder to read in running text, and
>> running text is exactly the formulation we are aiming for. The fact that
>> TUC likes a particular format should not impinge on our choice for
>> readability.
>>
>>>
>>> In The Unicode Standard, two other conventions are noted:
>>>
>>> U+1F631 “😱” FACE SCREAMING IN FEAR
>>>
>>> U+1F631 “😱”
>>>
>>> These conventions show all-caps and small-caps (which, for PDF
>>> presentation purposes, are actually stored as lowercase). They also
>>> show curly quotes. I asked the Unicode mailing list over the weekend
>>> and the general sense is that the uppercase is normative in plain text
>>> (as shown in the UCD) but case distinctions, along with space and
>>> (nearly all) hyphens, are not relevant for unambiguous identification.
>>
>> Neither of these is easier to read in running text than the ones in the
>> draft.
>>
>>>
>>> draft-iab-rfc-nonascii-02 is only concerned with characters, not
>>> semantics or presentation formats (unlike xml2rfc format). Assuming
>>> that plain text is the norm for purposes of draft-iab-rfc-nonascii-02,
>>> I suppose that it is sufficient for the plain text to have an ALL-CAPS
>>> name. I was going to suggest a novel xml2rfc element for Unicode code
>>> points, such as <ucode name="yes">😱</ucode> that would be transformed
>>> into the output above in plain text mode. However, the xml2rfc
>>> transformer can detect such text by looking for the presence of
>>> “U+1F631 FACE SCREAMING IN FEAR”, and apply CSS to it in the html
>>> output instead, viz.:
>>> span.uniname {                   /* CHAR STYLES */
>>>     text-transform: lowercase;
>>>     font-variant: small-caps;
>>>     font-size: 110%;
>>> }
>>>
>>> As discussed here:
>>> <http://www.unicode.org/mail-arch/unicode-ml/y2016-m08/0055.html>
>>>
>>> Personally I do not see the need for quotations around the character.
>>> U+____ SP 😱 SP NAME ought to be good enough: the single 😱 is going
>>> to be non-ASCII anyway. However there are implications for combining
>>> marks, with or without quotes…this needs to be thought through. 
>>> Consider:
>>> U+0308 “◌̈” COMBINING DIAERESIS vs.
>>> U+0308 ◌̈ COMBINING DIAERESIS vs.
>>> U+0308 “̈” COMBINING DIAERESIS vs.
>>> U+0308 ̈ COMBINING DIAERESIS.
>>> See
>>> <http://stackoverflow.com/questions/2224772/whats-the-unicode-glyph-used-to-indicate-combining-characters> 
>>>
>>>
>>>
>>> The question is what happens when the 😱 is a specific protocol
>>> element, which frequently (but not always) is quoted, such as "+" and
>>> treated as verbatim text <spanx style="verb"> or the new <tt> in
>>> xml2rfc v3.
>>
>> This is another good reason for the current rules.
>>
>>>
>>> Section 3.6 (and elsewhere) discusses “U+ notation” without a
>>> reference. Appendix A of [UnicodeCurrent] is appropriate.
>>
>> That seems fine.
>> _______________________________________________
>> rfc-interest mailing list
>> rfc-interest at rfc-editor.org
>> https://www.rfc-editor.org/mailman/listinfo/rfc-interest
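
The quoted message suggests that a transformer could detect text such
as “U+1F631 FACE SCREAMING IN FEAR” without any new markup. A rough
sketch of that detection step in Python follows, assuming the standard
re and unicodedata modules. The function name find_u_plus is
hypothetical, and the matching rule (ignore case, spaces, and hyphens)
is a simplification of the loose matching mentioned in the thread: it
does not handle the few names in which a hyphen is significant.

```python
import re
import unicodedata

# "U+" followed by four to six uppercase hex digits, per the U+ notation.
U_PLUS = re.compile(r"U\+([0-9A-F]{4,6})\b")


def find_u_plus(text):
    """Return (start, end, code_point, name) for each 'U+XXXX NAME' found."""
    hits = []
    for m in U_PLUS.finditer(text):
        cp = int(m.group(1), 16)
        if cp > 0x10FFFF:
            continue  # outside the Unicode code space
        name = unicodedata.name(chr(cp), None)
        if name is None:
            continue  # no character name to compare against
        # Build a pattern from the real name in which every space or hyphen
        # may appear as any run (or absence) of spaces and hyphens.
        loose = "".join(r"[\s\-]*" if c in " -" else re.escape(c) for c in name)
        nm = re.compile(r"\s+" + loose, re.IGNORECASE).match(text, m.end())
        if nm:
            hits.append((m.start(), nm.end(), cp, name))
    return hits


if __name__ == "__main__":
    sample = "indicated by U+2206 INCREMENT, or see U+1F631 face screaming in fear."
    for start, end, cp, name in find_u_plus(sample):
        print(repr(sample[start:end]), "->", name)
```

Because the name is looked up from the code point rather than parsed
out of the text, a mismatched pair such as "U+2206 DELTA" is simply not
reported, and trailing prose after the name is not swallowed.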

