[rfc-i] Feedback on Section 3.4 in draft-iab-rfc-nonascii-02, U+ syntax

Paul Hoffman paul.hoffman at vpnc.org
Wed Aug 31 12:25:45 PDT 2016


On 31 Aug 2016, at 10:02, Sean Leonard wrote:

> /(Sent this to the authors, and the suggestion was that this is the 
> right mailing list for public discussion.)/
>
> **********
> Hello draft-iab-rfc-nonascii-02 people, here is feedback on 
> draft-iab-rfc-nonascii-02.
>
> Section 3.4 of draft-iab-rfc-nonascii-02 provides no fewer than six 
> preferred alternatives for how to represent a single Unicode character 
> or code point. They all pretty much say “the ___ character (___)” 
> in various permutations. None of these are inherently wrong.
>
> However, The Unicode Standard itself (9.0.0 and prior versions) 
> provides a specific convention in Appendix A:
> “U+[x][x]xxxx NAME OF CHARACTER”
>
> Notably, the convention does not use “the ___ character” 
> formulation. Grammatically, the convention is a character, so an 
> article is omitted. A conforming example would be:
>
>  1.  Temperature changes in the Temperature Control Protocol are
>      indicated by U+2206 INCREMENT.
>
> I would like to propose that this be used as at least a priority 
> alternative.

Disagree. That formulation is harder to read in running text, and 
readable running text is exactly what we are aiming for. The fact that 
the Unicode Consortium likes a particular format should not impinge on 
our choice for readability.
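
For reference, the Appendix A form under discussion is easy to generate 
mechanically. A minimal Python sketch, assuming the standard unicodedata 
module (whose names follow whatever Unicode version the interpreter ships 
with); the function name u_plus is made up for illustration:

```python
import unicodedata


def u_plus(ch: str) -> str:
    """Return 'U+XXXX NAME' for one code point (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    # At least four hex digits, more when the code point needs them.
    code = f"U+{ord(ch):04X}"
    # Some code points (e.g. controls) have no name; fall back to the code alone.
    name = unicodedata.name(ch, "")
    return f"{code} {name}" if name else code


print(u_plus("\u2206"))      # the INCREMENT example quoted above
print(u_plus("\U0001F631"))
```

This only produces the notation; it says nothing about whether that 
notation reads well in a sentence.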

>
> In The Unicode Standard, two other conventions are noted:
>
> U+1F631 “😱” FACE SCREAMING IN FEAR
>
> U+1F631 “😱”
>
> These conventions show all-caps, and small-caps (which for PDF 
> presentation purposes, are actually stored as lowercase). They also 
> show curly quotes. I asked the Unicode mailing list over the weekend 
> and the general sense is that the uppercase is normative in plain text 
> (as shown in the UCD) but case distinctions, along with space and 
> (nearly all) hyphens, are not relevant for unambiguous identification.

Neither of these is easier to read in running text than the ones in the 
draft.

>
> draft-iab-rfc-nonascii-02 is only concerned with characters, not 
> semantics or presentation formats (unlike xml2rfc format). Assuming 
> that plain text is the norm for purposes of draft-iab-rfc-nonascii-02, 
> I suppose that it is sufficient for the plain text to have an ALL-CAPS 
> name. I was going to suggest a novel xml2rfc element for Unicode code 
> points, such as <ucode name="yes">😱</ucode> that would be 
> transformed into the output above in plain text mode. However, the 
> xml2rfc transformer can detect such text by looking for the presence 
> of “U+1F631 FACE SCREAMING IN FEAR”, and apply CSS to it in the 
> html output instead, viz.:
> span.uniname {                   /* CHAR STYLES */
>   text-transform: lowercase;
>   font-variant: small-caps;
>   font-size: 110%;
> }
>
> As discussed here: 
> <http://www.unicode.org/mail-arch/unicode-ml/y2016-m08/0055.html>
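
To make the detection idea above concrete, here is a rough Python sketch. 
It is not how xml2rfc works or is planned to work; the function names are 
hypothetical, and the only things taken from the message are the 
"U+XXXX NAME" pattern and the span.uniname class. It finds a U+ code 
point, checks that the text after it is that code point's actual name, 
and wraps the name in a span for the HTML output:

```python
import html
import re
import unicodedata

# "U+" followed by four to six uppercase hex digits.
CODE_POINT = re.compile(r"U\+([0-9A-F]{4,6})(?![0-9A-F])")


def find_named_code_points(text):
    """Return (code point, name start, name end) for each 'U+XXXX NAME' hit."""
    hits = []
    for m in CODE_POINT.finditer(text):
        cp = int(m.group(1), 16)
        if cp > 0x10FFFF:
            continue
        name = unicodedata.name(chr(cp), "")
        # The real name must follow after one space and end at a name boundary.
        tail = text[m.end():]
        if name and re.match(re.escape(" " + name) + r"(?![A-Z0-9-])", tail):
            start = m.end() + 1
            hits.append((cp, start, start + len(name)))
    return hits


def mark_up(text):
    """Escape text for HTML and wrap each detected name in span.uniname."""
    out, pos = [], 0
    for _cp, start, end in find_named_code_points(text):
        out.append(html.escape(text[pos:start]))
        out.append(f'<span class="uniname">{html.escape(text[start:end])}</span>')
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(mark_up("indicated by U+2206 INCREMENT."))
```

Checking the name against the character database avoids guessing where an 
all-caps name ends when it is followed by more capitalized text.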
>
> Personally I do not see the need for quotations around the character. 
> U+____ SP 😱 SP NAME ought to be good enough: the single 😱 is 
> going to be non-ASCII anyway. However there are implications for 
> combining marks, with or without quotes…this needs to be thought 
> through. Consider:
> U+0308 “◌̈” COMBINING DIAERESIS vs.
> U+0308 ◌̈ COMBINING DIAERESIS vs.
> U+0308 “̈” COMBINING DIAERESIS vs.
> U+0308 ̈ COMBINING DIAERESIS.
> See 
> <http://stackoverflow.com/questions/2224772/whats-the-unicode-glyph-used-to-indicate-combining-characters>
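
The four variants listed above can be generated side by side for 
comparison. A small Python sketch, assuming that U+25CC DOTTED CIRCLE is 
the placeholder base shown in the first two lines and that "combining 
mark" means any character whose general category starts with "M"; the 
function name is made up:

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"   # placeholder base for displaying a combining mark
LQ, RQ = "\u201C", "\u201D"  # curly quotes, as in the examples


def display_variants(ch):
    """Return the four candidate renderings for one code point."""
    code = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch)
    # Only marks (general category Mn, Mc, Me) get the dotted-circle base.
    is_mark = unicodedata.category(ch).startswith("M")
    shown = DOTTED_CIRCLE + ch if is_mark else ch
    return [
        f"{code} {LQ}{shown}{RQ} {name}",  # quoted, with placeholder base
        f"{code} {shown} {name}",          # unquoted, with placeholder base
        f"{code} {LQ}{ch}{RQ} {name}",     # quoted, bare mark
        f"{code} {ch} {name}",             # unquoted, bare mark
    ]


for line in display_variants("\u0308"):
    print(line)
```

For a non-combining character the first and third forms come out 
identical, so the choice only matters for marks.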
>
> The question is what happens when the 😱 is a specific protocol 
> element, which frequently (but not always) is quoted, such as "+" and 
> treated as verbatim text <spanx style="verb"> or the new <tt> in 
> xml2rfc v3.

This is another good reason for the current rules.

>
> Section 3.6 (and elsewhere) discusses “U+ notation” without a 
> reference. Appendix A of [UnicodeCurrent] is appropriate.

That seems fine.

