[rfc-i] Feedback on draft-iab-rfc-nonascii-02, allowable characters

Sean Leonard dev+ietf at seantek.com
Wed Aug 31 10:05:51 PDT 2016


/(Part 2: questions about what characters beyond ASCII are allowed)/

**********
Hello draft-iab-rfc-nonascii-02 people, here is feedback on the draft.

Then there is the issue of curly quotes, both in U+ syntax and in 
general. Are curly quotes allowed? Should they be allowed in general in 
non-ASCII RFCs, or replaced with straight quotes? The xml2rfc tool 
currently down-converts smart quotes to straight quotes in plain text, 
but does not up-convert straight quotes to smart quotes in HTML. This 
has implications for how “verbatim” text (i.e., literal text strings) 
is notated in the RFC formats.

What about marks that are currently allowed by xml2rfc, such as U+2014 — 
EM DASH, which is converted to -- in plain text? I use that character 
liberally wherever the prose calls for it, so it would be good to know 
how it will show up in the plain text format, if at all.
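
To make the down-conversion question concrete, here is a minimal Python 
sketch of the kind of substitution described above. The table and the 
function names are hypothetical illustrations, not xml2rfc's actual 
code; only the quote and EM DASH mappings come from the behavior noted 
in this message.

```python
# Hypothetical fallback table; the real xml2rfc table may differ.
ASCII_FALLBACKS = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201C": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u2014": "--",  # EM DASH
}


def downconvert(text):
    """Replace each mapped character with its ASCII fallback."""
    return "".join(ASCII_FALLBACKS.get(ch, ch) for ch in text)


def unmapped_non_ascii(text):
    """List the non-ASCII characters that have no fallback in the table."""
    return sorted({ch for ch in text
                   if ord(ch) > 127 and ch not in ASCII_FALLBACKS})


if __name__ == "__main__":
    sample = "\u201Cverbatim\u201D \u2014 2 \u00D7 3"
    print(downconvert(sample))
    print(unmapped_non_ascii(sample))
```

The second function is the interesting one for this draft: any 
character it reports has no defined plain text rendering under such a 
table, which is exactly the case the draft leaves open.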

What about other punctuation marks and symbols such as ⊗ ⊆ • † ¶ © §? 
What about the whole raft of Unicode space characters such as EM QUAD 
and EM SPACE? What about characters with a clear mathematical meaning, 
such as × MULTIPLICATION SIGN, ÷ DIVISION SIGN, and ∑ N-ARY SUMMATION, 
and the whole block of mathematical operators? Such mathematical 
characters might be especially useful for cryptographic specifications. 
And what about box drawing characters, block elements, and geometric 
shapes (U+2500-U+25FF) in <artwork>?
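
One way to frame these questions is by Unicode general category and 
block rather than character by character. The following Python sketch 
uses the standard unicodedata module to report both; the BLOCKS table 
and the describe() helper are my own illustrative names, and the table 
lists only the few blocks relevant here.

```python
import unicodedata

# Partial block table (ranges from the Unicode code charts); illustrative only.
BLOCKS = [
    (0x2000, 0x206F, "General Punctuation"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x2500, 0x257F, "Box Drawing"),
    (0x2580, 0x259F, "Block Elements"),
    (0x25A0, 0x25FF, "Geometric Shapes"),
]


def describe(ch):
    """Return (code point, name, general category, block) for one character."""
    cp = ord(ch)
    block = next((name for lo, hi, name in BLOCKS if lo <= cp <= hi), "other")
    return ("U+%04X" % cp,
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
            block)


if __name__ == "__main__":
    for ch in "\u2297\u2286\u2022\u2020\u00B6\u00A9\u00A7\u2003\u00D7\u00F7\u2211":
        print(describe(ch))
```

A policy could then be stated in terms of categories (for example, 
math symbols or space separators) or blocks, instead of enumerating 
individual characters.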

Overall, the draft implies that uses other than those explicitly 
mentioned (author names, protocol elements, addresses) are discouraged 
or prohibited; therefore, characters like EM DASH and BULLET that can 
be represented (however imperfectly) in ASCII ought to continue to be 
represented that way. Yet the text plainly states: “To support this move 
away from ASCII, RFCs will switch to supporting UTF-8 as the default 
character encoding and allow support for a broad range of Unicode 
character support.” That supports the proposition that all code points 
that are renderable in a modern, monospace, freely available font (i.e., 
Courier New) are fair game, as well as code points that modern operating 
systems are likely to render /or/ that would appear in author names 
(emoji and CJK characters, Indic scripts, Arabic script). Note: Courier 
New 5.13 (Windows 7) includes coverage for 2852 characters and 3254 
glyphs; the version shipped with Windows 10 supports even more, I think.

Sean
