[rfc-i] Feedback on draft-iab-rfc-nonascii-02, allowable characters
dev+ietf at seantek.com
Wed Aug 31 10:05:51 PDT 2016
/(Part 2: questions about what characters beyond ASCII are allowed)/
Hello draft-iab-rfc-nonascii-02 people, here is feedback
Then there is the issue of curly quotes, both in U+ syntax and in
general. Are curly quotes allowed? Should they be allowed in general in
non-ASCII RFCs, or replaced with straight quotes? The xml2rfc tool
currently down-converts smart quotes to straight quotes in plain text,
but does not up-convert straight quotes to smart quotes in HTML. This has
implications for how “verbatim” text (aka literal text strings) is
notated in the RFC formats.
What about marks that are currently allowed by xml2rfc, such as U+2014 —
EM DASH, which is converted to -- in plain text? I happen to use that
character aggressively as the prose calls for it, so it would be good to
know how it will show up in the plain text format, if at all.
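To make the down-conversion question concrete, here is a minimal sketch in Python of the kind of mapping described above. The table and function names are hypothetical and are not taken from xml2rfc; only the two behaviors mentioned above (smart quotes to straight quotes, EM DASH to --) are encoded. Anything else non-ASCII is reported rather than guessed at.

```python
from typing import List

# Hypothetical fallback table for illustration; NOT the actual xml2rfc table.
ASCII_FALLBACKS = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u2014": "--",  # EM DASH
}


def downconvert(text: str) -> str:
    """Replace each mapped character with its ASCII fallback."""
    return text.translate(str.maketrans(ASCII_FALLBACKS))


def unmapped_non_ascii(text: str) -> List[str]:
    """Return the distinct non-ASCII characters that have no fallback."""
    return sorted({ch for ch in text if ord(ch) > 127 and ch not in ASCII_FALLBACKS})


if __name__ == "__main__":
    sample = "\u201cverbatim\u201d \u2014 and a bullet \u2022"
    print(downconvert(sample))
    print(unmapped_non_ascii(sample))
```

The second function is the interesting one for this discussion: it lists the characters for which the plain text output has no agreed answer yet.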
What about other symbols and punctuation marks such as ⊗ ⊆ • † ¶ © §? What
about the whole raft of Unicode space characters such as EM QUAD and EM
SPACE? What
about characters that have strong mathematical value such as ×
MULTIPLICATION SIGN, ÷ DIVISION SIGN, and ∑ N-ARY SUMMATION, and the
whole block of mathematical operators? Such mathematical characters
might be especially useful for cryptographic specifications. And what
about box drawing, block elements, and geometric shapes (U+2500-U+25FF)
in <artwork>?
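For anyone who wants to check where these characters sit in Unicode, here is a small sketch using Python's standard unicodedata module. The helper names and the three-entry block table are my own, for illustration only; the table covers just the U+2500-U+25FF range discussed above, since unicodedata does not expose block names.

```python
import unicodedata

# Illustrative table covering only U+2500-U+25FF.
BLOCKS = [
    (0x2500, 0x257F, "Box Drawing"),
    (0x2580, 0x259F, "Block Elements"),
    (0x25A0, 0x25FF, "Geometric Shapes"),
]


def block_of(ch):
    """Return the block name for ch if it falls in the table, else None."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None


def describe(ch):
    """Format a character as 'U+XXXX NAME [general category]'."""
    name = unicodedata.name(ch, "<unnamed>")
    return "U+%04X %s [%s]" % (ord(ch), name, unicodedata.category(ch))


if __name__ == "__main__":
    # Characters mentioned in this message, plus EM QUAD and one box-drawing sample.
    for ch in "\u2297\u2286\u2022\u2020\u00b6\u00a9\u00a7\u00d7\u00f7\u2211\u2001\u2500":
        print(describe(ch), block_of(ch) or "")
```

The general category in brackets (for example Sm for math symbols, Zs for space separators) is one possible basis for stating which classes of characters are allowed, rather than listing code points one by one.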
Overall, the draft implies that uses other than those explicitly
mentioned (author names, protocol elements, addresses) are discouraged
or prohibited; therefore, characters like EM DASH and BULLET that can be
represented (however imperfectly) in ASCII ought to continue to be
represented in ASCII. Yet the text plainly states: “To support this move
away from ASCII, RFCs will switch to supporting UTF-8 as the default
character encoding and allow support for a broad range of Unicode
character support.” That supports the proposition that all code points
that are renderable in a modern, monospace, freely available font (e.g.,
Courier New) are fair game, as well as code points that modern operating
systems are likely to render /or/ that would appear in author names
(emoji and CJK characters, Indic scripts, Arabic script). Note: Courier
New 5.13 (Windows 7) includes coverage for 2852 characters and 3254
glyphs; the version shipped with Windows 10 supports even more, I think.