[rfc-i] Unicode in ABNF (in RFC) draft-seantek-unicode-in-abnf-01.txt
Martin J. Dürst
duerst at it.aoyama.ac.jp
Mon Oct 3 02:53:55 PDT 2016
A few quick comments from a cursory reading:
First, I note that the choice you have made for representing Unicode
code points seems to be the same as the one we made for RFC 3987, which
I think RFC 5234 and its predecessors also implicitly suggest.
If you have seen any discrepancies, I would appreciate a pointer. You
may also want to reference some of the
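To make that convention concrete: in RFC 3987, %x values denote Unicode
code points, not the bytes of some encoding. A minimal Python sketch
(illustrative only; it covers just the first four alternatives of the
ucschar rule, and the helper name in_ranges is made up) checks a code
point against those ranges:

```python
# Illustrative sketch: the first four alternatives of the RFC 3987 "ucschar"
# rule, transcribed as inclusive code point ranges. The full rule has more.
#   ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF / %x10000-1FFFD / ...
UCSCHAR_SUBSET = [
    (0xA0, 0xD7FF),
    (0xF900, 0xFDCF),
    (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD),
]

def in_ranges(cp: int, ranges) -> bool:
    """Return True if code point cp falls in any inclusive (lo, hi) range."""
    return any(lo <= cp <= hi for lo, hi in ranges)

# U+00E9 is encoded in UTF-8 as the two bytes C3 A9, but the rule is
# applied to the single code point, not to those bytes.
e_acute = "\u00e9"
print(e_acute.encode("utf-8").hex())            # c3a9
print(in_ranges(ord(e_acute), UCSCHAR_SUBSET))  # True
print(in_ranges(0xFDD0, UCSCHAR_SUBSET))        # False: noncharacter, excluded
```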
In the Introduction, you mention security problems, but they are not
detailed (no specifics, no examples), either there or in the Security
Considerations section.
In contrast to ASCII, Unicode (in any of its encoding forms) essentially
introduces multiple levels at which protocols can be described: bytes,
[code units (in the case of UTF-16xx),] code points, grapheme
clusters,... I'm fine with limiting this document to the code point
level, which is clearly what we need now, but it would be good to say
somewhere at least that this document doesn't deal with other levels.
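The difference between these levels is easy to see with two short
strings. The Python sketch below (illustrative only; the function name
levels is made up) counts UTF-8 bytes, UTF-16 code units, and code
points. Python's standard library has no grapheme-cluster segmentation,
so that level only appears in a comment:

```python
def levels(s: str) -> dict:
    """Count the units of s at three levels: bytes, code units, code points."""
    return {
        "utf8_bytes": len(s.encode("utf-8")),
        # UTF-16 code units are 16 bits; utf-16-be avoids a leading BOM.
        "utf16_code_units": len(s.encode("utf-16-be")) // 2,
        "code_points": len(s),
    }

# 'e' + U+0301 COMBINING ACUTE ACCENT: one grapheme cluster, two code points.
print(levels("e\u0301"))
# {'utf8_bytes': 3, 'utf16_code_units': 2, 'code_points': 2}

# U+1F600 is outside the BMP: one code point, but a surrogate pair in UTF-16.
print(levels("\U0001F600"))
# {'utf8_bytes': 4, 'utf16_code_units': 2, 'code_points': 1}
```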
Starting sections/paragraphs with parentheticals (e.g. "(Consult Section
2.3 of [RFC5234] in relation to this paragraph.)") is far from good
writing. At a minimum, put these parentheticals at the end of the
paragraphs; even better would be to convert them to actual text (in
most cases still at the end of the paragraphs) and say explicitly what
the "relation" is. (RFC 7405 looks much better in this respect.)
In the appendix, there are a lot of monstrosities such as
"UVCHARBEYONDLATIN1". Why not change that to something a bit more
readable, at a minimum something like UV_CHAR_BEYOND_LATIN_1?
I don't see the point of defining aliases for C1 controls; it should be
difficult to use these explicitly, not easy.
For some of the aliases, a property-based approach seems to be the right
thing to do, although this may be difficult to align with the ABNF
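To illustrate the contrast (a sketch only; both aliases below are
hypothetical and merely modeled on the draft's "beyond Latin-1" style
of naming): a fixed ABNF-style range such as %x100-10FFFF also sweeps in
surrogates and private-use code points, whereas a property-based test,
here General Category L* via Python's unicodedata module, filters by
character properties. The property answer depends on the Unicode version
bundled with Python (unicodedata.unidata_version), which is presumably
part of what makes properties hard to freeze into static ABNF rules:

```python
import unicodedata

# Hypothetical range-based alias, equivalent to the ABNF range %x100-10FFFF.
BEYOND_LATIN_1 = range(0x100, 0x110000)

def is_letter(cp: int) -> bool:
    """Hypothetical property-based alias: General Category L* (any letter).
    Results depend on the Unicode version of the running Python."""
    return unicodedata.category(chr(cp)).startswith("L")

print("Unicode data version:", unicodedata.unidata_version)
for cp in (0x0100, 0xD800, 0xE000, 0x4E00):
    print(f"U+{cp:04X}", cp in BEYOND_LATIN_1, unicodedata.category(chr(cp)), is_letter(cp))
# U+0100 True Lu True
# U+D800 True Cs False   (surrogate)
# U+E000 True Co False   (private use)
# U+4E00 True Lo True
```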
The draft says:
Formally, this document updates [RFC5234] but does not modify it in
situ. Authors need to reference this document if they want to include
these enhancements; bare references to [RFC5234] do not include this
specification (or, for that matter, [RFC7405]).
There's no text whatsoever in RFC 7405 that would say that it doesn't
update RFC 5234 directly. But I may be missing something. Please clarify.
I don't see the need to use %su for Unicode strings. The code points
speak for themselves; just use %s. Leaving %i/%iu undefined for Unicode
is indeed advisable. It could be based on default case folding, but we
know that this would be imperfect, in particular for Turkish.
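The Turkish problem shows up directly with Python's str.casefold(),
which applies Unicode default (full) case folding. This is only an
illustration of what default folding does; neither ABNF nor the draft
defines any Python API, and the helper name default_fold_equal is made
up:

```python
def default_fold_equal(a: str, b: str) -> bool:
    """Caseless comparison using Unicode default (full) case folding."""
    return a.casefold() == b.casefold()

# Fine for English: U+0049 'I' folds to U+0069 'i'.
print(default_fold_equal("FILE", "file"))  # True

# In Turkish, 'I' pairs with U+0131 (dotless i), but default folding
# maps 'I' to 'i', so the Turkish case pair does not match.
print(default_fold_equal("I", "\u0131"))   # False

# U+0130 (capital I with dot above) folds to 'i' + U+0307 COMBINING DOT
# ABOVE, so it does not match a plain 'i' either.
print("\u0130".casefold() == "i\u0307")    # True
print(default_fold_equal("\u0130", "i"))   # False
```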
Section 6 uses an example with actual Unicode characters. I'd definitely
wait for the new way of publishing drafts/RFCs before the final
publication of this document, so that this example (and hopefully a few
more) can use actual Unicode characters.
(I'd also change 'notated' to 'annotated'. (several occurrences))
That's about it, hope it helps.
On 2016/10/03 15:28, Sean Leonard wrote:
> Dear ABNF-Discuss (and rfc-interest):
> This draft by Chris Newman and I addresses an interesting topic: how to
> do Unicode in ABNF. Unicode has showed up in several different ways in
> protocols that are described in ABNF. These ways are not consistent
> across the RFC series, but now that Unicode is a pretty stable standard
> (for its basic parts) and now that UTF-8 RFCs are becoming a reality per
> draft-iab-rfc-nonascii-02, it is a good time to look at this issue. This
> is a fork from draft-seantek-abnf-more-core-rules.
> This draft is currently proposed as Experimental. Special thanks to Paul
> Kyzivat for discussing the matters in this draft, although he is not
> formally a co-author.
> The draft tries to be very conservative in its approach. Please read the
> draft for details. Some stuff was intentionally omitted as out-of-scope
> or too complicated for a general-purpose ABNF syntax parser, whether
> humans or machines.
> Comments and feedback are appreciated.
> A new version of I-D, draft-seantek-unicode-in-abnf-01.txt
> has been successfully submitted by Sean Leonard and posted to the
> IETF repository.
> Name: draft-seantek-unicode-in-abnf
> Revision: 01
> Title: Unicode in ABNF
> Document date: 2016-10-01
> Group: Individual Submission
> Pages: 11
> This experimental document adds support for Unicode strings in ABNF
> (Augmented Backus-Naur Form), and provides certain symbols related to
> Unicode code point ranges.
> rfc-interest mailing list
> rfc-interest at rfc-editor.org
Martin J. Dürst
Department of Intelligent Information Technology
College of Science and Engineering
Aoyama Gakuin University
Fuchinobe 5-1-10, Chuo-ku, Sagamihara