[rfc-i] ABNF (RFC2234)

Bruce Lilly blilly at erols.com
Wed Feb 23 12:09:28 PST 2005

>  Date: 2005-02-22 14:27
>  From: Bill Fenner <fenner at research.att.com>

> [...] adding c-wsp makes the 
> grammar ambiguous so I'm not a fan.

For the record, I'm also opposed to multi-line "prose" (I'm
not thrilled about its provision at all; I certainly don't
want to encourage its use by lazy authors as a way of shirking
their responsibility to provide a clear and unambiguous
grammar).

The grammar is already ambiguous because of *c-wsp issues.
For example:

        rulelist       =  1*( rule / (*c-wsp c-nl) )

        rule           =  rulename defined-as elements c-nl
                               ; continues if next line starts
                               ;  with white space

        elements       =  alternation *c-wsp

        c-wsp          =  WSP / (c-nl WSP)

        c-nl           =  comment / CRLF
                               ; comment or newline

        comment        =  ";" *(WSP / VCHAR) CRLF

An example ABNF snippet for discussion of the above:

a = b ; comment continues...
  ; ... on this line, due to the following continuation line
  ; this is NOT a continuation, even if it looks like one
  ; neither this line nor the one above it is associated with any rule

; comment starting at beginning of line, independent of any rule

  ; another comment line beginning with whitespace, independent

The "(*c-wsp c-nl)" alternative for the "rulelist" production
permits comments beginning with whitespace which are unrelated
to any rule.  At the same time, the "*c-wsp" trailing context in
"elements" includes both "WSP" and "(c-nl WSP)", which permits a
continuation line containing only (leading whitespace and) a
comment, if and only if that comment is also followed by a
continuation line (the WSP following the c-nl).  An indented
comment line directly after a rule can therefore be parsed either
as part of that rule or as a separate "rulelist" item; that is the
ambiguity.  While it might look like the comments in the ABNF ABNF
following the "rule" and "c-nl" definition lines (and a few others
not excerpted above) are related to those rules, in fact per the
ABNF they are not (unless the following line begins with
whitespace, which can't readily be discerned in the RFC text
because the grammar is indented).  It appears that the
intent (Dave?) was that those comments be associated with the
rules immediately preceding them, but that's not what the ABNF
rules themselves denote [and I note in passing that one of the
RFC 2026 requirements for advancement to Draft Standard is that the
specification "must be well-understood"; OK, show of hands: who
realized the implications w.r.t. continuation lines and comments
before this discussion? (For the record, my hands are down,
though it does explain some frustrating messages from my attempt
at an LALR(1) parser implementation; the parser was doing the
right thing.)].
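For anyone who wants to check the ambiguity mechanically, here is a rough Python sketch (not from RFC 2234, and not a complete ABNF implementation) that counts the distinct parse trees the rules excerpted above assign to a short input; more than one parse means the input is ambiguous. It assumes "alternation" is just a single rulename, and every helper name in it (lit, seq, star, count_parses and so on) is made up for illustration.

```python
import string
from functools import lru_cache

# Tiny grammar combinators (hypothetical helpers); each node is a hashable tuple.
def lit(s): return ("lit", s)
def cls(chars): return ("cls", frozenset(chars))
def seq(*xs): return ("seq", xs)
def alt(*xs): return ("alt", xs)
def star(x, lo=0): return ("star", x, lo)  # star(x, 1) means 1*x

WSP = cls(" \t")
CRLF = lit("\r\n")
VCHAR = cls(chr(c) for c in range(0x21, 0x7F))
ALPHA = cls(string.ascii_letters)
DIGIT = cls(string.digits)

# Rules as excerpted from RFC 2234, except that "alternation" is
# simplified (an assumption) to a single rulename.
comment = seq(lit(";"), star(alt(WSP, VCHAR)), CRLF)
c_nl = alt(comment, CRLF)
c_wsp = alt(WSP, seq(c_nl, WSP))
rulename = seq(ALPHA, star(alt(ALPHA, DIGIT, lit("-"))))
defined_as = seq(star(c_wsp), alt(lit("="), lit("=/")), star(c_wsp))
elements = seq(rulename, star(c_wsp))
rule = seq(rulename, defined_as, elements, c_nl)
rulelist = star(alt(rule, seq(star(c_wsp), c_nl)), 1)


def count_parses(grammar, text):
    """Return the number of distinct parse trees deriving all of text."""

    @lru_cache(maxsize=None)
    def count(node, i, j):
        kind = node[0]
        if kind == "lit":
            return 1 if text[i:j] == node[1] else 0
        if kind == "cls":
            return 1 if j == i + 1 and text[i] in node[1] else 0
        if kind == "alt":
            return sum(count(x, i, j) for x in node[1])
        if kind == "seq":
            return count_seq(node[1], 0, i, j)
        return count_star(node[1], node[2], i, j)

    @lru_cache(maxsize=None)
    def count_seq(items, k, i, j):
        if k == len(items):
            return 1 if i == j else 0
        total = 0
        for m in range(i, j + 1):  # items[k] matches text[i:m]
            left = count(items[k], i, m)
            if left:
                total += left * count_seq(items, k + 1, m, j)
        return total

    @lru_cache(maxsize=None)
    def count_star(x, lo, i, j):
        total = 1 if i == j and lo == 0 else 0
        # Every repeated element here matches at least one character,
        # so only non-empty first matches are tried (no infinite loop).
        for m in range(i + 1, j + 1):
            left = count(x, i, m)
            if left:
                total += left * count_star(x, max(lo - 1, 0), m, j)
        return total

    return count(grammar, 0, len(text))


print(count_parses(rulelist, "a = b\r\n"))                # 1
print(count_parses(rulelist, "a = b ; x\r\n; y\r\n"))     # 1
print(count_parses(rulelist, "a = b ; x\r\n  ; y\r\n"))   # 2
```

With those simplifications, an indented "; y" line after "a = b ; x" gets two parses (absorbed into rule "a" via "(c-nl WSP)", or standing alone via "(*c-wsp c-nl)"), while the same comment at column 0 gets only one.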

There are other issues related to c-wsp placement in the ABNF ABNF;
I plan to address some of those separately.  However, there are
definitely issues related to comments and continuation lines in
the ABNF as specified.  I recommend at least the following two
changes to correct the apparent errors:

1. require comments unassociated with any rule to begin at the
   start of a line (i.e. eliminate leading "*c-wsp" in the
   comment-or-empty-line alternative in the rulelist production)

2. permit comment lines associated with a preceding rule line to
   end the rule (i.e. eliminate the mandatory trailing WSP after
   c-nl in the c-wsp production).

That is, the relevant lines would be rewritten as:

        rulelist       =  1*( rule / c-nl )


        c-wsp          =  WSP / c-nl

which would bring that part of the ABNF into alignment with what
seems to have been the intent, remove some ambiguities, and have
the nice side effect of making the ABNF simpler and cleaner.
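A rough, single-line sketch of what change 1 does to the comment-or-empty-line alternative of "rulelist", written as Python regular expressions. The pattern names and the standalone_ok helper are hypothetical, and the check deliberately ignores continuation via "(c-nl WSP)" as well as the case where an indented comment is absorbed into a preceding rule, so it only shows which single lines could stand alone as rulelist items.

```python
import re

# Single-line approximations of the two "comment or empty line"
# alternatives of rulelist. Continuation through "(c-nl WSP)" and
# absorption into a preceding rule are deliberately ignored here.
COMMENT = r";[ \t\x21-\x7e]*\r\n"   # comment = ";" *(WSP / VCHAR) CRLF
C_NL = rf"(?:{COMMENT}|\r\n)"        # c-nl = comment / CRLF

RFC2234_ITEM = re.compile(rf"[ \t]*{C_NL}")  # (*c-wsp c-nl), c-wsp as WSP only
PROPOSED_ITEM = re.compile(C_NL)             # proposed: c-nl


def standalone_ok(pattern, line):
    """True if line can stand alone as a rulelist item under pattern."""
    return pattern.fullmatch(line) is not None


for line in ["; comment at column 0\r\n", "  ; indented comment\r\n", "\r\n"]:
    print(repr(line), standalone_ok(RFC2234_ITEM, line),
          standalone_ok(PROPOSED_ITEM, line))
```

Under these assumptions, the indented comment line is accepted as a standalone item by the RFC 2234 alternative but not by the proposed one, while the column-0 comment and the empty line are accepted by both.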
