[rfc-i] Re: ABNF (RFC2234) vs HTTP's augmented BNF syntax (RFC822 + RFC2616)
moore at cs.utk.edu
Mon Feb 14 10:05:05 PST 2005
> > So the summary would be that it's ok to invoke the "implied LWS"
> > rule
> Note that that is a rich source of potential interoperability
> problems. For example, RFC 822 has such a rule, and an RFC 822
> Date field might look like:
> Date: 1 Jan 2004 12 : 34 : 56 -0700
> or like
> Date:1Jan2004 12:34:56-0700
Not clear. RFC 822 describes three levels of interpretation for a header:
1. breaking down the header into individual fields, consisting of
field names and field bodies
2. lexical analysis - breaking up a structured header field into tokens
3. parsing - interpreting the tokens of a structured field according to its grammar
In the lexical analysis phase you would normally recognize "1Jan2004"
as an atom rather than recognizing "1", "Jan", and "2004" as separate
tokens, because tokens are delimited by white-space and/or specials.
(See section 3.1.4.)
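A minimal Python sketch of that lexical rule, assuming a simplified token set: only atoms and the RFC 822 specials are produced, while quoted-strings, comments, and domain-literals are deliberately left out. The function name `tokenize` is hypothetical.

```python
# Sketch of an RFC 822-style lexer for structured field bodies.
# Simplifying assumption: only atoms and single-character specials are
# produced; quoted-strings, comments and domain-literals are not handled,
# and control characters other than whitespace are treated as atom text.
SPECIALS = set('()<>@,;:\\".[]')
WHITESPACE = set(" \t\r\n")


def tokenize(body: str) -> list[str]:
    """Split a field body into atoms and specials; whitespace only delimits."""
    tokens: list[str] = []
    atom: list[str] = []
    for ch in body:
        if ch in WHITESPACE or ch in SPECIALS:
            # Whitespace or a special ends any atom in progress.
            if atom:
                tokens.append("".join(atom))
                atom = []
            if ch in SPECIALS:
                tokens.append(ch)
        else:
            atom.append(ch)
    if atom:
        tokens.append("".join(atom))
    return tokens


print(tokenize("1 Jan 2004 12 : 34 : 56 -0700"))
# ['1', 'Jan', '2004', '12', ':', '34', ':', '56', '-0700']
print(tokenize("1Jan2004 12:34:56-0700"))
# ['1Jan2004', '12', ':', '34', ':', '56-0700']
```

Because "-" is not an RFC 822 special, the packed form also glues the seconds to the zone offset ("56-0700"), the same kind of surprise as "1Jan2004".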
The problem with 822's syntax isn't that it uses the implied LWS rule,
it's that (a) it doesn't make the distinction between lexical analysis
and parsing sufficiently clear (the rules for parsing and the rules
for lexical analysis are intermixed) and (b) by using different tokens
for dates (including dates in received fields) than those used in other
parts of structured fields, it forces lexical analysis to be
context-sensitive. In other words, the way 822's grammar is written,
the lexical analyzer has to know that it's expecting a date and
recognize "1" as 1*2DIGIT rather than recognizing "1Jan2004" as an atom.
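To make the context-sensitivity concrete, here is a hedged Python sketch of a lexer that has to be told whether it is expecting a date. The two regular expressions and the `expecting_date` flag are illustrative assumptions, not anything RFC 822 specifies.

```python
import re

# Ordinary mode: an atom is a run of characters that are neither whitespace
# nor RFC 822 specials; each special is a token by itself.
ATOM_MODE = re.compile(r'[^\s()<>@,;:\\".\[\]]+|[()<>@,;:\\".\[\]]')

# Hypothetical date mode: digit runs and letter runs are separate tokens,
# so "1Jan2004" can be handed to the parser as mday, month and year.
DATE_MODE = re.compile(r'[0-9]+|[A-Za-z]+|[:,+-]')


def lex(body: str, expecting_date: bool) -> list[str]:
    """Tokenize body; the result depends on what the parser expects next."""
    pattern = DATE_MODE if expecting_date else ATOM_MODE
    # findall silently skips characters matching neither alternative,
    # e.g. whitespace.
    return pattern.findall(body)


print(lex("1Jan2004 12:34:56-0700", expecting_date=False))
# ['1Jan2004', '12', ':', '34', ':', '56-0700']
print(lex("1Jan2004 12:34:56-0700", expecting_date=True))
# ['1', 'Jan', '2004', '12', ':', '34', ':', '56', '-', '0700']
```

The catch is that `expecting_date` has to come from the parser, so lexing and parsing can no longer be cleanly separated.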
Either that or you have to rewrite the grammar so that date-time is:
    date-time = [ wday "," ] date time
    wday      = atom          ; one of "Mon" .. "Sun"
    date      = mday month year
    mday      = atom          ; 1*2DIGIT
    month     = atom          ; one of "Jan" .. "Dec"
    year      = atom          ; 2DIGIT / 4DIGIT
    time      = hour ":" minute [ ":" second ] tzone
    hour      = atom          ; 1*2DIGIT 0..23
    minute    = atom          ; 2DIGIT 00..59
    second    = atom          ; 2DIGIT 00..60
    tzone     = atom          ; timezone name
              / (( "+" / "-" ) offset)
    offset    = atom          ; 4DIGIT in the form HHMM where
                              ; HH is from 00..12 and MM is from
                              ; 00..59
(even then, "+" is treated as a special when it appears in a
timezone and not otherwise)
and verify the individual tokens outside of the lexical analyzer.
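A rough Python sketch of that second approach: the lexer just hands over atoms and specials, and a small hypothetical `parse_date_time` function checks each atom against the form and range given in the grammar comments. The function and helper names are made up for illustration, and the input is assumed to be an already-tokenized list. Since "-" is not an RFC 822 special, the sketch accepts the zone either as one signed atom ("-0700") or as a separate sign token followed by the offset.

```python
import re

WDAYS = {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}
MONTHS = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}


def _num(tok: str, lengths: set[int], what: str, hi: int = 9999) -> int:
    """Verify an atom is all ASCII digits, of an allowed length, and <= hi."""
    if not (tok.isascii() and tok.isdigit() and len(tok) in lengths):
        raise ValueError(f"invalid {what}: {tok!r}")
    if int(tok) > hi:
        raise ValueError(f"{what} out of range: {tok!r}")
    return int(tok)


def parse_date_time(tokens: list[str]) -> dict:
    """Parse date-time from atoms/specials, verifying each atom's form."""
    toks, pos = list(tokens), 0

    def take() -> str:
        nonlocal pos
        if pos >= len(toks):
            raise ValueError("unexpected end of date-time")
        pos += 1
        return toks[pos - 1]

    out: dict = {}
    if len(toks) > 1 and toks[1] == ",":          # [ wday "," ]
        out["wday"] = take()
        take()
        if out["wday"] not in WDAYS:
            raise ValueError(f"invalid wday: {out['wday']!r}")
    out["mday"] = _num(take(), {1, 2}, "mday")
    out["month"] = take()
    if out["month"] not in MONTHS:
        raise ValueError(f"invalid month: {out['month']!r}")
    out["year"] = _num(take(), {2, 4}, "year")
    out["hour"] = _num(take(), {1, 2}, "hour", hi=23)
    if take() != ":":
        raise ValueError("expected ':' after hour")
    out["minute"] = _num(take(), {2}, "minute", hi=59)
    if pos < len(toks) and toks[pos] == ":":       # [ ":" second ]
        take()
        out["second"] = _num(take(), {2}, "second", hi=60)
    tz = take()                                    # tzone
    if tz in ("+", "-"):                           # sign lexed separately
        tz += take()
    if re.fullmatch(r"[+-][0-9]{4}", tz):          # offset HHMM
        if int(tz[1:3]) > 12 or int(tz[3:5]) > 59:
            raise ValueError(f"offset out of range: {tz!r}")
    elif not (tz.isascii() and tz.isalpha()):      # timezone name
        raise ValueError(f"invalid tzone: {tz!r}")
    out["tzone"] = tz
    if pos != len(toks):
        raise ValueError("trailing tokens after date-time")
    return out


print(parse_date_time(["1", "Jan", "2004", "12", ":", "34", ":", "56", "-0700"]))
```

Fed the atoms from "Date:1Jan2004 12:34:56-0700", this rejects "1Jan2004" as an invalid mday instead of guessing at how to split it.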
There's nothing inherently wrong with the implied LWS rule. Almost
every programming language uses a similar rule and manages to do so
without creating interoperability problems. And 822's grammar is
much cleaner and easier to understand than 2822's grammar.
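As a quick illustration of the programming-language point, Python's standard `tokenize` module treats whitespace between tokens as insignificant, yet still uses it to separate adjacent names, much as RFC 822 does with atoms. The helper `significant_tokens` is a hypothetical wrapper written for this example.

```python
import io
import tokenize

# Token types that carry layout rather than content.
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
          tokenize.DEDENT, tokenize.ENDMARKER}


def significant_tokens(src: str) -> list[str]:
    """Return the text of every non-layout token in a snippet of Python."""
    readline = io.StringIO(src).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in LAYOUT]


# Spacing around operators and punctuation does not change the tokens...
print(significant_tokens("x = f(1, 2)\n"))   # ['x', '=', 'f', '(', '1', ',', '2', ')']
print(significant_tokens("x=f( 1 ,2 )\n"))   # same list
# ...but whitespace still delimits: "not x" is two tokens, "notx" is one name.
print(significant_tokens("not x\n"))         # ['not', 'x']
print(significant_tokens("notx\n"))          # ['notx']
```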