[rfc-i] Re: ABNF (RFC2234) vs HTTP's augmented BNF syntax (RFC822 + RFC2616)

Keith Moore moore at cs.utk.edu
Mon Feb 14 10:05:05 PST 2005

> > So the summary would be that it's ok to invoke the "implied LWS"
> > rule 
> Note that that is a rich source of potential interoperability
> problems.  For example, RFC 822 has such a rule, and an RFC 822
> Date field might look like:
>   Date: 1 Jan 2004 12 : 34 : 56 -0700
> or like
>   Date:1Jan2004 12:34:56-0700
> etc.

That's not clear.  RFC 822 describes three levels of interpretation for a
message header: 

1. breaking down the header into individual fields, consisting of
field names and field bodies
2. lexical analysis - breaking up a structured header field into tokens
3. parsing

In the lexical analysis phase you would normally recognize "1Jan2004" 
as an atom rather than recognizing "1" "Jan" and "2004" as separate
tokens because tokens are delimited by white-space and/or specials.
(See section 3.1.4.)
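
To make that concrete, here is a rough Python sketch of the generic
lexical analysis step.  It is only an illustration: the names SPECIALS
and tokenize are made up for this example, and quoted strings, comments,
domain literals and control characters are deliberately not handled.

```python
import re

# RFC 822 specials: ( ) < > @ , ; : \ " . [ ]
SPECIALS = '()<>@,;:\\".[]'
_CLASS = re.escape(SPECIALS)

# Whitespace is skipped, each special is its own token, and any other
# run of characters is an atom.  (Simplification: quoted strings,
# comments, domain literals and control characters get no special care.)
_TOKEN_RE = re.compile(r'\s+|([' + _CLASS + r'])|([^\s' + _CLASS + r']+)')

def tokenize(field_body):
    """Split a structured field body into (kind, text) tokens."""
    tokens = []
    for match in _TOKEN_RE.finditer(field_body):
        if match.group(1):
            tokens.append(("special", match.group(1)))
        elif match.group(2):
            tokens.append(("atom", match.group(2)))
    return tokens

print(tokenize("1 Jan 2004 12 : 34 : 56 -0700"))
print(tokenize("1Jan2004 12:34:56-0700"))
```

With this lexer the second input yields "1Jan2004" as a single atom, and
"56-0700" as a single atom too, since "-" is not one of the specials.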

The problem with 822's syntax isn't that it uses the implied LWS rule,
it's that (a) it doesn't make the distinction between lexical analysis
and parsing sufficiently clear (the rules for parsing and the rules
for lexical analysis are intermixed) and (b) by using different tokens 
for dates (including dates in received fields) than those used in other 
parts of structured fields, it forces lexical analysis to be 
context-sensitive.  In other words, the way 822's grammar is written,
the lexical analyzer has to know that it's expecting a date and 
recognize "1" as 1*2DIGIT rather than recognizing "1Jan2004" as an atom.
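
A small Python sketch of what that context sensitivity amounts to.  The
lex function and its expecting_date flag are hypothetical, invented for
this illustration; the "date mode" pattern is one plausible reading of
the date grammar, not something RFC 822 spells out.

```python
import re

# Generic mode: atoms are runs of non-special, non-space characters,
# and each special is a token by itself.
_GENERIC_RE = re.compile(r'[^\s()<>@,;:\\".\[\]]+|[()<>@,;:\\".\[\]]')

# Date mode (an assumption for this sketch): digit runs and letter runs
# are separate tokens, and any other non-space character stands alone.
_DATE_RE = re.compile(r'[0-9]+|[A-Za-z]+|[^\sA-Za-z0-9]')

def lex(text, expecting_date=False):
    """Tokenize text; the result depends on what the parser expects."""
    pattern = _DATE_RE if expecting_date else _GENERIC_RE
    return pattern.findall(text)

print(lex("1Jan2004 12:34:56-0700"))
print(lex("1Jan2004 12:34:56-0700", expecting_date=True))
```

The same input produces two different token streams, so the lexer cannot
run without being told which part of the grammar it is in.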

Either that or you have to rewrite the grammar so that date-time is:

date-time	= 	[ wday "," ] date time

wday		=	atom 	; one of "Mon" .. "Sun"

date		=	mday month year

mday		= 	atom	; 1*2DIGIT

month		=	atom	; one of "Jan" .. "Dec"

year		=	atom	; 2DIGIT / 4DIGIT

time		=	hour ":" minute [ ":" second ] tzone

hour		= 	atom	; 1*2DIGIT 0..23

minute		=	atom	; 2DIGIT 00..59

second		=	atom	; 2DIGIT 00..60

tzone		=	atom 	; timezone name
		/       (( "+" / "-" ) offset)

offset		= 	atom	; 4DIGIT in the form HHMM where
				; HH is from 00..12 and MM is from
				; 00..59

(even then, "+" is treated as a special when it appears in a
timezone and not otherwise)

and verify the individual tokens outside of the lexical analyzer.
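
A rough Python sketch of that approach, under some stated assumptions:
the lexer is generic and knows nothing about dates, the parser only
checks the shape of the token stream, and the individual atoms are
verified afterwards.  All names here (tokenize, parse_date_time,
_number) are made up for the example.  Two simplifications are mine: a
signed offset is expected to arrive as one atom such as "-0700" (the
generic lexer does not treat "+" or "-" as specials), and mday is
additionally range-checked to 1..31.

```python
import re

_SPECIALS = re.escape('()<>@,;:\\".[]')
_TOKEN_RE = re.compile(r'([' + _SPECIALS + r'])|([^\s' + _SPECIALS + r']+)')

WDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def tokenize(body):
    """Generic lexer: knows atoms and specials, nothing about dates."""
    return [("special", s) if s else ("atom", a)
            for s, a in _TOKEN_RE.findall(body)]

def _number(text, pattern, low, high, what):
    """Check an atom against a digit pattern and a numeric range."""
    if not re.fullmatch(pattern, text) or not low <= int(text) <= high:
        raise ValueError("bad %s: %r" % (what, text))
    return int(text)

def parse_date_time(body):
    toks = tokenize(body)
    pos = 0

    def take(kind, value=None):
        nonlocal pos
        if pos >= len(toks):
            raise ValueError("unexpected end of field")
        tok_kind, text = toks[pos]
        if tok_kind != kind or (value is not None and text != value):
            raise ValueError("unexpected token %r" % text)
        pos += 1
        return text

    # Parsing: only the shape of the token stream is checked here.
    wday = None
    if toks[1:2] == [("special", ",")]:
        wday = take("atom")
        take("special", ",")
    mday, month, year = take("atom"), take("atom"), take("atom")
    hour = take("atom")
    take("special", ":")
    minute = take("atom")
    second = None
    if toks[pos:pos + 1] == [("special", ":")]:
        take("special", ":")
        second = take("atom")
    zone = take("atom")
    if pos != len(toks):
        raise ValueError("trailing tokens after date-time")

    # Verification: done outside the lexical analyzer.
    if wday is not None and wday not in WDAYS:
        raise ValueError("bad wday: %r" % wday)
    if month not in MONTHS:
        raise ValueError("bad month: %r" % month)
    if not re.fullmatch(r"[0-9]{2}|[0-9]{4}", year):
        raise ValueError("bad year: %r" % year)
    offset = re.fullmatch(r"[+-]([0-9]{2})([0-9]{2})", zone)
    if offset:
        _number(offset.group(1), r"[0-9]{2}", 0, 12, "zone hours")
        _number(offset.group(2), r"[0-9]{2}", 0, 59, "zone minutes")
    elif not zone.isalpha():
        raise ValueError("bad zone: %r" % zone)
    return {
        "wday": wday,
        "mday": _number(mday, r"[0-9]{1,2}", 1, 31, "mday"),
        "month": month,
        "year": int(year),
        "hour": _number(hour, r"[0-9]{1,2}", 0, 23, "hour"),
        "minute": _number(minute, r"[0-9]{2}", 0, 59, "minute"),
        "second": None if second is None
                  else _number(second, r"[0-9]{2}", 0, 60, "second"),
        "zone": zone,
    }
```

With this sketch, "Thu, 1 Jan 2004 12:34:56 -0700" and
"1 Jan 2004 12 : 34 : 56 -0700" both parse, while
"1Jan2004 12:34:56-0700" is rejected at the parsing stage because
"1Jan2004" is a single atom and the next atoms are not where the grammar
expects them.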

There's nothing inherently wrong with the implied LWS rule.  Almost
every programming language uses a similar rule and manages to do so
without creating interoperability problems.  And 822's grammar is 
much cleaner and easier to understand than 2822's grammar.  

