[rfc-i] Re: ABNF (RFC2234) vs HTTP's augmented BNF syntax (RFC822
dhc2 at dcrocker.net
Tue Feb 15 12:53:28 PST 2005
On Tue, 15 Feb 2005 14:45:27 -0500, Keith Moore wrote:
> You're missing the point I was trying to make which is that
> 822's lexical analysis is context-sensitive. A lexical analyzer
> that has seen Date: or a ";" within a Received field needs to
> start scanning for 1*2DIGIT rather than atoms, white space,
> comments, etc.
I've come to describe the issue in somewhat different terms:
RFC822 uses multiple lexical analyzers.
1. There is one for distinguishing between header fields.
2. There are a number of "classes" of header fields, according to the syntax of the value portion of the field. Each class requires a different lexical analyzer (and parser).
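The two-level model above can be sketched in Python: a first-level lexer separates header fields, and each field's name then selects a second-level lexer for its value. This is a minimal illustration under stated assumptions, not the RFC 822 grammar. The field classes, token names, and helpers (`split_fields`, `lex_with`, `lex_received`, `lex_header`) are hypothetical; comments may not nest; and a Received value is split at its last ";" as a simplification.

```python
import re

# Hypothetical second-level lexer for "structured" fields (From, To, ...):
# a simplified RFC 822 token set. Comments may not nest (a simplification).
STRUCTURED_TOKEN = re.compile(r'''
    \s+                                   # linear white space: skipped
  | (?P<quoted>"(?:[^"\\]|\\.)*")         # quoted-string
  | (?P<comment>\((?:[^()\\]|\\.)*\))     # comment, non-nested only
  | (?P<special>[()<>@,;:\\".\[\]])       # RFC 822 specials
  | (?P<atom>[^\s()<>@,;:\\".\[\]]+)      # atom: run of non-specials
''', re.VERBOSE)

# Hypothetical second-level lexer for date-time values. Whether a digit run
# is 1*2DIGIT or 4DIGIT is left to the parser.
DATE_TOKEN = re.compile(r'''
    \s+
  | (?P<digits>[0-9]+)
  | (?P<word>[A-Za-z]+)
  | (?P<special>[,:+-])
''', re.VERBOSE)


def lex_with(pattern, value):
    """Run one lexer over a field value; whitespace matches are dropped."""
    tokens, pos = [], 0
    while pos < len(value):
        m = pattern.match(value, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos}: {value[pos]!r}")
        if m.lastgroup:  # the whitespace alternative has no named group
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def lex_received(value):
    # Assumption: the last ";" introduces the date-time, so the lexer
    # switches from structured tokens to date tokens at that point.
    head, sep, date = value.rpartition(";")
    if not sep:
        return lex_with(STRUCTURED_TOKEN, value)
    return (lex_with(STRUCTURED_TOKEN, head) + [("special", ";")]
            + lex_with(DATE_TOKEN, date))


def lex_unstructured(value):
    # Unstructured fields (Subject, ...) are taken as a single text token.
    return [("text", value)]


# Hypothetical field-class table: field name -> second-level lexer.
FIELD_LEXERS = {
    "date": lambda v: lex_with(DATE_TOKEN, v),
    "from": lambda v: lex_with(STRUCTURED_TOKEN, v),
    "to": lambda v: lex_with(STRUCTURED_TOKEN, v),
    "received": lex_received,
}


def split_fields(raw):
    """First-level lexer: split a header block into (name, value) pairs."""
    fields = []
    for line in raw.splitlines():
        if line[:1] in (" ", "\t") and fields:
            # Continuation line: unfold it into the previous field's value.
            name, value = fields[-1]
            fields[-1] = (name, value + line)
        elif ":" in line:
            name, value = line.split(":", 1)
            fields.append((name.strip(), value.strip()))
        # Any other line is ignored in this sketch.
    return fields


def lex_header(raw):
    """Choose a second-level lexer for each field by its case-insensitive name."""
    return [(name, FIELD_LEXERS.get(name.lower(), lex_unstructured)(value))
            for name, value in split_fields(raw)]
```

In a Date field, "15" comes back as `("digits", "15")`, while the same characters in a From field would come back as an atom. The context sensitivity Keith describes is handled by choosing the lexer from the field name, rather than by asking one lexer to cover the whole message.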
As for why this sort of thing isn't a problem for compiler writers but is a significant one for email software developers, I believe it is simply that the issue is a bread-and-butter aspect of writing compilers, whereas email folk are not all that experienced with the lex/parse model. I don't mean unaware of it; I mean it is not an automatic part of their development model.
When RFC733 was under development, I had not yet taken any CS courses. The other 3 authors were highly experienced, but not with compiler writing. It was frankly a fluke that I came across the issue at the time and thought it would be interesting to explore. Although I had fun doing the research and trying to move the formal spec to the lex/parse split, it was entirely an exercise by an amateur.
Given the actual skillsets of email developers, moving the specification to a one-level approach (eliminating the lexical analyzer) made a heck of a lot of sense. That the result is painful merely highlights why hierarchical divide-and-conquer is a good design approach for anything but the simplest grammars.
dcrocker a t ...
WE'VE MOVED to: www.bbiw.net