[rfc-i] Input Syntax vs Canonical Form/rfcedstyle vs Output Formats [was: Re: Comments on draft-hoffman-xml2rfc-06]

Dave Crocker dhc at dcrocker.net
Sun May 4 08:25:49 PDT 2014

On 5/2/2014 8:28 AM, Elwyn Davies wrote:
> On Fri, 2014-05-02 at 07:57 -0500, Dave Crocker wrote:
>>    2.  The document's reference to 'formats' is really to
>> 'representations'
> When referring to other representations the document should say
> something like "other representations" or "non-xml2rfc representations"
>> or the like. 
> In these terms and in my view the canonical format is a layout of the
> xml2rfc representation conforming to the RFC editor style and intended
> to conform to certain layout rules 

While I agree with the above, I think this exchange highlights the need
for some very careful terminology choices and very careful application
of them.  In particular, the formal choices should be of terms that
reduce likely misunderstanding.  From what I've seen, an example would
be to eliminate use of the word "format".  It appears to be used with
very different meanings.

There seem to be three things that are related but quite different and
need to be specified precisely:

   1.  Language

       A document is specified in a document representation language,
like html, xml2rfc, pdf, txt, etc.  Each of these is massively different
from the others and the word chosen for this category of description
should leave no doubt about the degree of difference.  Hence I suggest
'language'.

   2.  Vocabulary

       This is a profile of commands, attributes, etc. that are used
within a language, to enable or restrict whatever is deemed appropriate
for RFCs.  Note that 'vocabulary' is already in use in this draft,
making the higher-level term 'language' rather natural...

   3.  Layout

       Conventions for the use of visually-relevant encodings of the
language.  I quite like Elwyn's use of the word 'layout' since the
meaning is intuitive; I doubt anyone will confuse what is meant with
either of the other two terms.

For the RFC Series, reference to a 'canonical version' of a document
entails specification of all three of the above.  The language will be
xml2rfc.  The vocabulary will be whatever is defined in this current
draft.  The layout will be a set of line and spacing conventions.

The RFC Editor will accept versions of xml2rfc that differ from the
canonical conventions in terms of layout and possibly in terms of
vocabulary.

So perhaps for this draft there really is a reasonable role for noting
what is canonical and what is not.

I suggest having sections that distinguish between "Canonical
vocabulary" and "Additional Input Vocabulary", where the latter notes
how the listed vocabulary is translated into canonical form.  (It might
even be titled "xml2rfc v2 compatible vocabulary"...)

There's a reasonable argument one can make, to have a single, unified
vocabulary list, with the two sets combined into a single set and with
an attribute like "deprecated" attached to vocabulary that is ok for
input but won't be in the canonical version.  However, that doesn't help
folk easily see what's canonical and what's not.  Having distinct
sections will make this far clearer.

Also, the word "deprecated" isn't correct.  If it is still supported,
it's not deprecated.  Since it remains legal as input to the RFC editor,
it's still supported.  (Is flexible layout 'deprecated', given that it's
accepted as input but 'removed' in the canonical version?)

> The input format can be rather freer while still conforming to the v3
> vocabulary and with a minimum of layout constraints.  


>> This resolves in my brain as:
>>      a.  Maximum input line length: xxx
>>      b.  Maximum running sequences of white space:  yyy
>>      c.  Components that may be omitted, and will be supplied by the
>>          RFC Editor:  zzz, zzzz, zzzzz...
>> The above list is, of course, merely meant as an exemplar for the kinds
>> of things that might differ between 'input' and 'canonical'.  The
>> document should state the differences explicitly.
> I don't think any of this needs to be in the v3 vocabulary document.
> It belongs either with the RFC Editor guidelines or the processor tool
> specifications.

I agree completely.

Separate documentation would specify the input layout flexibility vs.
the more rigid canonical layout conventions.  Unless
I've missed something, that's compatible with what Elwyn is also saying.
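
As a rough sketch of what such separate documentation might pin down, a
processing tool could carry the layout limits as data and check a
document against them.  Everything here is hypothetical: the names
LayoutRules and layout_violations, and the assumption that the limits
are just a maximum line length and a maximum run of white space, are
for illustration only and are not defined by the draft or the RFC
Editor.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutRules:
    """Hypothetical layout limits; real values would come from the RFC Editor."""
    max_line_length: int      # stands in for the "xxx" placeholder
    max_whitespace_run: int   # stands in for the "yyy" placeholder


def layout_violations(text: str, rules: LayoutRules) -> list[str]:
    """Return a description of each line that breaks the given layout rules."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > rules.max_line_length:
            problems.append(f"line {number}: exceeds {rules.max_line_length} characters")
        # Longest consecutive run of spaces or tabs anywhere on the line.
        longest_run = max(
            (len(match.group()) for match in re.finditer(r"[ \t]+", line)),
            default=0,
        )
        if longest_run > rules.max_whitespace_run:
            problems.append(
                f"line {number}: white space run longer than {rules.max_whitespace_run}"
            )
    return problems
```

A tool along these lines could hold one looser LayoutRules value for
input and one stricter value for the canonical version, leaving the
vocabulary document untouched by either.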


Dave Crocker
Brandenburg InternetWorking
