[rfc-i] Input Syntax vs Canonical Form/rfcedstyle vs Output Formats [was: Re: Comments on draft-hoffman-xml2rfc-06]

Elwyn Davies elwynd at folly.org.uk
Fri May 2 08:28:35 PDT 2014

On Fri, 2014-05-02 at 07:57 -0500, Dave Crocker wrote:
> On 5/2/2014 1:29 AM, Julian Reschke wrote:
> > 
> > My feeling is that you are over-complicating things. If the input can be
> > canonicalized, what do we gain by rewriting it in the canonical form?
> > Concretely?
>
> I think my basic question is essentially the same, although reducing
> down to it followed a path that produced a few other comments:
>    1.  For reference, I think the document would actually be more clear
> by almost never using the term canonical, since it's mostly an issue
> that is out of scope for the document.  Rather, it should say something
> like "xml2rfc" and where needed for clarity "xml2rfc-v3".
> The document title gives a specific and, IMO, appropriate scope of
> discussion:  It is specifying an xml vocabulary.  Almost all discussion
> of the use of that vocabulary by the RFC Editor and the rest of the
> community -- that is, discussion of the larger context -- is appropriate
> to a different, "RFC Series systems-oriented" discussion.  It might be
> reasonable for the current document to make a quick reference to the use
> of xml2rfc within a larger context, but only a quick reference and in
> the Introduction.
> Specifications for system or service components, such as an object
> format like this, often confuse things by mixing discussion of the
> component with discussion of the system it lives within.  Focus on the
> vocabulary, since that's the primary task of the spec.  Leave the large
> discussion for a different document.

I don't have a problem with this.  The vocabulary itself does not need
to mention the canonical format.

A couple of the points I had were about removing pieces where the
canonical format was effectively mentioned in 'obsoletes' and 'updates'.

>    2.  The document's reference to 'formats' is really to
> 'representations', which is a meaningful difference.  Formatting is
> about layout.  Representation is really at the level of different
> language; html vs xml is not a matter of format, but of representation.
> The semantic difference in terms is more than mere quibbling, IMO.
> When referring to other representations the document should say
> something like "other representations" or "non-xml2rfc representations"
> or the like.  But again, I'm not clear why /this/ document needs to make
> many or any such references.  In any event, within a document like this,
> saying 'canonical' as the reference for what is being defined is too
> abstract.

In these terms, and in my view, the canonical format is a layout of the
xml2rfc representation that conforms to the RFC Editor style and to
certain layout rules defined by the community in conjunction with the
RFC Editor.  Those rules are intended to make the new canonical
representation of RFCs aesthetically pleasing, give all RFCs a similar,
'professional' appearance, and make them easy to use and reference for
both humans and automated processors.

The input format can be rather freer while still conforming to the v3
vocabulary and with a minimum of layout constraints.

>    3.  To the extent that anything in the document is restricted to the
> 'canonical' version -- vs. the input version -- it needs to be
> distinguished explicitly, probably in a separate section, rather than
> just annotated.
> I'm not clearly seeing how an "input" version of this xml is different
> from a "canonical" version, nor why the differences are more interesting
> than what is done by the auto-formatting function typically available in
> editors for things like xml or html. (In Oxygen, the button is labeled
> "format and indent".  In Dreamweaver, it's called "Apply Source
> Formatting".)

I see it as essentially this plus enforcing the RFC Editor style
constraints and probably adding some human readable commentary.

> This resolves in my brain as:
>      The canonical version of xml2rfc conforms to specific requirements
> in layout, such as line length and running sequences of spaces, and it
> contains specific required components.  The RFC Editor can accept
> versions of xml2rfc that deviate from the canonical version in the
> following ways:
>      a.  Maximum input line length: xxx
>      b.  Maximum running sequences of white space:  yyy
>      c.  Components that may be omitted, and will be supplied by the
>          RFC Editor:  zzz, zzzz, zzzzz...
> The above list is, of course, merely meant as an exemplar for the kinds
> of things that might differ between 'input' and 'canonical'.  The
> document should state the differences explicitly.

I don't think any of this needs to be in the v3 vocabulary document.

It belongs either with the RFC Editor guidelines or the processor tool.

> A good pretty-formatting engine can render structured data like xml
> quite readable by a simple text editor.  So the benefits of being able
> to 'input' in various styles of formatting (and even missing
> specific components that are added later) and running them through an
> engine that canonicalizes the document layout are significant, IMO.
> But there's quite a difference between 'canonical formatting' and
> anything else more substantial.
>    <<snip>>
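
To make the kind of layout check discussed above a bit more concrete,
here is a minimal sketch in Python.  It is only an illustration: the
names (LayoutLimits, layout_violations) are hypothetical, and the
numeric limits are arbitrary placeholders, since no actual values for
line length or white space runs have been proposed in this thread.

```python
import re
from dataclasses import dataclass


@dataclass
class LayoutLimits:
    # Placeholder limits; the thread leaves the real values unspecified.
    max_line_length: int
    max_space_run: int


def layout_violations(xml_text: str, limits: LayoutLimits) -> list[str]:
    """Return human-readable layout problems found in xml_text."""
    problems = []
    for number, line in enumerate(xml_text.splitlines(), start=1):
        if len(line) > limits.max_line_length:
            problems.append(
                f"line {number}: longer than {limits.max_line_length} characters"
            )
        # Leading indentation is ignored; only runs inside the line count.
        body = line.lstrip(" ")
        longest_run = max((len(run) for run in re.findall(r" +", body)), default=0)
        if longest_run > limits.max_space_run:
            problems.append(
                f"line {number}: run of {longest_run} spaces exceeds {limits.max_space_run}"
            )
    return problems


if __name__ == "__main__":
    # Arbitrary demo values, not proposed limits.
    demo_limits = LayoutLimits(max_line_length=72, max_space_run=2)
    sample = "<t>short</t>\n<t>too    many spaces</t>\n"
    for problem in layout_violations(sample, demo_limits):
        print(problem)
```

A check like this only reports deviations; rewriting the input into the
canonical layout would be a separate step in whatever processor tool is
used.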

