[rfc-i] Input Syntax vs Canonical Form/rfcedstyle vs Output Formats [was: Re: Comments on draft-hoffman-xml2rfc-06]

Dave Crocker dhc at dcrocker.net
Fri May 2 05:57:06 PDT 2014

On 5/2/2014 1:29 AM, Julian Reschke wrote:
> My feeling is that you are over-complicating things. If the input can be
> canonicalized, what do we gain by rewriting it in the canonical form?
> Concretely?

I think my basic question is essentially the same, although reducing
down to it followed a path that produced a few other comments:

   1.  For reference, I think the document would actually be more clear
by almost never using the term canonical, since it's mostly an issue
that is out of scope for the document.  Rather, it should say something
like "xml2rfc" and where needed for clarity "xml2rfc-v3".

The document title gives a specific and, IMO, appropriate scope of
discussion:  It is specifying an xml vocabulary.  Almost all discussion
of the use of that vocabulary by the RFC Editor and the rest of the
community -- that is, discussion of the larger context -- is appropriate
to a different, "RFC Series systems-oriented" discussion.  It might be
reasonable for the current document to make a quick reference to the use
of xml2rfc within a larger context, but only a quick reference and in
the Introduction.

Specifications for system or service components, such as an object
format like this, often confuse things by mixing discussion of the
component with discussion of the system it lives within.  Focus on the
vocabulary, since that's the primary task of the spec.  Leave the large
discussion for a different document.

   2.  The document's reference to 'formats' is really to
'representations', which is a meaningful difference.  Formatting is
about layout.  Representation is really at the level of different
language; html vs xml is not a matter of format, but of representation.
The semantic difference in terms is more than mere quibbling, IMO.

When referring to other representations the document should say
something like "other representations" or "non-xml2rfc representations"
or the like.  But again, I'm not clear why /this/ document needs to make
many or any such references.  In any event, within a document like this,
saying 'canonical' as the reference for what is being defined is too

   3.  To the extent that anything in the document is restricted to the
'canonical' version -- vs. the input version -- it needs to be
distinguished explicitly, probably in a separate section, rather than
just annotated.

I'm not clearly seeing how an "input" version of this xml is different
from a "canonical" version, nor why the differences are more interesting
than what is done by the auto-formatting function typically available in
editors for things like xml or html. (In Oxygen, the button is labeled
"format and indent".  In Dreamweaver, it's called "Apply Source

This resolves in my brain as:

     The canonical version of xml2rfc conforms to specific requirements
in layout, such as line length and running sequences of spaces, and it
contains specific required components.  The RFC Editor can accept
versions of xml2rfc that deviate from the canonical version in the
following ways:

     a.  Maximum input line length: xxx

     b.  Maximum running sequences of white space:  yyy

     c.  Components that may be omitted, and will be supplied by the
         RFC Editor:  zzz, zzzz, zzzzz...

The above list is, of course, merely meant as an exemplar for the kinds
of things that might differ between 'input' and 'canonical'.  The
document should state the differences explicitly.
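
As a rough illustration of items (a) and (b), a layout check of this kind
can be sketched in a few lines of Python.  The limits below are
placeholders chosen for the example (the list above leaves them as xxx
and yyy), and the function name is hypothetical; nothing here comes from
the draft.

```python
import re

# Hypothetical limits; the message leaves the real values unspecified.
MAX_LINE_LENGTH = 72
MAX_SPACE_RUN = 2


def layout_deviations(text, max_len=MAX_LINE_LENGTH, max_run=MAX_SPACE_RUN):
    """Return (line_number, problem) pairs for lines that break the limits."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_len:
            problems.append((number, "line too long"))
        # Leading indentation is ignored: a run only counts when it
        # follows a non-space character.
        if re.search(r"\S {%d,}" % (max_run + 1), line):
            problems.append((number, "long run of spaces"))
    return problems
```

A list like this, returned to the submitter or fixed automatically, is
the sort of explicit statement of differences meant above.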

A good pretty-formatting engine can render structured data like xml
quite readable by a simple text editor.  So the benefits of being able
to 'input' in various styles of formatting (and even missing specific
components that are added later) and running them through an
engine that canonicalizes the document layout are significant, IMO.
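
A minimal sketch of such an engine, using only the Python standard
library (ElementTree's indent() needs Python 3.9 or later).  It assumes
that whitespace between elements carries no meaning, which would not hold
for mixed content or preformatted text, and it is layout normalization
only, not XML canonicalization in the C14N sense.  The function name is
hypothetical.

```python
import xml.etree.ElementTree as ET


def normalize_layout(xml_text):
    """Re-indent an XML document so that layout-only differences vanish."""
    root = ET.fromstring(xml_text)
    # indent() overwrites whitespace-only text between elements,
    # so the original indentation of the input does not survive.
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


compact = "<rfc><front><title>T</title></front></rfc>"
ragged = "<rfc>\n      <front>\n<title>T</title>\n   </front>\n</rfc>"

# Both inputs come out with the same layout.
print(normalize_layout(compact) == normalize_layout(ragged))
```

Two submissions that differ only in indentation come out identical, which
is the whole of what 'canonical formatting' needs to mean here.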

But there's quite a difference between 'canonical formatting' and
anything else more substantial.

   4.  Constructs that are deprecated need to be moved out of the main
document and into their own section, probably an appendix.  The spec
needs to be a clean statement of v3.  Listing deprecated items is
distracting for the primary use of the document.  Think in terms of what
a reader will need to see 10 years from now.  It's not deprecated stuff.

   5.  If there are actual language components that are prohibited from
the input version -- that is, their use is restricted to the RFC Editor
-- then that certainly needs to be called out explicitly in the document
(and justified.) And probably moved into its own section.


ps.  Stylistic concern -- I came across "In fact, the author believes
that some of the features of the v3 grammar cannot be  specified as a
DTD." in Appendix B, and am wondering whether a specification like this,
developed by a group, should have personal views of the document editor,
especially when it is embedded without special notation?

pps.  The long list of deprecated features is distressing.  Given the
earlier discussions, it never occurred to me that there would be so
little concern for staying compatible with the installed base.  But
then, the IETF seems to have largely lost its appreciation for the role
of operational stability...

Dave Crocker
Brandenburg InternetWorking
