[rfc-i] Input Syntax vs Canonical Form/rfcedstyle vs Output Formats [was: Re: Comments on draft-hoffman-xml2rfc-06]

Paul Hoffman paul.hoffman at vpnc.org
Sun May 4 14:09:51 PDT 2014


On May 2, 2014, at 5:57 AM, Dave Crocker <dhc at dcrocker.net> wrote:

>   1.  For reference, I think the document would actually be more clear
> by almost never using the term canonical, since it's mostly an issue
> that is out of scope for the document.  Rather, it should say something
> like "xml2rfc" and where needed for clarity "xml2rfc-v3".
> 
> The document title gives a specific and, IMO, appropriate scope of
> discussion:  It is specifying an xml vocabulary.  Almost all discussion
> of the use of that vocabulary by the RFC Editor and the rest of the
> community -- that is, discussion of the larger context -- is appropriate
> to a different, "RFC Series systems-oriented" discussion.  It might be
> reasonable for the current document to make a quick reference to the use
> of xml2rfc within a larger context, but only a quick reference and in
> the Introduction.
> 
> Specifications for system or service components, such as an object
> format like this, often confuse things by mixing discussion of the
> component with discussion of the system it lives within.  Focus on the
> vocabulary, since that's the primary task of the spec.  Leave the large
> discussion for a different document.

You are correct that this document mixes the discussion of the v3 format with what will and will not be allowed in the canonical representation of an RFC once the changeover happens. The latter can, indeed, be split out.

>   2.  The document's reference to 'formats' is really to
> 'representations', which is a meaningful difference.  Formatting is
> about layout.  Representation is really at the level of different
> language; html vs xml is not a matter of format, but of representation.
> The semantic difference in terms is more than mere quibbling, IMO.
> 
> When referring to other representations the document should say
> something like "other representations" or "non-xml2rfc representations"
> or the like.  But again, I'm not clear why /this/ document needs to make
> many or any such references.  In any event, within a document like this,
> saying 'canonical' as the reference for what is being defined is too
> abstract.

Good catch. The next draft will use "representation" when talking about the files published.


>   3.  To the extent that anything in the document is restricted to the
> 'canonical' version -- vs. the input version -- it needs to be
> distinguished explicitly, probably in a separate section, rather than
> just annotated.
> 
> I'm not clearly seeing how an "input" version of this xml is different
> from a "canonical" version, nor why the differences are more interesting
> than what the auto-formatting function typically available in editors
> for things like xml or html does. (In Oxygen, the button is labeled
> "format and indent".  In Dreamweaver, it's called "Apply Source
> Formatting".)
> 
> This resolves in my brain as:
> 
>     The canonical version of xml2rfc conforms to specific requirements
> in layout, such as line length and running sequences of spaces, and it
> contains specific required components.  The RFC Editor can accept
> versions of xml2rfc that deviate from the canonical version in the
> following ways:
> 
>     a.  Maximum input line length: xxx
> 
>     b.  Maximum running sequences of white space:  yyy
> 
>     c.  Components that may be omitted, and will be supplied by the
>         RFC Editor:  zzz, zzzz, zzzzz...
> 
> 
> The above list is, of course, merely meant as an exemplar for the kinds
> of things that might differ between 'input' and 'canonical'.  The
> document should state the differences explicitly.
> 
> A good pretty-formatting engine can make structured data like xml
> quite readable in a simple text editor.  So the benefits of being able
> to 'input' in various styles of formatting (and even missing specific
> components that are added later) and running them through an
> engine that canonicalizes the document layout are significant, IMO.
> 
> But there's quite a difference between 'canonical formatting' and
> anything else more substantial.

See above. One way the input differs from the canonical version that is published is that the canonical version will have some fields filled in by the formatter. Another is that the input checker will probably not be as strict as the output checker.
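The two differences named above (fields supplied by the formatter, and an input checker that is looser than the output checker) can be illustrated with a minimal Python sketch. It is not the xml2rfc tool: the limit values, the `layout_problems` and `fill_in` helpers, and the use of `version` as a formatter-supplied attribute are all hypothetical, since the actual limits were left as placeholders ("xxx", "yyy") in the discussion.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical limits for illustration only; the real values were never
# specified. The input checker is deliberately looser than the canonical one.
INPUT_LIMITS = {"max_line": 200, "max_space_run": 16}
CANONICAL_LIMITS = {"max_line": 72, "max_space_run": 4}

# Hypothetical: root attributes the formatter supplies when the author omits them.
FORMATTER_DEFAULTS = {"version": "3"}


def layout_problems(text, limits):
    """Return (line_number, message) pairs for lines that break the limits."""
    # Matches a run of spaces longer than the limit. Each line is stripped
    # first, so indentation is ignored and only interior runs are counted.
    too_many_spaces = re.compile(" {%d,}" % (limits["max_space_run"] + 1))
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > limits["max_line"]:
            problems.append((number, "line too long"))
        if too_many_spaces.search(line.strip()):
            problems.append((number, "run of spaces too long"))
    return problems


def fill_in(xml_text, defaults):
    """Add any missing root attributes, leaving author-supplied ones alone."""
    root = ET.fromstring(xml_text)
    for name, value in defaults.items():
        if name not in root.attrib:
            root.set(name, value)
    return ET.tostring(root, encoding="unicode")


doc = "<rfc>\n  <front>Example     title</front>\n</rfc>"
print(layout_problems(doc, INPUT_LIMITS))      # []
print(layout_problems(doc, CANONICAL_LIMITS))  # [(2, 'run of spaces too long')]
print(fill_in(doc, FORMATTER_DEFAULTS))
```

The same document passes the input check but not the canonical one, which is the asymmetry described above: authors get latitude, while the published representation is held to the stricter rules.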

>   4.  Constructs that are deprecated need to be moved out of the main
> document and into their own section, probably an appendix.  The spec
> needs to be a clean statement of v3.  Listing deprecated items is
> distracting for the primary use of the document.  Think in terms of what
> a reader will need to see 10 years from now.  It's not deprecated stuff.

This is an artifact of some people (possibly including you?) insisting that the new tool must allow people to submit things in v2 format even after the v3 format is established. If it were not for that, I would have just ripped all of the deprecated elements and attributes out and written an appendix in prose.

As you can probably tell, each draft is generated with a lot of tooling. Julian Reschke has done an incredible amount of work on that tooling. As a result, when I add stuff to the RNG, I cannot easily forget to document it. More importantly, all the child-parent relationships are documented automatically.
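How Julian Reschke's tooling works is not described here. As a rough illustration of the general idea of deriving child-parent relationships mechanically from a schema, here is a minimal Python sketch. It assumes the XML syntax of RELAX NG (a compact-syntax schema would first need converting, for example with trang), uses a toy grammar rather than the v3 schema, and the `parent_map` helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RNG_NS = "{http://relaxng.org/ns/structure/1.0}"

# A toy grammar in RELAX NG XML syntax; it is NOT the v3 schema.
SAMPLE_RNG = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="rfc"/></start>
  <define name="rfc"><element name="rfc">
    <ref name="front"/><ref name="middle"/></element></define>
  <define name="front"><element name="front"><ref name="title"/></element></define>
  <define name="middle"><element name="middle">
    <oneOrMore><ref name="section"/></oneOrMore></element></define>
  <define name="section"><element name="section"><ref name="name"/>
    <zeroOrMore><ref name="section"/></zeroOrMore></element></define>
  <define name="title"><element name="title"><text/></element></define>
  <define name="name"><element name="name"><text/></element></define>
</grammar>"""


def parent_map(rng_text):
    """Map each element name to the set of element names that may contain it.

    Simplifying assumption: every top-level <define> wraps at most one
    <element>, and children are reached through direct <ref>s to such defines.
    """
    root = ET.fromstring(rng_text)
    # define name -> the <element> node that define declares
    element_of = {}
    for d in root.findall(RNG_NS + "define"):
        el = d.find(RNG_NS + "element")
        if el is not None:
            element_of[d.get("name")] = el
    parents = defaultdict(set)
    for el in element_of.values():
        # Every <ref> inside this element's content model names a possible child.
        for ref in el.iter(RNG_NS + "ref"):
            child = element_of.get(ref.get("name"))
            if child is not None:
                parents[child.get("name")].add(el.get("name"))
    return dict(parents)


for child, containers in sorted(parent_map(SAMPLE_RNG).items()):
    print(child, "<-", sorted(containers))
```

Run on the toy grammar, it reports, for example, that `section` may appear inside `middle` or inside another `section`. A real schema would also need to follow refs through non-element defines (attribute groups, shared content patterns), which this sketch skips.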

The additional tooling that would be required to pull all the deprecated stuff into its own section would be daunting. And, given that some loud voices in the community insisted that things deprecated in v3 must still be processable, I'm not sure that doing so is worth the effort.

>   5.  If there are actual language components that are prohibited from
> the input version -- that is, their use is restricted to the RFC Editor
> -- then that certainly needs to be called out explicitly in the document
> (and justified.)

Yes.

> And probably moved into its own section.

Why? The model we chose instead was that all of them are legal in input documents, but the format specification says that the RFC Editor will be making the final determination.

> ps.  Stylistic concern -- I came across "In fact, the author believes
> that some of the features of the v3 grammar cannot be specified as a
> DTD." in Appendix B, and am wondering whether a specification like this,
> developed by a group, should have personal views of the document editor,
> especially when it is embedded without special notation?

Whoopsie, fixed.

--Paul Hoffman

