[rfc-i] RFC editing tools

Stefan Santesson stefan at aaa-sec.com
Tue Dec 11 07:45:47 PST 2012

I strongly support Ted's arguments here.

I work a lot with XML and HTML, but I have not tested every tool and
product on the market.
However, an XML schema can:

- Be compiled into native data object classes, e.g. to enable parsing an
XML file into Java objects.
- As Ted said, be used to validate documents.
- Support transformation into virtually any presentation format using XML
stylesheets.
- Be edited in semi-WYSIWYG style using off-the-shelf XML editors, if
supported by the schema and stylesheets.

To my knowledge, HTML can't do this.
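
On the first point: in Java, a schema compiler such as JAXB's xjc generates these classes from an XSD automatically. The Python sketch below is only a rough, hand-written analogue and is not generated from any schema. It maps an xml2rfc-style <author> element onto a typed object. The Author class and the parse_author helper are hypothetical, and the element names are assumed from the xml2rfc vocabulary.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Author:
    """Hypothetical native object for an xml2rfc-style <author> element."""
    fullname: str
    organization: Optional[str]
    email: Optional[str]


def parse_author(elem: ET.Element) -> Author:
    # Hand-written mapping; a schema compiler would generate this binding.
    return Author(
        fullname=elem.get("fullname", ""),
        organization=elem.findtext("organization"),
        email=elem.findtext("address/email"),
    )


doc = ET.fromstring(
    '<author fullname="Jane Doe">'
    "<organization>Example Org</organization>"
    "<address><email>jane@example.com</email></address>"
    "</author>"
)
author = parse_author(doc)
print(author)
```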

There is one more thing we may want to consider if we choose an XML schema as
the source format.
The current xml2rfc schema defines elements using complex types with mixed
content, that is, elements in which text and subelements can be freely mixed.
That is probably a good solution for making the XML manual-edit
friendly, but it makes the content a great deal harder to parse,
at least with the tools I'm familiar with.
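
The small Python sketch below shows why mixed content complicates parsing. It uses the standard library's ElementTree only as an example of a common DOM-like API. The <t>, <xref> and <spanx> elements follow xml2rfc v2 naming, and the content_runs helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# A paragraph in mixed-content style: text and child elements interleave.
para = ET.fromstring(
    '<t>Keywords are defined in <xref target="RFC2119"/>; '
    'see also <spanx style="emph">Section 3</spanx>.</t>'
)


def content_runs(elem):
    """Flatten mixed content into ordered ('text', str) / ('elem', Element) runs.

    ElementTree stores the text before the first child in elem.text and the
    text after each child in that child's .tail. Every consumer therefore
    has to stitch the runs back together itself.
    """
    runs = []
    if elem.text:
        runs.append(("text", elem.text))
    for child in elem:
        runs.append(("elem", child))
        if child.tail:
            runs.append(("text", child.tail))
    return runs


for kind, value in content_runs(para):
    print(kind, value if kind == "text" else value.tag)
```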

I imagine it would be possible to convert an XML document conforming
to the xml2rfc schema into one conforming to an XML schema that doesn't use
mixed content.
This might be worth considering for a source format, where you could add
information to an xml2rfc doc to capture some of the data that is currently
missing for transformation into all presentation formats, including back to
xml2rfc if necessary.
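
Here is a minimal sketch of such a conversion, assuming Python's ElementTree. It moves every text run into its own element so that no element mixes text and children. The <text> wrapper element is hypothetical and is not part of any xml2rfc schema.

```python
import xml.etree.ElementTree as ET


def unmix(elem: ET.Element) -> ET.Element:
    """Return a copy of elem in which no element has mixed content.

    Each non-whitespace text run (elem.text or a child's .tail) is moved
    into its own hypothetical <text> element. Each resulting element holds
    either only text or only child elements.
    """
    out = ET.Element(elem.tag, dict(elem.attrib))
    children = list(elem)
    if not children:
        out.text = elem.text  # leaf: plain text content only
        return out

    def add_text(s):
        # Whitespace-only runs are assumed to be indentation and are dropped.
        if s and s.strip():
            ET.SubElement(out, "text").text = s

    add_text(elem.text)
    for child in children:
        out.append(unmix(child))
        add_text(child.tail)
    return out


para = ET.fromstring('<t>See <xref target="RFC2119"/> for details.</t>')
print(ET.tostring(unmix(para), encoding="unicode"))
```

Dropping whitespace-only runs is a simplification. A real converter would need a more careful rule for significant whitespace.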


On 12/9/12 5:45 PM, "Ted Lemon" <mellon at fugue.com> wrote:

>On Dec 9, 2012, at 2:18 AM, Joe Hildebrand (jhildebr)
><jhildebr at cisco.com> wrote:
>> It's not clear from the doc text, but as an author, I didn't add or
>> maintain any of that linking myself.  The tooling to set all of that up
>> is pretty trivial.
>Yes, of course, it's a simple matter of programming.   I am not saying
>that what you propose is impossible; merely arguing that it's not the
>best solution.
>> I don't know why you would ever edit section numbering by hand, even in
>> WYSIWYG mode.
>It ought to be possible to edit the XML or HTML source in a text editor.
>If section numbering is in the canonical form of the document, that's
>suddenly a whole lot harder.
>> I agree that the current form blurs presentation and representation, and
>> I'm open to other HTML representations.  However, this doesn't seem like
>> a complex regular expression in practice:
>> /^(Appendix [A-Z]+\.)?([\d\.]+)?\s+/
>Okay, so what's the parsing/validation process?   Let's walk through it:
>1. Validate the XML using W3C schema or similar
>2. Parse the XML into a DOM.
>3. Recursively descend the DOM, looking for nodes that require special
>case handling.
>4. For each such node, look for a text sub-node.
>5. Normalize the text of the sub-node (convert all whitespace chunks to
>single spaces, delete leading and trailing whitespace).
>6. If there are multiple valid forms the text could take, determine which
>form the text has taken (e.g., Appendix versus Section)
>7. Based on this determination, validate the text syntactically.
>8. Turn the text node into an internal DOM node that contains the
>semantic information that was formerly represented as text
>9. Add the faked-up DOM node to a table of similar nodes.
>Now, once we've processed the entire tree, for each set of semantically
>similar textually-parsed nodes, validate the semantics that were parsed
>out of text nodes and hence couldn't be validated by W3C schema, to wit:
>- Make sure that section numbers are sequential and that there are no gaps
>- Make sure that appendix numbers are sequential
>- Make sure that no appendixes appear before sections
>Compare this to a pure XML doc with no semantics in any text nodes:
>1. Validate the XML using W3C schema or similar
>2. Parse the XML into a DOM.
>Why is the XML doc parsing and validation process so much shorter?   Two
>reasons.   First, XML tags can be validated by W3C schema; div tags with
>special meaning given by class attributes can't.   Second, it
>doesn't contain any generated information that would need to be
>checked; there are no section numbers, for instance.   Section numbers
>only appear in presentation docs, not in the canonical representation.
>> I really didn't intend to define new HTML tags.  I thought that I had
>> been pretty careful about picking tags that were both standardized and
>> widely-implemented.  Could you please give me an example of what you're
>> talking about so I can fix it?
>You've said that you need additional standards docs to define things
>equivalent to the xml2rfc author tag.   Either you are defining new tags,
>or you are defining div tags with special semantics based on class
>attributes.   Again, these can't be validated by a schema.
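
The rough Python sketch below makes Ted's contrast concrete by showing both routes. check_text_numbering applies Joe's regular expression to heading strings. It then performs the kind of cross-node checks Ted lists: sequential top-level numbers, sequential appendix letters, and no section after an appendix. number_sections shows the alternative, in which numbers are never stored and are instead derived from the nesting of <section> elements. Those elements are assumed to carry an xml2rfc v2-style title attribute. Both function names and the sample headings are hypothetical, and the checks are deliberately simplified: subsections and sub-appendixes are skipped.

```python
import re
import xml.etree.ElementTree as ET

# Joe's pattern from the thread, written as a Python raw string.
HEADING = re.compile(r"^(Appendix [A-Z]+\.)?([\d\.]+)?\s+")


def check_text_numbering(headings):
    """Text-parsing route: recover numbers from heading text, then check the
    semantics a schema cannot see. Returns a list of problem descriptions."""
    problems = []
    expected_section, expected_appendix = 1, "A"
    seen_appendix = False
    for raw in headings:
        text = " ".join(raw.split())  # normalize whitespace
        m = HEADING.match(text)
        if m is None:
            problems.append(f"unnumbered heading: {text!r}")
            continue
        appendix, number = m.group(1), m.group(2)
        if appendix:
            seen_appendix = True
            if number:  # sub-appendix such as "Appendix A.1." is not checked
                continue
            letter = appendix.split()[1].rstrip(".")
            if letter != expected_appendix:
                problems.append(f"expected Appendix {expected_appendix}, got {letter}")
            if len(letter) == 1:
                expected_appendix = chr(ord(letter) + 1)
        elif number:
            if seen_appendix:
                problems.append(f"section {number} appears after an appendix")
            parts = [p for p in number.split(".") if p]
            if len(parts) == 1:  # subsections are not checked in this sketch
                n = int(parts[0])
                if n != expected_section:
                    problems.append(f"expected section {expected_section}, got {n}")
                expected_section = n + 1
    return problems


def number_sections(parent, prefix=""):
    """XML route: numbers are never stored, so there is nothing to check;
    they are computed from <section> nesting at rendering time."""
    lines = []
    for i, sec in enumerate(parent.findall("section"), start=1):
        num = f"{prefix}{i}."
        lines.append(f"{num} {sec.get('title')}")
        lines.extend(number_sections(sec, num))
    return lines


print(check_text_numbering(
    ["1.  Introduction", "2.  Terminology", "4.  Security",
     "Appendix A.  Examples", "5.  Late"]))
# ['expected section 3, got 4', 'section 5. appears after an appendix']

doc = ET.fromstring(
    '<middle><section title="Introduction"/>'
    '<section title="Protocol"><section title="Messages"/></section></middle>'
)
print(number_sections(doc))
# ['1. Introduction', '2. Protocol', '2.1. Messages']
```

Most of the first function exists only to re-derive and re-check information that the second never writes down.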