[rfc-i] RFC editing tools

Ted Lemon mellon at fugue.com
Sun Dec 9 08:45:19 PST 2012

On Dec 9, 2012, at 2:18 AM, Joe Hildebrand (jhildebr) <jhildebr at cisco.com> wrote:
> It's not clear from the doc text, but as an author, I didn't add or
> maintain any of that linking myself.  The tooling is pretty trivial to fix
> all of that up.

Yes, of course, it's a simple matter of programming. I am not saying that what you propose is impossible, merely that it's not the best solution.

> I don't know why you would ever edit section numbering by hand, even in
> WYSIWYG mode.

It ought to be possible to edit the XML or HTML source in a text editor.   If section numbering is in the canonical form of the document, that's suddenly a whole lot harder.

> I agree that the current form blurs presentation and representation, and
> I'm open to other HTML representations.  However, this doesn't seem *that*
> complex a regular expression in practice:
> /^(Appendix [A-Z]+\.)?([\d\.]+)?\s+/

Okay, so what's the parsing/validation process?   Let's walk through it:

1. Validate the XML using a W3C schema or similar.
2. Parse the XML into a DOM.
3. Recursively descend the DOM, looking for nodes that require special-case handling.
4. For each such node, look for a text sub-node.
5. Normalize the text of the sub-node (convert all whitespace chunks to single spaces, delete leading and trailing whitespace).
6. If there are multiple valid forms the text could take, determine which form the text has taken (e.g., Appendix versus Section).
7. Based on this determination, validate the text syntactically.
8. Turn the text node into an internal DOM node that contains the semantic information that was formerly represented as text.
9. Add the faked-up DOM node to a table of similar nodes.

Now, once we've processed the entire tree, take each set of semantically similar nodes that were parsed out of text and validate the semantics that the W3C schema couldn't check, to wit:

- Make sure that section numbers are sequential and that there are no gaps
- Make sure that appendix numbers are sequential
- Make sure that no appendixes appear before sections
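
To make the cost concrete, here is a minimal Python sketch of steps 5 through 8 and the three checks above. It uses the regular expression from the quoted message; the function names (normalize, parse_heading, check_headings) are hypothetical, the input is assumed to be a flat list of heading strings already pulled out of the DOM, and only top-level section numbers are checked for gaps.

```python
import re

# The expression from the quoted message, in Python syntax.
HEADING_RE = re.compile(r"^(Appendix [A-Z]+\.)?([\d.]+)?\s+")


def normalize(text):
    """Step 5: collapse whitespace runs and trim both ends."""
    return " ".join(text.split())


def parse_heading(text):
    """Steps 6-8: classify a heading and extract its number.

    Returns ("appendix", "A"), ("section", (1, 2)), or None when the
    text carries no recognizable number.
    """
    match = HEADING_RE.match(normalize(text))
    if match is None:
        return None
    appendix, number = match.groups()
    if appendix:
        # Strip the "Appendix " prefix and the trailing dot.
        return ("appendix", appendix[len("Appendix "):-1])
    parts = [p for p in (number or "").split(".") if p]
    if not parts:
        return None
    return ("section", tuple(int(p) for p in parts))


def check_headings(headings):
    """Post-pass: report gaps and ordering problems (top level only)."""
    errors = []
    expected_section = 1
    expected_appendix = ord("A")
    seen_appendix = False
    for text in headings:
        parsed = parse_heading(text)
        if parsed is None:
            errors.append(f"unparseable heading: {text!r}")
            continue
        kind, value = parsed
        if kind == "appendix":
            seen_appendix = True
            if value != chr(expected_appendix):
                errors.append(
                    f"expected appendix {chr(expected_appendix)}, found {value}"
                )
            expected_appendix = ord(value[0]) + 1
        else:
            if seen_appendix:
                errors.append(f"section {value[0]} appears after an appendix")
            if len(value) == 1:
                if value[0] != expected_section:
                    errors.append(
                        f"expected section {expected_section}, found {value[0]}"
                    )
                expected_section = value[0] + 1
    return errors
```

Even in this reduced form, every check is hand-written code that a schema validator would not run, and subsection numbering is not covered at all.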

Compare this to a pure XML doc with no semantics in any text nodes:

1. Validate the XML using a W3C schema or similar.
2. Parse the XML into a DOM.

Why is the XML doc parsing and validation process so much shorter? Two reasons. First, XML tags can be validated by a W3C schema; div tags with special meaning given by class attributes can't. Second, the XML doc doesn't contain any generated information that would need to be checked: there are no section numbers, for instance. Section numbers only appear in presentation docs, not in the canonical representation.
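
As an illustration of that last point, here is a small Python sketch in which section numbers are derived from document structure when the presentation form is generated, so the source never stores them. The element and attribute names (rfc, middle, back, section, title) are assumptions modeled loosely on xml2rfc, and numbered_headings is a hypothetical helper. Schema validation is left out because the Python standard library does not provide it.

```python
import xml.etree.ElementTree as ET

# Canonical source: structure only, no section numbers anywhere.
SOURCE = """\
<rfc>
  <middle>
    <section title="Introduction">
      <section title="Terminology"/>
    </section>
    <section title="Security Considerations"/>
  </middle>
  <back>
    <section title="Acknowledgements"/>
  </back>
</rfc>
"""


def numbered_headings(root):
    """Generate heading labels from the position of each section."""
    headings = []

    def walk(parent, prefix):
        # Number direct child sections 1, 2, 3, ... under the given prefix.
        for index, section in enumerate(parent.findall("section"), start=1):
            label = f"{prefix}{index}."
            headings.append(f"{label} {section.get('title')}")
            walk(section, label)

    middle = root.find("middle")
    if middle is not None:
        walk(middle, "")

    back = root.find("back")
    if back is not None:
        # Top-level sections in the back matter become lettered appendixes.
        for index, section in enumerate(back.findall("section")):
            letter = chr(ord("A") + index)
            headings.append(f"Appendix {letter}. {section.get('title')}")

    return headings


for heading in numbered_headings(ET.fromstring(SOURCE)):
    print(heading)
```

Because the numbers are computed, they cannot have gaps or be out of order, and reordering sections in a text editor needs no renumbering.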

> I really didn't intend to define new HTML tags.  I thought that I had been
> pretty careful about picking tags that were both standardized and
> widely-implemented.  Could you please give me an example of what you're
> talking about so I can fix it?

You've said that you need additional standards docs to define things equivalent to the xml2rfc author tag.   Either you are defining new tags, or you are defining div tags with special semantics based on class attributes.   Again, these can't be validated by a schema.
