[rfc-i] Proposed new RFC submission requirements

Joe Touch touch at isi.edu
Sat May 26 09:31:30 PDT 2012

On May 26, 2012, at 12:00 AM, Joe Hildebrand wrote:

> On 5/26/12 12:40 AM, "Joe Touch" <touch at isi.edu> wrote:
>> Here's the counterexample:
>> heading
>> para
>> para
>> para
>> list item
>> list item
>> list item
>> list item
>> Is that one list of four items? Is it two lists of two items each? Where is
>> the list container? Does the list belong to the paragraph that precedes it, or
>> as a separate container belonging to the heading level?
> I was only talking about sections.  There's no good way in HTML to do list
> items without an ol or ul around them, and I don't believe that Word is
> generating lists without wrappers.

Checked the example I posted. It doesn't use ul or ol anywhere. Here's how it outputs lists (please ignore the font cruft - that can be removed/cleaned easily):

<h2><a name="_Toc257116045"></a><a name="_Ref252706999"></a><a
name="_Ref170784094"></a><a name="_Toc170705132">3.2. Executive Summary</a></h2>

<p class=MsoNormal>This document replaces TCP MD5 as follows [RFC2385]:</p>

<p class=RFCListBullet>o<span style='font:7.0pt "Times New Roman"'>&nbsp;&nbsp;&nbsp;
</span>TCP-AO uses a separate option Kind (TBD-IANA-KIND).</p>

<p class=RFCListBullet>o<span style='font:7.0pt "Times New Roman"'>&nbsp;&nbsp;&nbsp;
</span>TCP-AO allows TCP MD5 to continue to be used concurrently for legacy connections.</p>

There is no way to extract list container boundaries in post-processing.
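
To make that concrete, here is a minimal sketch of the only thing a post-processor can do with that output: treat each run of adjacent bullet paragraphs as one list. The function and class names are hypothetical, and the class name RFCListBullet is taken from the sample above; this is not part of any existing tool.

```python
from html.parser import HTMLParser


class ParaClassCollector(HTMLParser):
    """Collect the class attribute of each <p> in document order."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.classes.append(dict(attrs).get("class"))


def group_bullets(html, bullet_class="RFCListBullet"):
    """Return the length of each run of adjacent bullet paragraphs.

    This is a heuristic (an assumption, not Word's model): adjacency is
    the only signal available, so every unbroken run becomes one list.
    """
    collector = ParaClassCollector()
    collector.feed(html)
    runs, current = [], 0
    for cls in collector.classes:
        if cls == bullet_class:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# Four adjacent bullet paragraphs, as in the counterexample.
sample = (
    "<p class=MsoNormal>intro</p>"
    "<p class=RFCListBullet>a</p><p class=RFCListBullet>b</p>"
    "<p class=RFCListBullet>c</p><p class=RFCListBullet>d</p>"
)
print(group_bullets(sample))
```

Whether the author meant one list of four items or two lists of two, the markup is identical, so the heuristic always reports a single run of four. The boundary is only recoverable when something other than a bullet paragraph sits between the lists.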

> In a text-only format, this is exactly
> the sort of ambiguity that the doc doesn't have enough structure to answer.

Agreed. SO WHAT?

Why are you insisting on retaining that structure? What is the *current* ***NECESSARY*** purpose?

> In the face of a lack of data, I'd say that it's one list with four items.
> If it doesn't matter to the author enough to use a tool that preserves his
> or her intent, then nobody else is likely to care about the difference
> downstream.

I claim that this is true for all container information. The only such info that might be useful downstream is for editors, and they should be able to (re)generate valid container information themselves if needed, so that requirement can be left out of the submission stream.

I don't care if it's provided - it can be stripped out too. But it must not be required.

>>>> E.g., Word doesn't use that structure.
>>> You post-process the output of Word anyway.  Whoever writes the
>>> post-processing tool is going to have to write a few lines of code.
>> Some of it is easy - as you note, I can generate tags that contain sections
>> within the headings that delimit them.
>> I cannot generate section containers for groupings that cannot be indicated by
>> Word - as per the list above.
> If Word is generating <li> without a <ul> or <ol> around it, there's a bug.

Word has worked just fine for over 25 years with whatever internal structure it uses.

Word generates lists using <p> elements just fine. Anything with numbers gets those numbers when the HTML is generated, not when it is rendered - which ensures it uses the numbers the author referred to in the text.

>> Further, why group all the paragraphs under one heading? At least one output
>> from Word treats them as one long paragraph with BRs in between, rather than
>> as individual paragraphs.
> English text contains paragraphs.  In RFCs, we often group multiple
> paragraphs together into a section; the lineprinter format uses a blank line
> to delineate a paragraph boundary.
> Word knows how to deal with paragraphs.  It's inserting br's in order to
> gain control over line splitting, which is one of the things we're trying to
> solve for in the "reflowing" discussion.

When Word inserts a BR it's because the author used a CTL-CR. That's because the line break has meaning that the author wants to convey, which also means that it should NOT be reflowed at the discretion of the viewer.

>>> ...I assume the sections are separated by a header, which has a depth
>>> associated with it?  Everything between headers is in the same section.
>> But not necessarily the same container.
> You can intuit a container (and add it if need-be) if the sections are
> separated.

For heading containers, sure - it's just a stack. But not necessarily for lists, code, ASCII figures, or anything else.
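
The heading case can be sketched in a few lines, assuming the post-processor has already pulled out (level, title) pairs in document order. The function name and the dict layout are hypothetical, chosen only for illustration.

```python
def nest_sections(headings):
    """Build nested section containers from (level, title) pairs.

    Levels start at 1. A heading closes every open section at the same
    or a deeper level, then opens a new one under whatever remains.
    """
    root = {"title": None, "level": 0, "children": []}
    stack = [root]
    for level, title in headings:
        # Pop sections that this heading terminates; root (level 0) stays.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"title": title, "level": level, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


tree = nest_sections([(1, "A"), (2, "A.1"), (2, "A.2"), (1, "B")])
print([s["title"] for s in tree])
print([s["title"] for s in tree[0]["children"]])
```

This works because every heading carries its own depth. A bullet paragraph carries no such depth or boundary marker, which is why the same trick does not extend to lists.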

>> You've only given the same reason repeatedly - editing. Support for editing
>> was not given for any formats except authoring, which we all seem to agree
>> ought to be up to authors.
> No, I've given two.  Programmatic editing is one, and information extraction
> is the other.  The extraction function has nothing to do with editing, since
> it does not modify the file.

Editing involves copy/paste - it need not involve a single file.

