[rfc-i] Proposed new RFC submission requirements

Joe Hildebrand jhildebr at cisco.com
Sat May 26 20:12:00 PDT 2012


On 5/26/12 10:31 AM, "Joe Touch" <touch at isi.edu> wrote:

> Checked the example I posted. It doesn't use nl or ol anywhere. Here's how it
> outputs lists (please ignore the font cruft - that can be removed/cleaned
> easily):

> <h2><a name="_Toc257116045"></a>

Meh.  Could have done <h2 id="_Toc257116045">, which would have been
cleaner.

> <a name="_Ref252706999"></a><a name="_Ref170784094"></a><a
> name="_Toc170705132">

What semantic are you trying to get across here?  These are three ways you
might want to jump to the same place?  Are you sure this isn't cruft that's
just left in by the WYSIWYG tool?

> <p class=MsoNormal>This document replaces TCP MD5 as follows [RFC2385]:</p>

What does MsoNormal mean?  Why no quotes on the class name?  Why no link on
RFC2385?

> <p class=RFCListBullet>o<span style='font:7.0pt "Times New

Yow.  That's just awful.  I'd say that there's lots of bad code in the
world.  I've written bad code in lots of different languages.  But manually
injecting an "o" instead of using a perfectly valid construct like li?  This
is only HTML in the technical sense, not a representative sample of anything
a competent author would write by hand, or an adequate tool would generate
on the author's behalf.

> Roman"'>&nbsp;&nbsp;&nbsp;
> </span>TCP-AO uses a separate option Kind (TBD-IANA-KIND).</p>
> 
> <p class=RFCListBullet>o<span style='font:7.0pt "Times New
> Roman"'>&nbsp;&nbsp;&nbsp;
> </span>TCP-AO allows TCP MD5 to continue to be used concurrently for legacy
> connections.</p>
> 
> There is no way to extract list container boundaries in post-processing.

Really?  It's ugly, but completely possible.  You look for a group of
p.RFCListBullet elements next to one another.  Strip out the "o" and the
spacer span, switch each p to li, and wrap the whole group in a ul.
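
For illustration, here is a rough sketch of that post-processing step in
Python.  It assumes the Word output looks like the fragments quoted above
(unquoted or quoted class attributes, a leading "o" followed by one spacer
span); the names bullets_to_ul, PARA and BULLET_PREFIX are made up for this
example, and a real converter would want a proper HTML parser rather than
regular expressions.

```python
import re

# One <p class=...>...</p> block; the class value may or may not be quoted.
PARA = re.compile(r"<p class=([\"']?)(\w+)\1>(.*?)</p>", re.DOTALL)
# The hand-injected bullet: a literal "o" followed by a spacer <span>.
BULLET_PREFIX = re.compile(r"^o\s*<span[^>]*>.*?</span>", re.DOTALL)


def bullets_to_ul(html):
    """Wrap runs of adjacent p.RFCListBullet paragraphs in <ul>/<li>."""
    out, pos, in_list = [], 0, False
    for m in PARA.finditer(html):
        gap = html[pos:m.start()]
        is_bullet = m.group(2) == "RFCListBullet"
        # Anything other than whitespace between bullets ends the run.
        if in_list and (gap.strip() or not is_bullet):
            out.append("\n</ul>")
            in_list = False
        out.append(gap)
        if is_bullet:
            if not in_list:
                out.append("<ul>\n")
                in_list = True
            text = BULLET_PREFIX.sub("", m.group(3)).strip()
            out.append("<li>" + text + "</li>")
        else:
            out.append(m.group(0))
        pos = m.end()
    if in_list:
        out.append("\n</ul>")
    out.append(html[pos:])
    return "".join(out)


sample = """<p class=MsoNormal>This document replaces TCP MD5 as follows [RFC2385]:</p>

<p class=RFCListBullet>o<span style='font:7.0pt "Times New
Roman"'>&nbsp;&nbsp;&nbsp;
</span>TCP-AO uses a separate option Kind (TBD-IANA-KIND).</p>

<p class=RFCListBullet>o<span style='font:7.0pt "Times New
Roman"'>&nbsp;&nbsp;&nbsp;
</span>TCP-AO allows TCP MD5 to continue to be used concurrently for legacy
connections.</p>
"""

result = bullets_to_ul(sample)
print(result)
```

The two bullet paragraphs come out as li elements inside a single ul, and
the MsoNormal paragraph is passed through untouched.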

I would appreciate it if you would be a little more careful about statements
like "impossible" and "no way".

>> In a text-only format, this is exactly
>> the sort of ambiguity that the doc doesn't have enough structure to answer.
> 
> Agreed. SO WHAT?
> 
> Why are you insisting on retaining that structure? What is the *current*
> ***NECESSARY*** purpose?

Let's give yet another example.  Consider RFC 3454
(http://www.ietf.org/rfc/rfc3454.txt).  Skip to the appendices.  See the
tables?  There are several projects that parse the RFC text and generate
source code to implement the stringprep protocol.
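
To make that concrete, here is a minimal sketch of such an extractor in
Python.  It is not taken from any of those projects.  It assumes each table
is bracketed by "----- Start Table X -----" and "----- End Table X -----"
lines with semicolon-separated fields in between; the sample text is only
an illustrative stand-in for the RFC, and the function name extract_tables
is made up.  Page headers and footers that fall inside a table in the real
text would need to be filtered out first.

```python
import re

# Assumed delimiter lines around each table in the plain-text RFC.
START = re.compile(r"^-{5} Start Table (\S+) -{5}$")
END = re.compile(r"^-{5} End Table (\S+) -{5}$")


def extract_tables(text):
    """Return {table name: [row fields, ...]} for every delimited table."""
    tables, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        m = START.match(line)
        if m:
            current = tables.setdefault(m.group(1), [])
            continue
        if END.match(line):
            current = None
            continue
        # Inside a table: split non-empty lines into trimmed fields.
        if current is not None and line:
            current.append([field.strip() for field in line.split(";")])
    return tables


# Illustrative stand-in for a fragment of the RFC text.
sample = """\
Some prose before the table.

----- Start Table B.1 -----
   00AD; ; Map to nothing
   1806; ; Map to nothing
----- End Table B.1 -----

Some prose after the table.
"""

tables = extract_tables(sample)
print(tables)
```

This only works because the text happens to carry explicit markers around
each table, which is exactly the kind of structure under discussion.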

Another example: draft-ietf-codec-opus
(http://tools.ietf.org/id/draft-ietf-codec-opus-14.txt) contains a big
tarball of source code, which is actually normative.  The draft contains
instructions for how to parse itself to extract the tarball.

>> In the face of a lack of data, I'd say that it's one list with four items.
>> If it doesn't matter to the author enough to use a tool that preserves his
>> or her intent, then nobody else is likely to care about the difference
>> downstream.
> 
> I claim that this is true for all container information. The only such info
> that might be useful downstream is for editors,

Those folks are information-extractors.  The Editor's job is to get the
draft in order to be published as an RFC.

> and they can/should be able to
> (re)generate valid container information themselves if needed, and leave that
> requirement out of the submission stream.

Why make this more difficult for them than is needed, if it's roughly the
same amount of work to come up with an adequate format in the first place?
One of the signs of good architecture is that a system can be used in ways
that the original designers hadn't considered.  Trying to prevent those
downstream uses because Word generates truly horrifying HTML seems quite
short-sighted.

>> If Word is generating <li> without a <ul> or <ol> around it, there's a bug.
> 
> It has worked just fine for over 25 years with whatever internal structure it
> uses.

Yes.  You've done lots of great work.  Now it's time to get better.  It's
time to grow, time to learn, and time to start looking relevant to the world
outside the IETF, which regards our document series as laughably archaic,
and therefore of suspect technical content.

> Word generates lists using <p> commands just fine. Anything with numbers gets
> those numbers when the HTML is generated, not rendered - which ensures it uses
> the numbers the author referred to in the text.

That's an interesting new requirement.

Serious question: do you refer to list item numbers frequently?  Would you
mind pointing me to a document that does this, so I can reason about ways to
approach this?

> When Word inserts a BR it's because the author used a CTL-CR. That's because
> the line break has meaning that the author wants to convey, which also means
> that it should NOT be reflowed at the discretion of the viewer.

Again, a serious question: can you point me to a document where this was
important?  The only places I can think of at the moment all belong in pre
elements.

>> You can intuit a container (and add it if need-be) if the sections are
>> separated.
> 
> For heading containers, sure - it's just a stack. But not for lists, code,
> ASCII figures, or anything else necessarily.

I'm pretty sure all of those are detectable by proximity, but regardless,
those have nothing to do with containment, since they MUST be marked up
appropriately in any adequately semantic input format.
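
For the heading case, the "it's just a stack" observation quoted above can
be sketched in a few lines of Python.  This is only an illustration under
the assumption that headings appear as flat h1..h6 elements; the name
nest_headings and the dict layout are invented for the example.

```python
import re

# Flat <h1>..<h6> elements, with or without attributes.
HEADING = re.compile(r"<h([1-6])[^>]*>(.*?)</h\1>", re.DOTALL | re.IGNORECASE)


def nest_headings(html):
    """Build a section tree from flat headings using a stack of open levels."""
    root = {"level": 0, "title": None, "children": []}
    stack = [root]
    for m in HEADING.finditer(html):
        level = int(m.group(1))
        # Drop inline markup (such as <a name=...>) from the title.
        title = re.sub(r"<[^>]+>", "", m.group(2)).strip()
        # Close every open section that is at the same depth or deeper.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"level": level, "title": title, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


flat = "<h1>Intro</h1><p>x</p><h2>Terms</h2><h2>Scope</h2><h1>Security</h1>"
tree = nest_headings(flat)
print([node["title"] for node in tree])
```

Each new heading closes any open sections at its own level or below and
then becomes a child of whatever is left on top of the stack.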

>> No, I've given two.  Programmatic editing is one, and information extraction
>> is the other.  The extraction function has nothing to do with editing, since
>> it does not modify the file.
> 
> Editing involves copy/paste - it need not involve a single file.

That seems a little pedantic.  The information extractors are not editing
the document for which we're talking about defining the structure.  It feels
like you're grouping disparate things under one heading in order to make
points, which doesn't seem like it's moving the conversation forward
rapidly.

-- 
Joe Hildebrand


