[rfc-i] Proposed new RFC submission requirements

Joe Touch touch at isi.edu
Sat May 26 22:26:02 PDT 2012


On May 26, 2012, at 8:12 PM, Joe Hildebrand wrote:

> On 5/26/12 10:31 AM, "Joe Touch" <touch at isi.edu> wrote:
> 
>> Checked the example I posted. It doesn't use nl or ol anywhere. Here's how it
>> outputs lists (please ignore the font cruft - that can be removed/cleaned
>> easily):
> 
>> <h2><a name="_Toc257116045"></a>
> 
> Meh.  Could have done <h2 id="_Toc257116045">, which would have been
> cleaner.
> 
>> <a name="_Ref252706999"></a><a name="_Ref170784094"></a><a name="_Toc170705132">
> 
> What semantic are you trying to get across here? These are three ways you
> might want to jump to the same place?  Are you sure this isn't cruft that's
> just left in by the WYSIWYG tool?

Was there some part of "check the example I posted" above that was ambiguous? Of course that's the cruft - I even said that above.

>> <p class=MsoNormal>This document replaces TCP MD5 as follows [RFC2385]:</p>
> 
> What does MsoNormal mean?  Why no quotes on the class name?  Why no link on
> RFC2385?

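No quotes because none are required: HTML allows unquoted attribute values that contain no whitespace or special characters. MsoNormal is the "Normal" style in Microsoft Office (Word prepends "Mso"). No link on the ref because that's not how it was generated.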

Again, ignore the cruft. 

>> <p class=RFCListBullet>o<span style='font:7.0pt "Times New
> 
> Yow.  That's just awful.  I'd say that there's lots of bad code in the
> world.  I've written bad code in lots of different languages.  But manually
> injecting an "o" instead of using a perfectly valid construct like li?  This
> is only HTML in the technical sense, not a representative sample of anything
> a competent author would write by hand, or an adequate tool would generate
> on the author's behalf.

It's a representative sample of what MS Word generates from a List style type.

In case that's too subtle - again, that's *representative* of a very widely used commercial tool's interpretation of how to generate HTML.

>> Roman"'>&nbsp;&nbsp;&nbsp;
>> </span>TCP-AO uses a separate option Kind (TBD-IANA-KIND).</p>
>> 
>> <p class=RFCListBullet>o<span style='font:7.0pt "Times New
>> Roman"'>&nbsp;&nbsp;&nbsp;
>> </span>TCP-AO allows TCP MD5 to continue to be used concurrently for legacy
>> connections.</p>
>> 
>> There is no way to extract list container boundaries in post-processing.
> 
> Really?  It's ugly, but completely possible.  You look for a group of
> p.RFCListBullet next to one another.  Strip out the "o", switch p to li, and
> wrap a ul in front of it.
> 
> I would appreciate it if you would be a little more careful about statements
> like "impossible" and "no way".

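It is impossible to differentiate between lists that are adjacent in the HTML but semantically separate. Two consecutive two-item lists produce exactly the same run of p.RFCListBullet paragraphs as one four-item list.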
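
To make that concrete, here is a minimal sketch of the grouping step described above (find a run of adjacent p.RFCListBullet, turn each into li, wrap the run in ul). It is only an illustration: the (class, text) pairs are a hypothetical stand-in for parsed <p> elements, the leading "o" marker is assumed to have been stripped already, and wrap_bullet_runs is a name made up for this example.

```python
from itertools import groupby

BULLET_CLASS = "RFCListBullet"  # class name taken from the quoted Word output


def wrap_bullet_runs(paragraphs):
    """Convert each run of adjacent bullet paragraphs into one <ul> block.

    `paragraphs` is a list of (css_class, text) pairs, a simplified
    stand-in for parsed <p> elements (an assumption of this sketch).
    """
    out = []
    # groupby splits the sequence into maximal runs of bullet / non-bullet items.
    for is_bullet, run in groupby(paragraphs, key=lambda p: p[0] == BULLET_CLASS):
        if is_bullet:
            items = "".join(f"<li>{text}</li>" for _, text in run)
            out.append(f"<ul>{items}</ul>")
        else:
            out.extend(f"<p>{text}</p>" for _, text in run)
    return out


# Two lists separated by ordinary text are recovered as two <ul> blocks.
separated = [
    (BULLET_CLASS, "a"), (BULLET_CLASS, "b"),
    ("MsoNormal", "Also:"),
    (BULLET_CLASS, "c"), (BULLET_CLASS, "d"),
]

# Two lists with nothing between them look identical to one four-item list.
adjacent = [
    (BULLET_CLASS, "a"), (BULLET_CLASS, "b"),
    (BULLET_CLASS, "c"), (BULLET_CLASS, "d"),
]

print(wrap_bullet_runs(separated))
print(wrap_bullet_runs(adjacent))
```

The second call yields a single ul, whatever the author intended; the boundary between the two lists is simply not in the input.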

>>> In a text-only format, this is exactly
>>> the sort of ambiguity that the doc doesn't have enough structure to answer.
>> 
>> Agreed. SO WHAT?
>> 
>> Why are you insisting on retaining that structure? What is the *current*
>> ***NECESSARY*** purpose?
> 
> Let's give yet another example.  Consider RFC 3454
> (http://www.ietf.org/rfc/rfc3454.txt).  Skip to the appendices.  See the
> tables?  There are several projects that parse the RFC text and generate
> source code to implement the stringprep protocol.
> 
> Another example, draft-ietf-codec-opus
> (http://tools.ietf.org/id/draft-ietf-codec-opus-14.txt) contains a big
> tarball of source code, which is actually normative.  The draft contains
> instructions for how to parse itself to extract the tarball.
> 
>>> In the face of a lack of data, I'd say that it's one list with four items.
>>> If it doesn't matter to the author enough to use a tool that preserves his
>>> or her intent, then nobody else is likely to care about the difference
>>> downstream.
>> 
>> I claim that this is true for all container information. The only such info
>> that might be useful downstream is for editors,
> 
> Those folks are information-extractors.  The Editor's job is to get the
> draft in order to be published as an RFC.
> 
>> and they can/should be able to
>> (re)generate valid container information themselves if needed, and leave that
>> requirement out of the submission stream.
> 
> Why make this more difficult for them than is needed, if it's roughly the
> same amount of work to come up with an adequate format in the first place?

Because it's not the same amount of work.

> One of the signs of good architecture ...

This isn't an academic exercise in good architecture. It's the design of a production publication system.

If you want to create a new publication architecture, by all means please do. We should consider it for RFCs after it's been widely used for around 10-15 years.

>>> If Word is generating <li> without a <ul> or <ol> around it, there's a bug.
>> 
>> It has worked just fine for over 25 years with whatever internal structure it
>> uses.
> 
> Yes.  You've done lots of great work.  Now it's time to get better.  It's
> time to grow, time to learn, time to look to start to appear relevant to the
> world outside the IETF that looks at our document series as laughably
> archaic - and therefore of suspect technical content.

We are trying to get better, but not to be a research project.

>> Word generates lists using <p> commands just fine. Anything with numbers gets
>> those numbers when the HTML is generated, not rendered - which ensures it uses
>> the numbers the author referred to in the text.
> 
> That's an interesting new requirement.
> 
> Serious question: do you refer to list item numbers frequently?  Would you
> mind pointing me to a document that does this, so I can reason about ways to
> approach this?

Same as for section numbers. You know, the ones that everyone now wants to use as a substitute for page numbers?

>> When Word inserts a BR it's because the author used a CTL-CR. That's because
>> the line break has meaning that the author wants to convey, which also means
>> that it should NOT be reflowed at the discretion of the viewer.
> 
> Again serious, can you point me to a document where this was important?  The
> only places I can think of at the moment all belong in pre elements.

Again, it's in the example I gave. It's used to generate breaks in the elements of lists to separate the list item from its description. It's not the only way to do it - HTML has about a dozen ways to do almost anything. And it's almost never true that only one way is "correct".

>>> You can intuit a container (and add it if need-be) if the sections are
>>> separated.
>> 
>> For heading containers, sure - it's just a stack. But not for lists, code,
>> ASCII figures, or anything else necessarily.
> 
> I'm pretty sure all of those are detectable by proximity, but regardless,
> those have nothing to do with containment, since they MUST be marked up
> appropriately in any adequately semantic input format.

Until you're sure, it's premature to assert that you can do it.
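
For contrast, the heading case really is just a stack. A minimal sketch, assuming headings arrive as a flat list of (level, title) pairs with levels starting at 1; nest_headings and the dict layout are hypothetical names chosen for this example:

```python
def nest_headings(headings):
    """Rebuild section containers from a flat list of (level, title) pairs."""
    # Sentinel root at level 0; real headings are assumed to have level >= 1,
    # so the root is never popped.
    root = {"level": 0, "title": None, "children": []}
    stack = [root]
    for level, title in headings:
        # Pop until the top of the stack is a shallower heading: the parent.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"level": level, "title": title, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


sections = nest_headings([
    (1, "Introduction"),
    (2, "Terminology"),
    (2, "Scope"),
    (1, "Security Considerations"),
])
print([s["title"] for s in sections])
print([c["title"] for c in sections[0]["children"]])
```

That works because each heading carries its own level. A list paragraph carries nothing comparable, so there is no equivalent rule for where one list ends and the next begins.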

>>> No, I've given two.  Programmatic editing is one, and information extraction
>>> is the other.  The extraction function has nothing to do with editing, since
>>> it does not modify the file.
>> 
>> Editing involves copy/paste - it need not involve a single file.
> 
> That seems a little pedantic.  The information extractors are not editing
> the document for which we're talking about defining the structure.  It feels
> like you're grouping disparate things under one heading in order to make
> points, which doesn't seem like it's moving the conversation forward
> rapidly.

I've shown a number of ways in which containment adds requirements that aren't needed, and in some cases get in the way of copy/paste operations later. 

No, I don't think "information extraction" is the primary purpose of going to the new format. If it is, call me when the publishing community converges on a format for that.

Joe

