[rfc-i] Proposed new RFC submission requirements
touch at isi.edu
Sat May 26 22:26:02 PDT 2012
On May 26, 2012, at 8:12 PM, Joe Hildebrand wrote:
> On 5/26/12 10:31 AM, "Joe Touch" <touch at isi.edu> wrote:
>> Checked the example I posted. It doesn't use nl or ol anywhere. Here's how it
>> outputs lists (please ignore the font cruft - that can be removed/cleaned
>> <h2><a name="_Toc257116045"></a>
> Meh. Could have done <h2 id="_Toc257116045">, which would have been
>> <a name="_Ref252706999"></a><a name="_Ref170784094"></a><a
> What semantic are you trying to get across here? These are three ways you
> might want to jump to the same place? Are you sure this isn't cruft that's
> just left in by the WYSIWYG tool?
Was there some part of "check the example I posted" above that was ambiguous? Of course that's the cruft; I even said so above.
>> <p class=MsoNormal>This document replaces TCP MD5 as follows [RFC2385]:</p>
> What does MsoNormal mean? Why no quotes on the class name? Why no link on
No quotes because none are required. That's "Normal" in Microsoft Office (it prepends Mso). No link on the ref because that's not how it was generated.
Again, ignore the cruft.
>> <p class=RFCListBullet>o<span style='font:7.0pt "Times New
> Yow. That's just awful. There's a lot of bad code in the
> world; I've written bad code in lots of different languages. But manually
> injecting an "o" instead of using a perfectly valid construct like li? This
> is only HTML in the technical sense, not a representative sample of anything
> a competent author would write by hand, or that an adequate tool would generate
> on the author's behalf.
It's a representative sample of what MS Word generates from a List style type.
In case that's too subtle - again, that's *representative* of a very widely used commercial tool's interpretation of how to generate HTML.
>> </span>TCP-AO uses a separate option Kind (TBD-IANA-KIND).</p>
>> <p class=RFCListBullet>o<span style='font:7.0pt "Times New
>> </span>TCP-AO allows TCP MD5 to continue to be used concurrently for legacy
>> There is no way to extract list container boundaries in post-processing.
> Really? It's ugly, but completely possible. You look for a group of
> adjacent p.RFCListBullet elements, strip out the "o", switch each p to li, and
> wrap a ul around the group.
> I would appreciate it if you would be a little more careful about statements
> like "impossible" and "no way".
It is impossible to differentiate between lists that could be adjacent in the HTML but are semantically separate.
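The grouping heuristic quoted above can be sketched in Python roughly as follows. This is a minimal sketch, not a real converter: it assumes the paragraphs have already been pulled out of the Word HTML as (class, text) pairs with the font span reduced to whitespace, and the name group_bullets is hypothetical. It also illustrates the objection just made: two semantically separate lists that happen to be adjacent come out as a single ul, because nothing in the flat sequence of p elements marks where one list ends and the next begins.

```python
import re
from html import escape

# Assumption: each bullet paragraph's text looks like "o TCP-AO uses ...",
# i.e. Word's literal "o" glyph followed by whitespace from the removed <span>.
BULLET_MARKER = re.compile(r"^o\s+")


def group_bullets(paragraphs):
    """Wrap runs of adjacent RFCListBullet paragraphs in <ul>/<li>."""
    out = []
    in_list = False
    for css_class, text in paragraphs:
        if css_class == "RFCListBullet":
            if not in_list:
                out.append("<ul>")
                in_list = True
            # Drop the literal "o" that Word injects as the bullet glyph.
            item = BULLET_MARKER.sub("", text, count=1)
            out.append(f"<li>{escape(item)}</li>")
        else:
            if in_list:
                out.append("</ul>")
                in_list = False
            out.append(f'<p class="{css_class}">{escape(text)}</p>')
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


paragraphs = [
    ("MsoNormal", "This document replaces TCP MD5 as follows [RFC2385]:"),
    ("RFCListBullet", "o TCP-AO uses a separate option Kind (TBD-IANA-KIND)."),
    ("RFCListBullet", "o TCP-AO allows TCP MD5 to continue to be used concurrently for legacy ..."),
]
print(group_bullets(paragraphs))
```

Feeding it two bullets that belong to different lists yields one ul with two li elements; the heuristic has no way to do otherwise.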
>>> In a text-only format, this is exactly
>>> the sort of ambiguity that the doc doesn't have enough structure to answer.
>> Agreed. SO WHAT?
>> Why are you insisting on retaining that structure? What is the *current*
>> ***NECESSARY*** purpose?
> Let's give yet another example. Consider RFC 3454
> (http://www.ietf.org/rfc/rfc3454.txt). Skip to the appendices. See the
> tables? There are several projects that parse the RFC text and generate
> source code to implement the stringprep protocol.
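For a sense of what those extraction projects do, here is a minimal Python sketch that pulls code point ranges out of the appendix tables. It assumes each table is delimited by lines of the form "----- Start Table A.1 -----" and "----- End Table A.1 -----", and that every entry begins with a hex code point or a hyphenated range; mapping columns (as in Table B.1) are ignored. The name parse_tables is hypothetical and is not taken from any of the existing projects.

```python
import re

# Assumed delimiter lines around each appendix table.
START = re.compile(r"^\s*-+\s*Start Table (\S+)\s*-+\s*$")
END = re.compile(r"^\s*-+\s*End Table (\S+)\s*-+\s*$")
# An entry: a 4-6 digit hex code point, optionally "-" and a second one.
ENTRY = re.compile(r"^\s*([0-9A-F]{4,6})(?:-([0-9A-F]{4,6}))?\b")


def parse_tables(text):
    """Return {table_name: [(first, last), ...]} from RFC-style plain text."""
    tables = {}
    current = None
    for line in text.splitlines():
        m = START.match(line)
        if m:
            current = m.group(1)
            tables[current] = []
            continue
        if END.match(line):
            current = None
            continue
        if current is None:
            continue  # prose outside any table
        m = ENTRY.match(line)
        if m:  # page headers/footers and blank lines simply don't match
            first = int(m.group(1), 16)
            last = int(m.group(2), 16) if m.group(2) else first
            tables[current].append((first, last))
    return tables
```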
> Another example, draft-ietf-codec-opus
> (http://tools.ietf.org/id/draft-ietf-codec-opus-14.txt) contains a big
> tarball of source code, which is actually normative. The draft contains
> instructions for how to parse itself to extract the tarball.
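The self-extraction trick works because the payload is plain base64 text whose lines carry a recognizable prefix, so page headers, footers and prose can be filtered out. A minimal sketch, assuming (hypothetically) that each payload line starts with "###" after the usual RFC indentation; the draft's own instructions give the authoritative prefix and command line, and extract_payload is an illustrative name.

```python
import base64


def extract_payload(draft_text, marker="###"):
    """Collect marker-prefixed lines in order and base64-decode them.

    Lines without the marker (prose, page headers and footers) are skipped.
    """
    chunks = []
    for line in draft_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(marker):
            chunks.append(stripped[len(marker):].strip())
    return base64.b64decode("".join(chunks))
```

Writing the returned bytes to a file would then give the tarball.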
>>> In the face of a lack of data, I'd say that it's one list with four items.
>>> If it doesn't matter to the author enough to use a tool that preserves his
>>> or her intent, then nobody else is likely to care about the difference
>> I claim that this is true for all container information. The only such info
>> that might be useful downstream is for editors,
> Those folks are information extractors. The Editor's job is to get the
> draft in order for publication as an RFC.
>> and they can/should be able to
>> (re)generate valid container information themselves if needed, and leave that
>> requirement out of the submission stream.
> Why make this more difficult for them than is needed, if it's roughly the
> same amount of work to come up with an adequate format in the first place?
Because it's not the same amount of work.
> One of the signs of good architecture ...
This isn't an academic exercise in good architecture. It's the design of a production publication system.
If you want to create a new publication architecture, by all means please do. We should consider it for RFCs after it's been widely used for around 10-15 years.
>>> If Word is generating <li> without a <ul> or <ol> around it, there's a bug.
>> It has worked just fine for over 25 years with whatever internal structure it
> Yes. You've done lots of great work. Now it's time to get better. It's
> time to grow, time to learn, time to start appearing relevant to the
> world outside the IETF, which looks at our document series as laughably
> archaic, and therefore of suspect technical content.
We are trying to get better, but not to be a research project.
>> Word generates lists using <p> elements just fine. Anything with numbers gets
>> those numbers when the HTML is generated, not rendered - which ensures it uses
>> the numbers the author referred to in the text.
> That's an interesting new requirement.
> Serious question: do you refer to list item numbers frequently? Would you
> mind pointing me to a document that does this, so I can reason about ways to
> approach this?
Same for Section numbers. You know, the ones that everyone now wants to use as a substitute for page numbers?
>> When Word inserts a BR it's because the author used a CTL-CR. That's because
>> the line break has meaning that the author wants to convey, which also means
>> that it should NOT be reflowed at the discretion of the viewer.
> Again serious, can you point me to a document where this was important? The
> only places I can think of at the moment all belong in pre elements.
Again, it's in the example I gave. It's used to generate breaks in the elements of lists to separate the list item from its description. It's not the only way to do it - HTML has about a dozen ways to do almost anything. And it's almost never true that only one way is "correct".
>>> You can intuit a container (and add it if need-be) if the sections are
>> For heading containers, sure - it's just a stack. But not for lists, code,
>> ASCII figures, or anything else necessarily.
> I'm pretty sure all of those are detectable by proximity, but regardless,
> those have nothing to do with containment, since they MUST be marked up
> appropriately in any adequately semantic input format.
Until you're sure, it's premature to assert that you can do it.
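Touch's "it's just a stack" remark about heading containers can be made concrete. A minimal sketch, assuming the headings arrive as a flat sequence of (level, title) pairs (as an h1/h2/h3 sequence would give you) with integer levels of 1 or more; the function names and dictionary layout are hypothetical.

```python
def nest_headings(headings):
    """Rebuild section containment from a flat list of (level, title) pairs."""
    root = {"level": 0, "title": None, "children": []}
    stack = [root]  # path from the root to the most recently opened section
    for level, title in headings:
        node = {"level": level, "title": title, "children": []}
        # Close every open section at the same or a deeper level.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def outline(node, depth=0):
    """Yield indented titles, to show the recovered structure."""
    for child in node["children"]:
        yield "  " * depth + child["title"]
        yield from outline(child, depth + 1)
```

The contrast drawn in the thread holds here: each heading carries an explicit level, so the stack has something to work with, whereas a run of bullet paragraphs carries no such signal for where one list ends and the next begins.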
>>> No, I've given two. Programmatic editing is one, and information extraction
>>> is the other. The extraction function has nothing to do with editing, since
>>> it does not modify the file.
>> Editing involves copy/paste - it need not involve a single file.
> That seems a little pedantic. The information extractors are not editing
> the document for which we're talking about defining the structure. It feels
> like you're grouping disparate things under one heading in order to make
> points, which doesn't seem like it's moving the conversation forward
I've shown a number of ways in which containment adds requirements that aren't needed, and that in some cases get in the way of copy/paste operations later.
No, I don't think "information extraction" is the primary purpose of going to the new format. If it is, call me when the publishing community converges on a format for that.