[rfc-i] Proposed new RFC submission requirements
Joe Touch
touch at isi.edu
Sat May 26 22:26:02 PDT 2012
On May 26, 2012, at 8:12 PM, Joe Hildebrand wrote:
> On 5/26/12 10:31 AM, "Joe Touch" <touch at isi.edu> wrote:
>
>> Checked the example I posted. It doesn't use nl or ol anywhere. Here's how it
>> outputs lists (please ignore the font cruft - that can be removed/cleaned
>> easily):
>
>> <h2><a name="_Toc257116045"></a>
>
> Meh. Could have done <h2 id="_Toc257116045">, which would have been
> cleaner.
>
>> <a name="_Ref252706999"></a><a name="_Ref170784094"></a><a
>> name="_Toc170705132">
>
> What semantic are you trying to get across here? These are three ways you
> might want to jump to the same place? Are you sure this isn't cruft that's
> just left in by the WYSIWYG tool?
Was there some part of "check the example I posted" above that was ambiguous? Of course that's the cruft - I even said that above.
>> <p class=MsoNormal>This document replaces TCP MD5 as follows [RFC2385]:</p>
>
> What does MsoNormal mean? Why no quotes on the class name? Why no link on
> RFC2385?
No quotes because none are required. That's "Normal" in Microsoft Office (it prepends Mso). No link on the ref because that's not how it was generated.
Again, ignore the cruft.
>> <p class=RFCListBullet>o<span style='font:7.0pt "Times New
>
> Yow. That's just awful. I'd say that there's lots of bad code in the
> world. I've written bad code in lots of different languages. But manually
> injecting an "o" instead of using a perfectly valid construct like li? This
> is only HTML in the technical sense, not a representative sample of anything
> a competent author would write by hand, or an adequate tool would generate
> on the author's behalf.
It's a representative sample of what MS Word generates from a List style type.
In case that's too subtle - again, that's *representative* of a very widely used commercial tool's interpretation of how to generate HTML.
>> Roman"'>
>> </span>TCP-AO uses a separate option Kind (TBD-IANA-KIND).</p>
>>
>> <p class=RFCListBullet>o<span style='font:7.0pt "Times New
>> Roman"'>
>> </span>TCP-AO allows TCP MD5 to continue to be used concurrently for legacy
>> connections.</p>
>>
>> There is no way to extract list container boundaries in post-processing.
>
> Really? It's ugly, but completely possible. You look for a group of
> p.RFCListBullet next to one another. Strip out the "o", switch p to li, and
> wrap a ul in front of it.
>
> I would appreciate it if you would be a little more careful about statements
> like "impossible" and "no way".
It is impossible to differentiate between lists that could be adjacent in the HTML but are semantically separate.
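
To make the disagreement concrete, here is a minimal sketch of the proximity heuristic described in the quoted text: collect runs of adjacent RFCListBullet paragraphs, strip the literal "o" marker, and wrap each run in a ul. The function names (`group_bullets`, `strip_marker`) and the (class, text) tuple representation are assumptions made for illustration; they are not taken from Word's output or from any existing tool.

```python
from itertools import groupby


def strip_marker(text):
    # Assumption: Word emitted a literal "o" as the bullet glyph at the
    # start of each bullet paragraph, as in the sample quoted above.
    return text[1:].lstrip() if text.startswith("o") else text


def group_bullets(paragraphs):
    """Wrap each run of adjacent bullet paragraphs in a single <ul>.

    `paragraphs` is a list of (css_class, text) tuples in document order
    (a hypothetical, already-parsed form of the <p> elements).
    """
    out = []
    for is_bullet, run in groupby(paragraphs, key=lambda p: p[0] == "RFCListBullet"):
        run = list(run)
        if is_bullet:
            items = "".join(f"<li>{strip_marker(text)}</li>" for _, text in run)
            out.append(f"<ul>{items}</ul>")
        else:
            out.extend(f"<p>{text}</p>" for _, text in run)
    return out


# Suppose the author meant two separate two-item lists: (A, B) and (C, D).
# Nothing in the flat markup records that boundary.
sample = [
    ("MsoNormal", "intro"),
    ("RFCListBullet", "oA"),
    ("RFCListBullet", "oB"),
    ("RFCListBullet", "oC"),
    ("RFCListBullet", "oD"),
]
print(group_bullets(sample))
```

Under these assumptions the heuristic yields one list with four items. It recovers a container, but it cannot tell whether the author intended one list or two adjacent ones.
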
>>> In a text-only format, this is exactly
>>> the sort of ambiguity that the doc doesn't have enough structure to answer.
>>
>> Agreed. SO WHAT?
>>
>> Why are you insisting on retaining that structure? What is the *current*
>> ***NECESSARY*** purpose?
>
> Let's give yet another example. Consider RFC 3454
> (http://www.ietf.org/rfc/rfc3454.txt). Skip to the appendices. See the
> tables? There are several projects that parse the RFC text and generate
> source code to implement the stringprep protocol.
>
> Another example, draft-ietf-codec-opus
> (http://tools.ietf.org/id/draft-ietf-codec-opus-14.txt) contains a big
> tarball of source code, which is actually normative. The draft contains
> instructions for how to parse itself to extract the tarball.
>
>>> In the face of a lack of data, I'd say that it's one list with four items.
>>> If it doesn't matter to the author enough to use a tool that preserves his
>>> or her intent, then nobody else is likely to care about the difference
>>> downstream.
>>
>> I claim that this is true for all container information. The only such info
>> that might be useful downstream is for editors,
>
> Those folks are information-extractors. The Editor's job is to get the
> draft in order to be published as an RFC.
>
>> and they can/should be able to
>> (re)generate valid container information themselves if needed, and leave that
>> requirement out of the submission stream.
>
> Why make this more difficult for them than is needed, if it's roughly the
> same amount of work to come up with an adequate format in the first place?
Because it's not the same amount of work.
> One of the signs of good architecture ...
This isn't an academic exercise in good architecture. It's the design of a production publication system.
If you want to create a new publication architecture, by all means please do. We should consider it for RFCs after it's been widely used for around 10-15 years.
>>> If Word is generating <li> without a <ul> or <ol> around it, there's a bug.
>>
>> It has worked just fine for over 25 years with whatever internal structure it
>> uses.
>
> Yes. You've done lots of great work. Now it's time to get better. It's
> time to grow, time to learn, time to look to start to appear relevant to the
> world outside the IETF that looks at our document series as laughably
> archaic - and therefore of suspect technical content.
We are trying to get better, but not to be a research project.
>> Word generates lists using <p> commands just fine. Anything with numbers gets
>> those numbers when the HTML is generated, not rendered - which ensures it uses
>> the numbers the author referred to in the text.
>
> That's an interesting new requirement.
>
> Serious question: do you refer to list item numbers frequently? Would you
> mind pointing me to a document that does this, so I can reason about ways to
> approach this?
Same for Section numbers. You know, the ones that everyone wants to now use as a substitute for page numbers?
>> When Word inserts a BR it's because the author used a CTL-CR. That's because
>> the line break has meaning that the author wants to convey, which also means
>> that it should NOT be reflowed at the discretion of the viewer.
>
> Again serious, can you point me to a document where this was important? The
> only places I can think of at the moment all belong in pre elements.
Again, it's in the example I gave. It's used to generate breaks in the elements of lists to separate the list item from its description. It's not the only way to do it - HTML has about a dozen ways to do almost anything. And it's almost never true that only one way is "correct".
>>> You can intuit a container (and add it if need-be) if the sections are
>>> separated.
>>
>> For heading containers, sure - it's just a stack. But not for lists, code,
>> ASCII figures, or anything else necessarily.
>
> I'm pretty sure all of those are detectable by proximity, but regardless,
> those have nothing to do with containment, since they MUST be marked up
> appropriately in any adequately semantic input format.
Until you know you're sure, it's premature to assert that you know you can do it.
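
For the heading case, which both sides agree on ("it's just a stack"), a minimal sketch of recovering section containers from a flat sequence of headings might look like the following. The (level, title) input form and the `nest_headings` name are assumptions for illustration only. It covers headings alone; it does not address lists, code, or ASCII figures.

```python
def nest_headings(headings):
    """Infer section nesting from a flat list of (level, title) pairs.

    Hypothetical input form: level 1 for <h1>, 2 for <h2>, and so on.
    Returns a list of nested dicts, one per top-level section.
    """
    root = {"title": None, "level": 0, "children": []}
    stack = [root]  # the chain of currently open sections
    for level, title in headings:
        # Close every open section at the same or a deeper level.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"title": title, "level": level, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


sections = nest_headings([(1, "Introduction"), (2, "Terminology"), (1, "Security")])
for s in sections:
    print(s["title"], [c["title"] for c in s["children"]])
```

The level number carried by each heading is what makes this work. Bullet paragraphs like the ones in the sample carry no comparable depth or boundary information.
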
>>> No, I've given two. Programmatic editing is one, and information extraction
>>> is the other. The extraction function has nothing to do with editing, since
>>> it does not modify the file.
>>
>> Editing involves copy/paste - it need not involve a single file.
>
> That seems a little pedantic. The information extractors are not editing
> the document for which we're talking about defining the structure. It feels
> like you're grouping disparate things under one heading in order to make
> points, which doesn't seem like it's moving the conversation forward
> rapidly.
I've shown a number of ways in which containment adds requirements that aren't needed, and in some cases get in the way of copy/paste operations later.
No, I don't think "information extraction" is the primary purpose of going to the new format. If it is, call me when the publishing community converges on a format for that.
Joe