[rfc-i] The <tt> train wreck

Martin Thomson mt at lowentropy.net
Sun Aug 15 18:11:07 PDT 2021

It seems like overloading this with three levels of semantics is the original sin.

That is: decorations (italic, bold, monospace), quoting (_, *, "), and line breaking.

From my perspective, it would be good to control each independently, with tags.  I don't care whether that is different tags, attributes on a single tag, or some combination of those with global flags to control it.  (Global flags => stylesheet?)
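
One way to picture keeping these concerns separate is a span model in which font, TXT decoration, and line breaking are independent fields. The Python below is a hypothetical sketch (SpanStyle, render_txt, and render_html are illustrative names, not xml2rfc APIs). It re-expresses the XML2RFCv1 spanx styles from the table quoted below; the choice of HTML elements is an assumption, and the nobreak field is carried but not acted on by these two renderers.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class SpanStyle:
    """Hypothetical model: each concern of an inline span is its own field."""
    em: bool          # italic/oblique variation of the font
    strong: bool      # bold variation of the font
    monospace: bool   # base font is monospaced (the "tt" column)
    txt_open: str     # decorator placed before the span in TXT ("" = none)
    txt_close: str    # decorator placed after the span in TXT
    nobreak: bool     # try to keep the span on one line (unused below)


# XML2RFCv1 <spanx style=...> values re-expressed with the fields above,
# following the table in the quoted message.
SPANX_V1 = {
    "emph":    SpanStyle(True,  False, False, "_",  "_",  False),
    "strong":  SpanStyle(False, True,  False, "*",  "*",  False),
    "nobreak": SpanStyle(False, False, False, "",   "",   True),
    "vbare":   SpanStyle(False, False, True,  "",   "",   True),
    "verb":    SpanStyle(False, False, True,  '"',  '"',  True),
    "vemph":   SpanStyle(True,  False, True,  "_",  "_",  True),
    "vstrong": SpanStyle(False, True,  True,  "*",  "*",  True),
    "vdeluxe": SpanStyle(True,  True,  True,  "*_", "_*", True),
}


def render_txt(text: str, style: SpanStyle) -> str:
    """Plaintext rendering: only the decorator fields matter here."""
    return f"{style.txt_open}{text}{style.txt_close}"


def render_html(text: str, style: SpanStyle) -> str:
    """HTML rendering: only the font fields matter; decorators are dropped."""
    text = escape(text)
    if style.monospace:
        text = f"<code>{text}</code>"
    if style.strong:
        text = f"<strong>{text}</strong>"
    if style.em:
        text = f"<em>{text}</em>"
    return text
```

With this model, verb and vbare produce identical HTML and differ only in TXT, which is exactly the distinction a single v3 <tt> element cannot carry.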

Regarding non-breaking options:

I personally find the current reliance on &nbsp; and &nbhy; (and worse, a zero-width character, of which we have one in RFC 9000) to be problematic.  It means that the text you copy is not the text you expect, which can confuse all but the smartest searcher.  I would prefer to control this with tags.  If nothing else, that would be much more explicit and less error-prone.
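
To illustrate the copy-and-search problem: a hypothetical Python helper that folds non-breaking and zero-width characters back to what a reader would type. It assumes &nbsp; and &nbhy; expand to U+00A0 and U+2011; the zero-width entries are examples, not a list of what RFC 9000 actually contains.

```python
# Characters that render like ordinary text (or not at all) but do not
# compare equal to what a reader types. The &nbsp;/&nbhy; correspondence is
# assumed; the zero-width entries are illustrative examples.
SEARCH_FOLD = str.maketrans({
    "\u00a0": " ",  # NO-BREAK SPACE (&nbsp;)
    "\u2011": "-",  # NON-BREAKING HYPHEN (&nbhy;)
    "\u2060": "",   # WORD JOINER
    "\u200b": "",   # ZERO WIDTH SPACE
    "\u200c": "",   # ZERO WIDTH NON-JOINER
})


def normalize_for_search(text: str) -> str:
    """Fold non-breaking and zero-width characters to what people type."""
    return text.translate(SEARCH_FOLD)


copied = "see BCP\u00a014 and reg\u2011name"
print("BCP 14" in copied)                          # False
print("reg-name" in normalize_for_search(copied))  # True
```

Unicode NFKC normalization is not a full substitute here, since it leaves zero-width characters such as U+200B in place.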

With all the effort that went into making BCP 14 not wrap, I note that RFC 9000 wraps between BCP and 14.  RFC 9087 doesn't, at least in the text rendering (the &nbsp; is in the XML but not in the HTML, which suggests that xml2rfc is bleaching it incorrectly).
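
A sketch of why bleaching matters, assuming a TXT renderer that breaks lines only at ASCII spaces (the helper names are hypothetical, and this is not how xml2rfc is implemented). keep_together mimics the idea, mentioned for kramdown-rfc in the quoted message, of transliterating spaces and hyphens into their non-breaking equivalents; bleach shows what happens if those characters are turned back into plain ones before wrapping.

```python
NBSP, NBHY = "\u00a0", "\u2011"


def keep_together(text: str) -> str:
    """Make a phrase unbreakable by swapping spaces and hyphens for their
    non-breaking equivalents (hypothetical helper)."""
    return text.replace(" ", NBSP).replace("-", NBHY)


def bleach(text: str) -> str:
    """Undo the non-breaking property, as an over-eager renderer might."""
    return text.replace(NBSP, " ").replace(NBHY, "-")


def wrap(text: str, width: int) -> list[str]:
    """Greedy line filling that breaks only at ASCII spaces, so NBSP and
    NBHY keep their neighbours on the same line."""
    lines: list[str] = []
    current = ""
    for word in text.split(" "):  # splits on U+0020 only, never on NBSP
        if current and len(current) + 1 + len(word) > width:
            lines.append(current)
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        lines.append(current)
    return lines
```

With a width of 19, "as described in BCP 14 when" wraps as "as described in BCP" / "14 when", while the version passed through keep_together moves "BCP 14" to the next line intact; bleaching it first brings the bad break back.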

On Sat, Aug 14, 2021, at 09:13, Carsten Bormann wrote:
> The original RFCXML RFC 2629 did not have any elements to indicate 
> emphasis (often rendered as italic/oblique and/or bold type).  More 
> normatively in practice, XML2RFCv1 had »<spanx style=«, which provided 
> text spans with the following properties:
> | style   | nobreak | decorator | type      | tt |
> |---------|---------|-----------|-----------|----|
> | emph    |         | _         | em        | -  |
> | strong  |         | *         | strong    | -  |
> | nobreak | x       | none      | -         | -  |
> | vbare   | x       | none      | -         | x  |
> | verb    | x       | "         | -         | x  |
> | vemph   | x       | _         | em        | x  |
> | vstrong | x       | *         | strong    | x  |
> | vdeluxe | x       | *_ and _* | em strong | x  |
> The “style” column is the style attribute that could be given to spanx.
> The "nobreak" column indicates whether a non-breaking semantics was 
> intended (trying harder to keep pieces together around /@&|-+#%: 
> characters).
> For TXT, the “decorator” column indicates what character is used 
> _around_ the span in plaintext rendering.  Note that only “nobreak” and 
> “vbare” had no decorator.  Using »"« as a decorator for “verb” 
> certainly was an unloved compromise that did, however, work well enough.
> For HTML, I note that the nobreak functionality was commented out in my 
> version of XML2RFCv1 (there seemed to be no easy way to translate it 
> into HTML at the time).  The columns “type” and “tt” indicate the font 
> to be used in HTML: “type” provides the variation of the base font, and 
> “tt” indicates whether the base font is monospaced or not.
> Much of this functionality was known only to people who actually looked 
> into the source code of XML2RFC.  I haven’t checked XML2RFCv2, but it 
> seems that some of these features should have survived into v2 and have 
> since decayed.  Since almost nobody cared about the HTML renderings (they 
> were irrelevant for the actual RFC publishing), I’m not sure that full 
> support was ever checked extensively; XML2RFCv1 was available and could 
> be used by authors who did need the full functionality.
> Enter v3.
> I can’t find any horizontal no-breaking support (except for that which 
> Unicode provides).  Inexplicably, whole-document options like 
> --table-hyphen-breaks were introduced to create tweaks that should have 
> been applied to specific items.
> Emph and strong were finally spread out into their own elements, <em 
> and <strong.
> These can be combined with each other and with monospaced font 
> selection, so the latter was turned into its own element, <tt.
> As <spanx style=“verb”> was the only supported form of monospacing in 
> xml2rfcv2 at the time, <tt putatively was its replacement.
> So we have the last two columns of my table above covered, but not the 
> second and the third.
> Little thinking was wasted about the plaintext rendering of these new 
> span elements — after all, XML2RFCv2 had emph, strong, and verb, and 
> these seemed to work.
> In fact, the de-facto v3 manual 
> https://xml2rfc.tools.ietf.org/xml2rfc-doc.html still doesn’t 
> mention plaintext rendering of these elements.
> So, since RFC 8650, v3 documents have been keyboarded and proof-read 
> under the assumption that <em, <strong, and <tt are the replacements of 
> the spanx styles “emph”, “strong”, and “verb”.
> Note that these span elements not only set the font for each of the 
> characters in the span; they also have a delimiting semantics.  This is 
> obvious when decorators are used in the plaintext form, but also on the 
> HTML side, the <tt element is rendered as a separate HTML element, 
> which with its CSS styles makes sure <tt>a </tt> (note the space after 
> the a) looks different from <tt>a</tt>.
> The »"« decorator for what is now <tt continues to be unloved.
> Since 2020-06-20, there has been some weird code in XML2RFCv3 that 
> guesses that these decorators are unwanted in certain table contexts; 
> the fact that this code tends to guess wrong (of course!) has already 
> led to a bug report [0].
> Apparently this will be fixed by removing the decorators entirely [1].
> So, after 400+ RFCs have been published under the assumption of (and 
> proofread against) decorators enabling understanding the plaintext 
> rendering of <tt, the meaning of <tt will be retroactively changed from 
> <spanx style=“verb”> to <spanx style=“vbare”> (which hasn’t even been 
> available for the decade most people used XML2RFCv2).
> I have pointed out why the decorators are needed in certain cases [2].
> As several people point out, there are also cases where the <tt 
> decorators are unneeded or even somewhat ugly.
> Which of these are the case depends on the authors’ intent with the <tt.
> As that is not captured (as it used to be in verb vs. vbare), there is 
> no hope to get this decision right in the two formatters, TXT and HTML.
> TL;DR:
> The decision to always remove the decorations from the TXT rendering of 
> <tt> is wrong.
> This is because unfortunately <tt> is broken, i.e., ill-conceived, for 
> its application.
> As this has been enshrined, there can be no backward-compatible “right 
> thing”, only repairs going forward.
> (The decision also points out that the way we currently reach these 
> decisions is broken; maybe we can revisit that particular point when we 
> come up with a new decision structure.)
> Grüße, Carsten
> PS.: I have CCed this to rfc-markdown because there also is no good way 
> in markdown to keyboard the distinction between a decorative <tt that 
> can be ignored in plaintext and one that *needs* a representation 
> (“decorators”):  Like this part of XML2RFCv3, markdown also was not 
> designed to format into plaintext.  It would be nice if there were a 
> way to create a workaround from kramdown-rfc, but the problem is that 
> the single piece of XML that kramdown-rfc puts out needs to generate 
> both the TXT and the HTML/PDF version, and there is no way in RFCXMLv3 
> to indicate the variant processing needed for TXT.
> [0]: <https://mailarchive.ietf.org/arch/msg/xml2rfc/30tTnMMcJHCIH8t8-s_NVLZYrFg>
> [1]: <https://trac.ietf.org/trac/xml2rfc/ticket/600>
> [2]: E-mail that unfortunately only went to a subgroup of people 
> discussing the issue and therefore isn’t archived; reproduced below.  
> It first discusses the need to resurrect some non-breaking semantics, 
> and then (search for “txtquotes”) the need for the author to 
> actively choose between decorated (»txtquotes=“true”«) and undecorated 
> (»txtquotes=“false”«) variants.
> > On 7/8/21 1:33 AM, Carsten Bormann wrote:
> >> I probably should add that on the authoring side, a variant of <code> with non-breaking semantics is needed.
> >> But that can be done in the authoring tool (kramdown-rfc) by transliterating space, hyphen etc. into their non-breaking equivalents, so it probably doesn’t need support from xml2rfc.
> […]
> I ran into this requirement (non-breaking semantics) when I converted 
> the XML for RFC 6125, where the RFC Editor (I assume) had converted 
> some, but not all, hyphens in quoted syntax snippets into 
> non-breaking hyphens.
> (Note that they didn’t bother with <spanx style=“verb”>, which was 
> clumsy at the time, but directly put in the quotes that it would have 
> generated anyway, as that is equivalent for TXT-only production.)
>  (or its equivalent) containing a "reg&nbhy;name".  (Matching only the
>  "reg&nbhy;name" rule from <xref target='URI'/> limits verification to DNS
>  domain names, thereby differentiating a URI&nbhy;ID from a
>  A certificate for this service might include SRV-IDs of 
>  "_xmpp&nbhy;client.im.example.org" and "_xmpp&nbhy;server.im.example.org"
>  (see <xref target='XMPP'/>), a DNS-ID of "im.example.org", and an XMPP-specific
> Clearly, those quotes were desired in the TXT output here (but, I 
> think, would not have been in the HTML), so these would all be 
> specified as <tt txtquotes=“true”> in my imagined new world.
> On the matter of my suggested txtquotes bit, I also just looked into 
> RFC 8949, because I’m familiar with it and it is a non-trivial document.
> The instances of <tt>false… (true, …) would actually improve with 
> txtquotes=“false”.
> The instances of <tt>0 and <tt>0.0 also work with txtquotes=“false”.
> The reference to
>  the <tt>date-time</tt> production in <xref target="RFC3339" 
> can be understood either way, but for txtquotes=“false” that 
> understandability hinges on the production having a nominal name; if it 
> were <tt txtquotes=“false”>second</tt>, this would not work at all.
> Similar with
>  doesn't match the <tt>URI-reference</tt> production, the string is 
> invalid.</li>
> Neutral for
>  the encoded text string <tt>0x62c0ae</tt>
> Becoming more of a problem:
>  interested in this information.  For example, <tt>_</tt> or <tt>_3</tt>.
> Completely broken:
>      <t indent="0" pn="section-appendix.c-5">Note that <tt>well_formed</tt>
>      returns the major type for well-formed
> Note that one is the function name and the other one is the property 
> defined in this RFC.
> (Yes, in this case the reader can guess which is which by one being 
> snake_case and the other kebab-case.  But ouch.)
> So to make the TXT acceptable, the XML and thus the HTML would need to 
> be changed here.
> In the RFCXMLv2 times, the authors could decide whether they wanted TXT 
> quotes and just leave the decoration off if they didn’t.  But with 
> better styling available on the HTML side, they’ll want to switch on 
> <tt> and will sometimes regret it when those semantics are suppressed 
> on the TXT side.
> By the way, that same “switch off the fallback” bit would tremendously 
> improve <em> and <strong> in certain cases, too:
>   *  *SHA-384* and *SHA-512* hash functions are efficient for 64-bit
>      hardware.
> (I’ve seen much worse, but can’t find an example of that right now.  I 
> did see one in the course of today!)
> So maybe this would be ignore-in-plain-text=“true” instead of txtquotes=“false”.
> Grüße, Carsten
> _______________________________________________
> rfc-interest mailing list
> rfc-interest at rfc-editor.org
> https://www.rfc-editor.org/mailman/listinfo/rfc-interest
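
The per-element switch proposed in the quoted message could be sketched as a small TXT flattener over RFCXML-like inline markup. This is hypothetical throughout: txtquotes is the attribute name floated in that message and is not part of RFCXMLv3 or xml2rfc, and defaulting it to quoting (the v2-era verb behaviour) is an assumption.

```python
import xml.etree.ElementTree as ET

# Plaintext decorators per inline element; _ and * follow the spanx table
# quoted above, and " is the verb-style decorator for <tt>.
DECORATORS = {"em": ("_", "_"), "strong": ("*", "*"), "tt": ('"', '"')}


def txt_inline(elem: ET.Element) -> str:
    """Flatten an element's inline content to plaintext.

    A <tt> is wrapped in double quotes unless it carries the hypothetical
    attribute txtquotes="false"; <em> and <strong> are always decorated.
    """
    parts = [elem.text or ""]
    for child in elem:
        inner = txt_inline(child)
        opening, closing = DECORATORS.get(child.tag, ("", ""))
        if child.tag == "tt" and child.get("txtquotes", "true") == "false":
            opening = closing = ""
        parts.append(f"{opening}{inner}{closing}")
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


sentence = ET.fromstring(
    "<t>Note that <tt>well_formed</tt> returns the major type for well-formed</t>"
)
print(txt_inline(sentence))
# Note that "well_formed" returns the major type for well-formed
```

Applied to the RFC 8949 sentence, the default keeps the quotes that separate the function name from the property, while <tt txtquotes="false"> would let literals such as false and true render bare.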
