[rfc-i] The <tt> train wreck
cabo at tzi.org
Mon Aug 16 02:09:03 PDT 2021
On 2021-08-16, at 03:11, Martin Thomson <mt at lowentropy.net> wrote:
> It seems like overloading this with three levels of semantics is the original sin.
Right. When the font change was split out from a single style attribute into bits (<em, <strong, <tt), the other semantics might have been split out as well. (There are some 64 or so combinations…)
> decorations (italic, bold, monospace), quoting (_, *, "), line breaking
You are using different terms than I did, and decoration is easily confused with decorator, so let me propose yet another set of terms:
font-change (italic, bold, monospace)
txt-fallback (_, *, “)
no-breaking syntax (break-on-space, no-break? See below.)
> From my perspective, it would be good to control each independently. With tags. I don't care if that is different tags, attributes on a single tag, or some combination of that with some global flags to control it. (Global flags => stylesheet?)
The spanx markup was meant to delimit a span of text for a style change. Having txt-fallback for that span without the actual style change doesn’t make a lot of sense to me, so an attribute on the span that controls txt-fallback looks like the right approach.
No-breaking is actually useful without visual delimitation, so maybe this should be considered separately.
To me it does not make sense to remove the txt-fallback default=on from <tt but keep it in <em and <strong. I think we need to design a way to make this work for all three. (Note that the default fallbacks for <em and <strong don’t always work, so it would be good to be able to select a different one, as is also needed with <tt. Compare the use of >false< and >true< in https://www.rfc-editor.org/rfc/rfc8949.html#name-diagnostic-notation — we used that sick notation because the default fallback for <tt was wrong (»false« and »“false”« are two different things in CBOR diagnostic notation), but there was no way to specify a different fallback, so we opted to always have the fallback characters in there, and then we were limited to ASCII. Ouch.)
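A minimal Python sketch of per-span fallback selection, assuming a hypothetical `fallback` attribute that carries an (opening, closing) pair and falls back to today's `_`/`*`/`"` defaults when absent. `Span`, `render_text` and `fallback` are illustrative names only, not xml2rfc code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Today's plain-text fallbacks: _em_, *strong*, "tt".
DEFAULT_FALLBACK = {
    "em": ("_", "_"),
    "strong": ("*", "*"),
    "tt": ('"', '"'),
}

@dataclass
class Span:
    element: str  # "em", "strong" or "tt"
    text: str
    # Hypothetical per-span override: (opening, closing); None means element default.
    fallback: Optional[Tuple[str, str]] = None

def render_text(span: Span) -> str:
    """Render a span for the plain-text output with its txt-fallback delimiters."""
    if span.fallback is not None:
        opening, closing = span.fallback
    else:
        opening, closing = DEFAULT_FALLBACK[span.element]
    return f"{opening}{span.text}{closing}"

print(render_text(Span("tt", "false")))                       # "false"
print(render_text(Span("tt", "false", fallback=("", ""))))    # false
print(render_text(Span("tt", "false", fallback=("»", "«"))))  # »false«
```

With an override like this, the CBOR diagnostic notation example could have said "no fallback" or used »…« instead of hard-coding ASCII delimiters in the source.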
Global flags create all kinds of problems and are best avoided.
(CCing rfc-markdown again:) I shudder about the way to indicate the fallback preference in the markdown. Maybe this can be made almost palatable with predefined ALDs, as in
(Where nf is an abbreviation for “no fallback”.)
We could also invent some new syntax, of course (and we don’t need to limit the markdown input charset to ASCII for a more readable version of the above: »bar« maybe?).
Global flags are somewhat more excusable for a keyboarding syntax, but there still would need to be a way to compose text from different sources.
Note that the question of which attribute values are the defaults in the markdown is entirely orthogonal to the question of which are the defaults in the XML; when in doubt, I prefer to keep backwards compatibility (which would mean the default should be fallback=_/*/" for <em, <strong, <tt).
> Regarding non-breaking options:
> I personally find the current reliance on &nbhy; (and worse, of which we have one in RFC 9000) to be problematic.
I find a zero-width space (U+200B) in 0x0100-​0x01ff [look closely after the "-"!] in the table row “CRYPTO_ERROR”; is that what you mean? (zwsp U+200B ≠ zwnj U+200C, and I don’t think we need a lot of ligature control.)
My current view is that *introducing* break points into a span is something the Unicode spaces handle reasonably well. An editor with a reveal mode does help (I haven’t primed my emacs to deal with U+200B yet, though).
> It means that the text you copy is not the text you expect which can confuse all but the smartest searcher. I would prefer to control this with tags. If nothing else, it would be much more explicit and less error-prone.
So you would prefer 0x0100-<preferentially-break-here/>0x01ff or some such?
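A minimal Python sketch of the copy/search problem and of what explicit markup could map to. `strip_zero_width`, `hint_to_html` and the `<preferentially-break-here/>` element are hypothetical; mapping the hint to HTML `<wbr>` is one possible choice, since `<wbr>` marks a break opportunity without inserting a character into the text content.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE: invisible, but a line-break opportunity

# Zero-width characters that survive copy-and-paste without being visible:
# ZWSP, ZWNJ (U+200C), ZWJ (U+200D) and WJ (U+2060).
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060]")

def strip_zero_width(text: str) -> str:
    """Remove zero-width characters so the text matches what a reader sees.
    (Crude: ZWJ is also used inside emoji sequences.)"""
    return ZERO_WIDTH.sub("", text)

# Hypothetical explicit break-hint element, as in the question above.
BREAK_HINT = "<preferentially-break-here/>"

def hint_to_html(source: str) -> str:
    """Map the explicit hint to HTML <wbr>, which adds no character to the text."""
    return source.replace(BREAK_HINT, "<wbr>")

copied = "0x0100-" + ZWSP + "0x01ff"                    # what a reader copies from the table
print("0x0100-0x01ff" in copied)                        # False: the search misses
print(strip_zero_width(copied) == "0x0100-0x01ff")      # True
print(hint_to_html("0x0100-" + BREAK_HINT + "0x01ff"))  # 0x0100-<wbr>0x01ff
```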
> With all the effort that went into making BCP 14 not wrap,
(Do you mean the phrase “BCP 14”, which should have an nbsp in it, or do you mean <bcp14>MUST NOT</bcp14>?)
> I note that RFC 9000 wraps between BCP and 14. Something that RFC 9087 doesn't do - at least for the text rendering (the &nbsp; is in the XML, but not the HTML, which suggests that xml2rfc is bleaching it incorrectly).
The boilerplate says “BCP 78” without a no-break space as well.
Note that RFC 9087 has six occurrences of “AS path”, only one of which is nbsp-protected (but the example paths after three of them are).
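Counting cases like these by hand is tedious. Here is a minimal Python sketch of a checker that reports, for each two-word phrase, how many occurrences use U+00A0 and how many use ordinary blanks. The phrase list and the function names are assumptions for illustration; no existing tool is implied.

```python
import re
from typing import Dict, Tuple

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# Phrases from this thread that should not be split across lines.
PROTECTED_PHRASES = ["BCP 14", "BCP 78", "AS path"]

def _phrase_re(first: str, sep: str, second: str):
    # Whole words only, so "BCP 140" does not count as "BCP 14".
    return re.compile(r"\b" + re.escape(first) + sep + re.escape(second) + r"\b")

def count_protection(text: str, phrase: str) -> Tuple[int, int]:
    """Return (nbsp_protected, unprotected) occurrence counts of a two-word phrase."""
    first, second = phrase.split(" ", 1)
    protected = _phrase_re(first, NBSP, second).findall(text)
    # Ordinary blanks, including a line break inside the XML source.
    unprotected = _phrase_re(first, r"[ \t\r\n]+", second).findall(text)
    return len(protected), len(unprotected)

def report(text: str) -> Dict[str, Tuple[int, int]]:
    return {p: count_protection(text, p) for p in PROTECTED_PHRASES}

sample = "per BCP 14 and BCP\u00a078; the AS\u00a0path and the AS\n   path"
print(report(sample))  # {'BCP 14': (0, 1), 'BCP 78': (1, 0), 'AS path': (1, 1)}
```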
Note that there are several aspects of horizontal no-breaking:
— turn blank space into no-break spaces etc.
— don’t allow breaking after characters such as / @ & | - + # % :
(— hyphenation no-breaking, which we don’t need as we don’t do hyphenation - or should we?)
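The first two aspects can be sketched as a text transformation in Python: blank space becomes U+00A0, and a WORD JOINER (U+2060), which prohibits a break on either side of it under UAX #14, is glued after each break-after character from the list above. `no_break` is a hypothetical helper, and treating any run of blank space as a single space is an assumption.

```python
import re

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE
WJ = "\u2060"    # U+2060 WORD JOINER: no line break before or after it (UAX #14)

# Any run of blank space; assumed to render as a single space.
_BLANKS = re.compile(r"\s+")
# Characters after which a break is commonly allowed, per the list above.
_BREAK_AFTER = re.compile(r"([/@&|+#%:-])")

def no_break(span: str) -> str:
    """Apply the first two no-breaking aspects to a span of text:
    blank space becomes NBSP, and a WJ is glued after each break-after character."""
    glued = _BLANKS.sub(NBSP, span)
    return _BREAK_AFTER.sub(lambda m: m.group(1) + WJ, glued)

print(no_break("BCP 14") == "BCP" + NBSP + "14")                # True
print(no_break("0x0100-0x01ff") == "0x0100-" + WJ + "0x01ff")   # True
```

Using WJ rather than deleting break opportunities keeps the visible text unchanged, but it is also invisible, so it shares the copy/search caveat discussed above.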
Note that one constant source of spurious rfcdiff differences is differing line breaks at »/«. The default could be to never break at these but allow a ZWSP (U+200B) to enable breaking, or to allow breaking and use WJ (U+2060) to suppress it. But then we are used to breaking at »-«…
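A minimal Python sketch of a "never break at »/« or »-« unless explicitly allowed" policy: a greedy line filler that breaks only at U+0020 or at an explicit ZWSP (U+200B), and drops the ZWSP from its output. `wrap` is hypothetical and is not how xml2rfc wraps text; it only shows that such a default never splits a path differently from one draft to the next.

```python
import re
from typing import List

ZWSP = "\u200b"  # explicit, invisible break opportunity

def wrap(text: str, width: int) -> List[str]:
    """Greedy line filling that breaks only at a space or an explicit ZWSP.

    "/" and "-" are never break points, so a chunk such as a long path is
    kept whole (and may exceed `width`) instead of being split.
    """
    # re.split with a capture group alternates chunk, separator, chunk, ...
    parts = re.split(f"( |{ZWSP})", text)
    chunks = parts[0::2]
    seps = [""] + parts[1::2]  # separator preceding each chunk
    lines: List[str] = []
    line = ""
    for sep, chunk in zip(seps, chunks):
        joiner = " " if sep == " " else ""  # a ZWSP renders as nothing
        if line and len(line) + len(joiner) + len(chunk) > width:
            lines.append(line)
            line = chunk  # the separator is dropped at a break
        elif line:
            line += joiner + chunk
        else:
            line = chunk
    if line:
        lines.append(line)
    return lines

print(wrap("the path /usr/local/share/doc is long", 12))
# ['the path', '/usr/local/share/doc', 'is long']
print(wrap("0x0100-" + ZWSP + "0x01ff", 8))  # ['0x0100-', '0x01ff']
print(wrap("0x0100-0x01ff", 8))              # ['0x0100-0x01ff']
```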